AWS Solutions: Real-Time Analytics with Spark Streaming now supports Spark SQL, DataFrames, and more

Posted on: May 22, 2020

AWS has updated Real-Time Analytics with Spark Streaming, an AWS Solution that automatically deploys a highly available, cost-effective batch and real-time data analytics architecture on the AWS Cloud, built on Apache Spark Streaming and Amazon Kinesis. The solution is designed to support custom Apache Spark Streaming applications and uses Amazon EMR to process vast amounts of data across dynamically scalable Amazon Elastic Compute Cloud (Amazon EC2) instances.

The solution now includes an updated consumer application that uses the latest version of Spark and modern features such as Spark SQL and DataFrames. Other changes include granular custom IAM policies, encryption at rest (enabled by default), VPC flow logs, sample Spark Streaming applications ported from Scala to Java, and several maintenance upgrades, such as updating Python to version 3.8 and Amazon EMR to version 5.29.0. To learn more about Real-Time Analytics with Spark Streaming on AWS, see the solution webpage.

Additional AWS Solutions offerings are available on the AWS Solutions webpage, where customers can browse solutions by product category or industry to find AWS-vetted, automated, turnkey reference implementations that address specific business needs.