O'Reilly - Big Data Processing with Apache Spark
by Manuel Ignacio Franco Galeano, Nimish Narang | Released January 2019 | ISBN: 9781789953688
Efficiently tackle large data sets and big data analysis challenges using Spark and Python

About This Video
This course will allow the learner to:
- Get up and running with Apache Spark and Python
- Integrate Spark with AWS for real-time analytics
- Apply processed data streams to machine learning APIs of Apache Spark

In Detail
Processing big data in real time is challenging due to scalability, information consistency, and fault tolerance. Big Data Processing with Apache Spark teaches you how to use Spark to make your overall analytical workflow faster and more efficient. You'll explore all core concepts and tools within the Spark ecosystem, such as Spark Streaming, the Spark Streaming API, the machine learning extension, and structured streaming.

You'll begin by learning data processing fundamentals using the Resilient Distributed Datasets (RDDs), SQL, Datasets, and DataFrames APIs. After grasping these fundamentals, you'll move on to using Spark Streaming APIs to consume data in real time from TCP sockets, and integrate Amazon Web Services (AWS) for stream consumption.

By the end of this course, you'll not only have understood how to use machine learning extensions and structured streams, but you'll also be able to apply Spark in your own upcoming big data projects.
- Chapter 1 : Introduction to Spark Distributed Processing
- Course Overview 00:02:34
- Installation and Setup 00:04:50
- Lesson Overview 00:03:35
- Introduction to Spark and Resilient Distributed Datasets 00:16:14
- Operations Supported by the RDD API 00:15:42
- Map Reduce Operations 00:07:29
- Self-Contained Python Spark Programs 00:10:48
- Nested Functions and Standalone Python Programs 00:10:11
- Introduction to SQL, Datasets, and DataFrames 00:14:16
- Lesson Summary 00:00:44
- Chapter 2 : Introduction to Spark Streaming
- Lesson Overview 00:01:18
- Introduction to Streaming Architectures 00:02:29
- Introduction to Discretized Streams (DStreams) 00:12:27
- Operations Supported by the Spark Streaming API 00:17:56
- Windowing Operations 00:13:00
- Structured Streaming 00:11:08
- Lesson Summary 00:00:41
- Chapter 3 : Spark Streaming Integration with AWS
- Lesson Overview 00:01:04
- Spark Integration with AWS Services 00:09:56
- Integrating AWS Kinesis and Python 00:14:46
- AWS S3 Basic Functionality 00:08:38
- Kinesis Streams and Spark Streams 00:01:54
- Lesson Summary 00:00:40
- Chapter 4 : Spark Streaming, ML, and Windowing Operations
- Lesson Overview 00:01:12
- Spark Integration with Machine Learning 00:17:45
- Spark Streaming Windowing Operations 00:07:25
- Lesson Summary 00:01:32