O'Reilly - Streaming Big Data with Spark Streaming, Scala, and Spark 3!
by Frank Kane | Released September 2016 | ISBN: 9781787123915
Process large amounts of data in real time using Spark Streaming
About This Video
- Process streams of real-time data from various sources with Spark Streaming
- Query your streaming data in real time using Spark SQL
- A comprehensive tutorial with practical examples to help you develop real-time Spark applications
In Detail
"Big Data" analysis is a hot and highly valuable skill. Thing is, "big data" never stops flowing! Spark Streaming is a new and quickly developing technology for processing massive data sets as they are created - why wait for some nightly analysis to run when you can constantly update your analysis in real time, all the time? Whether it's clickstream data from a big website, sensor data from a massive "Internet of Things" deployment, financial data, or something else - Spark Streaming is a powerful technology for transforming and analyzing that data right when it is created.
This course gets you hands-on with real live Twitter data, simulated streams of Apache access logs, and even data used to train machine learning models! You'll write and run real Spark Streaming jobs right at home on your own PC, and toward the end of the course, we'll show you how to take those jobs to a real Hadoop cluster and run them in a production environment too.
- Chapter 1 : Getting Started
- Introduction and Getting Set Up 00:15:20
- Stream Live Tweets with Spark Streaming! 00:12:29
- Chapter 2 : A Crash Course in Scala
- Scala Basics - Part 1 00:11:27
- Scala Basics - Part 2 00:09:41
- Flow Control in Scala 00:07:18
- Functions in Scala 00:08:47
- Data Structures in Scala 00:16:38
- Chapter 3 : Spark Streaming Concepts
- Introduction to Spark 00:07:06
- The Resilient Distributed Dataset (RDD) 00:10:40
- RDDs in Action - Simple Word Count Application 00:08:17
- Introduction to Spark Streaming 00:06:32
- Revisiting the PrintTweets Application 00:05:10
- Windowing - Aggregating Data over Longer Time Spans 00:05:00
- Fault Tolerance in Spark Streaming 00:06:06
- Chapter 4 : Spark Streaming Examples with Twitter
- Saving Tweets to Disk 00:13:24
- Tracking the Average Tweet Length 00:08:23
- Tracking the Most Popular Hashtags 00:14:51
- Chapter 5 : Spark Streaming Examples with Clickstream / Apache Access Log Data
- Tracking the Top URLs Requested 00:13:27
- Alarming on Log Errors 00:11:56
- Integrating Spark Streaming with Spark SQL 00:10:18
- Intro to Structured Streaming in Spark 2 00:08:27
- Analyzing Apache Log files with Structured Streaming 00:11:24
- Chapter 6 : Integrating with Other Systems
- Integrating with Apache Kafka 00:12:20
- Integrating with Apache Flume 00:08:51
- Integrating with Amazon Kinesis 00:05:30
- Writing Custom Data Receiver 00:06:56
- Integrating with Cassandra 00:07:35
- Chapter 7 : Advanced Spark Streaming Examples
- Stateful Information in Spark Streams 00:15:07
- Streaming K-Means Clustering 00:15:36
- Streaming Linear Regression 00:11:50
- Chapter 8 : Spark Streaming in Production
- Running with spark-submit 00:10:47
- Packaging Your Code with SBT 00:10:49
- Running on a Real Hadoop Cluster with EMR 00:13:14
- Troubleshooting and Tuning Spark Jobs 00:12:35
- Chapter 9 : You Made It!