O'Reilly - Spark in Motion
by Jason Kolter | Released March 2019 | ISBN: 10000MNLV201706
"Quick, no nonsense. What more can you wish?" (Jonathan Rioux, Senior Analyst)

Spark in Motion teaches you how to use Spark for batch and streaming data analytics. In nearly 3 hours of hands-on video lessons, you'll get up and running with Spark, starting with the basic architecture of a Spark application. You'll explore data partitioning and accessing common application state, then dive deep into using Spark SQL and DataFrames for structured analytics. Finally, you'll use Spark Streaming to handle and process real-time data flowing into your application.

When you're doing analytics on big data systems, it can be a challenge to efficiently query, stream, filter, and consolidate data sharded across a cluster. Built especially for efficiently operating over large distributed datasets, the Spark data processing engine takes some of the weight off your shoulders. Spark features an easy-to-use interface, near-limitless upgrade potential, and performance that will knock your socks off. Spark simplifies your data infrastructure so you can focus on creating top-notch analytics.

Inside:
- Exploring the Spark ecosystem
- Deploying Spark on a cluster
- Analytics with Spark SQL
- Real-time applications with Spark Streaming

Designed for software engineers, architects, data scientists, and data analysts interested in getting started with Spark. No prior experience is needed.

Jason Kolter is an instructor for the University of Washington certificate program in Big Data Technologies. He has also worked at a wide range of technology companies, gaining extensive experience leading teams that build large-scale production distributed analytics systems.

"Best course I have seen so far." (Peter J. Hampton, AI Researcher)

"Spark is a very valuable library, but it's very hard to use (the learning step is very steep). This video course makes the learning smoother, and takes the users to a place where they can experiment by themselves." (Alberto Boschetti, Data Scientist)
- AN INTRODUCTION TO APACHE SPARK
- What is Spark? 00:04:45
- Exploring the Spark ecosystem 00:06:26
- Functional programming using the Spark shell 00:08:48
- Rich programming using notebooks 00:06:24
- Using RDDs part 1: Features and creating/loading 00:08:06
- Using RDDs part 2: Transformations and actions 00:08:19
- Spark application architecture 00:06:22
- Summary 00:01:49
- BUILDING REALISTIC SPARK APPLICATIONS
- Deploying Spark on a cluster 00:07:11
- Scaling Spark applications 00:08:58
- Making iterative applications fly 00:06:43
- Accessing common application state 00:04:42
- Configuring the Spark runtime 00:06:05
- Monitoring and metrics with the Spark Web UI 00:04:52
- Summary 00:01:12
- ADVANCED ANALYTICS WITH SPARK SQL AND DATASETS
- Creating and using datasets 00:05:30
- Structured processing using Spark SQL 00:05:27
- Bringing SQL to Spark with the DataFrame API 00:05:26
- Working with Spark SQL data sources 00:04:32
- Interactive queries with the Spark SQL server 00:03:44
- Summary 00:01:01
- LOW LATENCY APPLICATIONS WITH SPARK STREAMING
- What is a streaming application? 00:03:32
- Understanding Spark Streaming 00:04:48
- Programming Spark Streaming 00:05:24
- Spark Streaming data sources 00:05:35
- What is Structured Streaming? 00:07:22
- Building continuous applications using Structured Streaming 00:07:20
- Summary and course wrap-up 00:01:54
- APPENDICES
- Installing Spark 00:03:19
- Installing Jupyter Notebook 00:05:04