O'Reilly - Taming Big Data with Apache Spark and Python - Hands On!
by Frank Kane | Released September 2016 | ISBN: 9781787129931
More than 15 hands-on examples to help you analyze large data sets with Apache Spark

About This Video
- Understand how Spark can be distributed across computing clusters
- Develop and run Spark jobs efficiently using Python
- A hands-on tutorial with over 15 real-world examples teaching you Big Data processing with Spark

In Detail
Apache Spark has emerged as the next big thing in the Big Data domain, rising from an ascending technology to an established superstar in just a matter of years. Spark allows you to quickly extract actionable insights from large amounts of data, on a real-time basis. This course will be your companion for learning Apache Spark in a hands-on manner. Start by understanding how to set up Spark on a single system or on a cluster. The course then moves from analyzing large data sets using Spark RDDs to developing and running effective Spark jobs quickly using Python. Packed with over 15 interactive examples relevant to the real world, the course will help you understand the Spark ecosystem and implement production-grade, real-time Spark projects.
- Chapter 1 : Getting Started with Spark
- Introduction 00:02:16
- How to Use This Course 00:01:41
- Getting Set Up – Installing Python, a JDK, Spark, and its Dependencies 00:14:53
- Installing the MovieLens Movie Rating Dataset 00:03:35
- Run Your First Spark Program – Ratings Histogram Example 00:04:53
- Chapter 2 : Spark Basics and Simple Examples
- Introduction to Spark 00:10:12
- The Resilient Distributed Dataset (RDD) 00:12:17
- Ratings Histogram Walkthrough 00:13:34
- Key/Value RDDs and the Average Friends by Age Example 00:16:13
- Running the Average Friends by Age Example 00:05:39
- Filtering RDDs and the Minimum Temperature by Location Example 00:08:10
- Running the Minimum Temperature Example and Modifying It for Maximums 00:05:09
- Running the Maximum Temperature by Location Example 00:03:22
- Counting Word Occurrences Using flatMap() 00:07:28
- Improving the Word Count Script with Regular Expressions 00:04:45
- Sorting the Word Count Results 00:07:45
- Find the Total Amount Spent by Customer 00:04:01
- Check Your Results and Sort Them by Total Amount Spent 00:05:08
- Check Your Sorted Implementation and Results Against Mine 00:03:19
- Chapter 3 : Advanced Examples of Spark Programs
- Find the Most Popular Movie 00:05:53
- Use Broadcast Variables to Display Movie Names Instead of ID Numbers 00:08:24
- Find the Most Popular Superhero in a Social Graph 00:04:29
- Run the Script – Discover Who the Most Popular Superhero is! 00:06:00
- Superhero Degrees of Separation – Introducing Breadth-First Search 00:07:54
- Superhero Degrees of Separation – Accumulators and Implementing BFS in Spark 00:06:45
- Superhero Degrees of Separation – Review the Code and Run it 00:09:14
- Item-Based Collaborative Filtering in Spark, cache(), and persist() 00:10:13
- Running the Similar Movies Script Using Spark's Cluster Manager 00:10:55
- Improve the Quality of Similar Movies 00:02:58
- Chapter 4 : Running Spark on a Cluster
- Introducing Elastic MapReduce 00:05:08
- Setting Up Your AWS / Elastic MapReduce Account and PuTTY 00:09:56
- Partitioning 00:04:22
- Create Similar Movies from One Million Ratings – Part 1 00:05:12
- Create Similar Movies from One Million Ratings – Part 2 00:11:28
- Create Similar Movies from One Million Ratings – Part 3 00:03:29
- Troubleshooting Spark on a Cluster 00:03:43
- More Troubleshooting and Managing Dependencies 00:05:48
- Chapter 5 : SparkSQL, DataFrames, and DataSets
- Introducing SparkSQL 00:06:08
- Executing SQL Commands and SQL-Style Functions on a DataFrame 00:08:17
- Using DataFrames Instead of RDDs 00:05:53
- Chapter 6 : Other Spark Technologies and Libraries
- Introducing MLLib 00:08:10
- Using MLLib to Produce Movie Recommendations 00:02:57
- Analyzing the ALS Recommendations Results 00:04:53
- Using DataFrames with MLLib 00:07:32
- Spark Streaming and GraphX 00:07:36
- Chapter 7 : You Made It! Where to Go from Here
- Learning More about Spark and Data Science 00:04:09