Oreilly - Taming Big Data with Apache Spark and Python - Hands On! - 9781787129931
by Frank Kane | Released September 2016 | ISBN: 9781787129931


More than 15 hands-on examples to help you analyze large data sets with Apache Spark

About This Video
    • Understand how Spark can be distributed across computing clusters
    • Develop and run Spark jobs efficiently using Python
    • A hands-on tutorial with over 15 real-world examples teaching you Big Data processing with Spark

In Detail
Apache Spark has emerged as the next big thing in the Big Data domain, quickly rising from an ascending technology to an established superstar in just a matter of years. Spark allows you to quickly extract actionable insights from large amounts of data in real time. This course will be your companion for learning Apache Spark in a hands-on manner. Start by understanding how to set up Spark on a single system or on a cluster. From analyzing large data sets using Spark RDDs to developing and running effective Spark jobs quickly using Python, this course will teach you everything. Packed with over 15 interactive, fun-filled examples relevant to the real world, the course will empower you to understand the Spark ecosystem and implement production-grade, real-time Spark projects with ease.

  1. Chapter 1 : Getting Started with Spark
    • Introduction 00:02:16
    • How to Use This Course 00:01:41
    • Getting Set Up – Installing Python, a JDK, Spark, and its Dependencies 00:14:53
    • Installing the MovieLens Movie Rating Dataset 00:03:35
    • Run Your First Spark Program – Ratings Histogram Example 00:04:53
  2. Chapter 2 : Spark Basics and Simple Examples
    • Introduction to Spark 00:10:12
    • The Resilient Distributed Dataset (RDD) 00:12:17
    • Ratings Histogram Walkthrough 00:13:34
    • Key/Value RDDs and the Average Friends by Age Example 00:16:13
    • Running the Average Friends by Age Example 00:05:39
    • Filtering RDDs and the Minimum Temperature by Location Example 00:08:10
    • Running the Minimum Temperature Example and Modifying It for Maximums 00:05:09
    • Running the Maximum Temperature by Location Example 00:03:22
    • Counting Word Occurrences Using flatMap() 00:07:28
    • Improving the Word Count Script with Regular Expressions 00:04:45
    • Sorting the Word Count Results 00:07:45
    • Find the Total Amount Spent by Customer 00:04:01
    • Check Your Results and Sort Them by Total Amount Spent 00:05:08
    • Check Your Sorted Implementation and Results Against Mine 00:03:19
  3. Chapter 3 : Advanced Examples of Spark Programs
    • Find the Most Popular Movie 00:05:53
    • Use Broadcast Variables to Display Movie Names Instead of ID Numbers 00:08:24
    • Find the Most Popular Superhero in a Social Graph 00:04:29
    • Run the Script – Discover Who the Most Popular Superhero is! 00:06:00
    • Superhero Degrees of Separation – Introducing Breadth-First Search 00:07:54
    • Superhero Degrees of Separation – Accumulators and Implementing BFS in Spark 00:06:45
    • Superhero Degrees of Separation – Review the Code and Run it 00:09:14
    • Item-Based Collaborative Filtering in Spark, cache(), and persist() 00:10:13
    • Running the Similar Movies Script Using Spark's Cluster Manager 00:10:55
    • Improve the Quality of Similar Movies 00:02:58
  4. Chapter 4 : Running Spark on a Cluster
    • Introducing Elastic MapReduce 00:05:08
    • Setting Up Your AWS / Elastic MapReduce Account and PuTTY 00:09:56
    • Partitioning 00:04:22
    • Create Similar Movies from One Million Ratings – Part 1 00:05:12
    • Create Similar Movies from One Million Ratings – Part 2 00:11:28
    • Create Similar Movies from One Million Ratings – Part 3 00:03:29
    • Troubleshooting Spark on a Cluster 00:03:43
    • More Troubleshooting and Managing Dependencies 00:05:48
  5. Chapter 5 : SparkSQL, DataFrames, and DataSets
    • Introducing SparkSQL 00:06:08
    • Executing SQL Commands and SQL-Style Functions on a DataFrame 00:08:17
    • Using DataFrames Instead of RDDs 00:05:53
  6. Chapter 6 : Other Spark Technologies and Libraries
    • Introducing MLlib 00:08:10
    • Using MLlib to Produce Movie Recommendations 00:02:57
    • Analyzing the ALS Recommendations Results 00:04:53
    • Using DataFrames with MLlib 00:07:32
    • Spark Streaming and GraphX 00:07:36
  7. Chapter 7 : You Made It! Where to Go from Here
    • Learning More about Spark and Data Science 00:04:09