O'Reilly - Introduction to Apache Spark
by Paco Nathan | Released March 2015 | ISBN: 9781491919729
Get up to speed on Apache Spark for building big data applications in Python, Java, or Scala. Recently updated with nearly an hour of new footage on DataFrames in Spark 1.3, this video workshop shows you how to explore data and apply algorithms with MLlib, GraphX, and Spark SQL. You'll learn Spark and its core APIs by doing hands-on technical exercises with presenter Paco Nathan, host of the popular Just Enough Math video workshop.
With this workshop, you will:
- Get going with the newest features of Spark 1.3
- Open a Spark shell
- Develop Spark apps for typical use cases
- Use some machine-learning algorithms
- Explore data sets loaded from HDFS or another filesystem
- Work with Spark SQL, Spark Streaming, and Spark's machine-learning library, MLlib
- Use Maven, SBT, IPython Notebook, and other tooling
- Learn about Spark follow-up courses and certification
Paco Nathan has led innovative data teams building large-scale apps for several years. He's an expert in distributed systems, machine learning, cloud computing, and functional programming.
- Pre-Flight Check 00:13:08
- Spark Deconstructed 00:14:31
- A Brief History 00:23:28
- Simple Spark Apps 00:25:07
- Spark Essentials 00:35:18
- Spark Examples 00:21:55
- Unifying the Pieces - Spark SQL 00:24:07
- Unifying the Pieces - Spark Streaming 00:14:48
- Unifying the Pieces - MLlib and GraphX 00:20:00
- Unified Workflows Demo 00:22:35
- The Full SDLC 00:04:01
- Developer Certification 00:06:10
- Resources 00:04:44
- Introduction - Why DataFrames? 00:02:28
- ETL to Prepare the Data from Capital Bikeshare 00:02:46
- Create a DataFrame, Explore using SQL 00:02:47
- Data Preparation for Machine Learning Models 00:05:33
- Build a Classifier Using Naive Bayes 00:04:43
- Build a Classifier Using Decision Trees 00:02:26
- Build a Classifier Using Random Forests 00:02:20
- Use a DataFrame to Compare Models 00:04:15
- Parquet as a Best Practice with DataFrames 00:00:58
- How to Store a DataFrame with Parquet 00:03:25
- How to Read a DataFrame Back in From Parquet 00:02:57
- Use SQL to Estimate Route Durations 00:01:41
- Data Preparation for GraphX - Model Route Costs 00:04:43
- Use PageRank to Rank Popular Stations 00:03:14
- Optimize Routes to Columbus Circle 00:03:43
- Compare Results with Google Maps 00:01:58
- Analyze a Popular Tourist Route 00:02:30
- Examples of How to Use DataFrames in Python 00:02:57
- Summary - The New DataFrames Features in Spark 00:01:03