O'Reilly - Apache Spark 2 for Beginners
by Rajanarayanan Thottuvaikkatumana | Released December 2016 | ISBN: 9781787281004
Take your first steps in developing large-scale distributed data processing applications using Apache Spark 2

About This Video
- Get introduced to the recently released Apache Spark 2 framework
- Leverage the capabilities of various Spark components to perform efficient data processing, machine learning and graph processing
- A practical tutorial aimed at absolute beginners to get them up and running with Apache Spark

In Detail
Spark is one of the most widely-used large-scale data processing engines and runs extremely fast. It is a framework that has tools that are equally useful for application developers as well as data scientists.

This book starts with the fundamentals of Spark 2 and covers the core data processing framework and API, installation, and application development setup. Then the Spark programming model is introduced through real-world examples followed by Spark SQL programming with DataFrames. An introduction to SparkR is covered next. Later, we cover the charting and plotting features of Python in conjunction with Spark data processing. After that, we take a look at Spark's stream processing, machine learning, and graph processing libraries. The last chapter combines all the skills you learned from the preceding chapters to develop a real-world Spark application.

By the end of this video, you will be able to consolidate data processing, stream processing, machine learning, and graph processing into one unified and highly interoperable framework with a uniform API using Scala or Python.
- Chapter 1 : Spark Fundamentals
- The Course Overview 00:04:30
- An Overview of Apache Hadoop 00:05:50
- Understanding Apache Spark 00:05:14
- Installing Spark on Your Machines 00:13:49
- Chapter 2 : Spark Programming Model
- Functional Programming with Spark and Understanding Spark RDD 00:08:45
- Data Transformations and Actions with RDDs 00:05:22
- Monitoring with Spark 00:04:02
- The Basics of Programming with Spark 00:20:30
- Creating RDDs from Files and Understanding the Spark Library Stack 00:06:39
- Chapter 3 : Spark SQL
- Understanding the Structure of Data and the Need of Spark SQL 00:09:39
- Anatomy of Spark SQL 00:05:09
- DataFrame Programming 00:12:01
- Understanding Aggregations and Multi-Datasource Joining with SparkSQL 00:08:33
- Introducing Datasets and Understanding Data Catalogs 00:07:53
- Chapter 4 : Spark Programming with R
- The Need for Spark and the Basics of the R Language 00:08:09
- DataFrames in R and Spark 00:02:57
- Spark DataFrame Programming with R 00:04:43
- Understanding Aggregations and Multi-Datasource Joins in SparkR 00:04:12
- Chapter 5 : Spark Data Analysis with Python
- Charting and Plotting Libraries and Setting Up a Dataset 00:04:00
- Charts, Plots, and Histograms 00:05:36
- Bar Chart and Pie Chart 00:07:46
- Scatter Plot and Line Graph 00:04:53
- Chapter 6 : Spark Stream Processing
- Data Stream Processing and Micro Batch Data Processing 00:08:36
- A Log Event Processor 00:16:22
- Windowed Data Processing and More Processing Options 00:07:27
- Kafka Stream Processing 00:10:44
- Spark Streaming Jobs in Production 00:09:09
- Chapter 7 : Spark Machine Learning
- Understanding Machine Learning and the Need of Spark for it 00:06:22
- Wine Quality Prediction and Model Persistence 00:10:44
- Wine Classification 00:05:58
- Spam Filtering 00:07:08
- Feature Algorithms and Finding Synonyms 00:06:54
- Chapter 8 : Spark Graph Processing
- Understanding Graphs with Their Usage 00:04:35
- The Spark GraphX Library 00:10:09
- Graph Processing and Graph Structure Processing 00:09:45
- Tennis Tournament Analysis 00:05:34
- Applying PageRank Algorithm 00:03:30
- Connected Component Algorithm 00:04:39
- Understanding GraphFrames and Its Queries 00:09:31
- Chapter 9 : Designing Spark Applications
- Lambda Architecture 00:04:47
- Micro Blogging with Lambda Architecture 00:07:13
- Implementing Lambda Architecture and Working with Spark Applications 00:08:19
- Coding Style, Setting Up the Source Code, and Understanding Data Ingestion 00:09:09
- Generating Purposed Views and Queries 00:05:53
- Understanding Custom Data Processes 00:06:12