O'Reilly - Study Guide for the Developer Certification for Apache Spark
Released November 2015 | ISBN: 9781771374088
In this Study Guide for the Developer Certification for Apache Spark training course, expert author Olivier Girardot teaches you everything you need to know to prepare for and pass the Developer Certification for Apache Spark. This course is designed for users who are already familiar with Python, Java, and Scala.
You will start by learning Apache Spark best practices, including transformations, actions, and joins. From there, Olivier covers closure serialization, shared variables and performance, and Spark SQL. This video tutorial also covers Spark MLLib, Spark GraphX, and Spark Streaming. Finally, you will learn about deployment and infrastructure.
Once you have completed this computer-based training course, you will have the knowledge necessary to prepare for and pass the Spark Certification Exam. Working files are included, allowing you to follow along with the author throughout the lessons.
- Introduction
- Introduction and Course Overview 00:04:10
- About the Author 00:00:35
- Spark’s concepts and approach 00:06:04
- Resilient Distributed Datasets (RDD) 00:05:03
- Creating a Project in IDEA 00:02:54
- Spark Core API & Best practices
- Base RDD 00:06:46
- Transformations 00:05:35
- Actions - Part 1 00:01:40
- Actions - Part 2 00:02:42
- Hadoop Combiners In Spark 00:04:52
- Directed Acyclic Graph And Lazy Evaluation 00:07:20
- Joins 00:06:15
- Closure serialization
- How the Magic of Spark Works 00:07:30
- Serializers and how to change them 00:04:10
- Shared variables and performance
- Broadcast 00:04:07
- Accumulators 00:05:05
- Caching & Persistence 00:09:22
- Spark SQL
- Spark SQL 00:12:32
- Inferring A Schema 00:07:38
- Applying A Schema 00:06:27
- Loading And Writing 00:06:07
- SQL Caching And UDF 00:08:48
- Spark MLLib
- Spark MLLib And Supervised Example - SVM 00:10:02
- Unsupervised With Iris Dataset - KMeans 00:08:54
- Spark GraphX
- Graph Construction 00:07:06
- Graph Algorithms 00:06:52
- Spark Streaming
- Streaming And The Microbatch 00:13:57
- Mutable Transformations And Checkpointing 00:09:07
- Windows And RDD Transformations 00:08:43
- Streaming With Spark SQL, MLLib And Core 00:12:28
- Deployment and Infrastructure
- Cluster Managers And Submission - Standalone, Mesos And YARN 00:13:20
- Conclusion
- Resources And Where To Go From Here 00:04:06