O'Reilly - Data Analytics Using Spark and Hadoop
by Sujee Maniyam | Released October 2016 | ISBN: 9781491963159
Hadoop and Spark are the stars of the Big Data world. This course covers the basics of Spark and how to use Spark and Hadoop together for big data analytics. It is designed for developers, architects, and data analysts with a fundamental understanding of Hadoop. The course begins with an overview of how Hadoop and Spark are used in today's big data ecosystem, then moves into hands-on labs that demonstrate Spark and Spark-Hadoop integration. You'll learn about the Spark shell, RDDs, and DataFrames; how to query data in Hive tables on Hadoop from Spark; and how to develop Spark applications and run them on YARN.

- Discover how to integrate the Hadoop and Spark big data analytics platforms
- Get access to 11 hands-on labs demonstrating the core aspects of Hadoop-Spark integration
- Learn the basics of the Spark framework: the Spark shell, RDDs, and DataFrames
- Explore methods for analyzing data in Hadoop HDFS and Hive using Spark
- Gain an understanding of how to write Spark applications and run them on YARN

Sujee Maniyam is the co-founder of Elephant Scale, a Big Data training company specializing in Hadoop, NoSQL, and data science. An open-source author and developer since 2000, Sujee ran the analytics company CoverCake for five years, founded the Santa Clara Big Data Guru Meet-Up, developed a Hadoop course for Intel, worked as a software engineer at IBM for six years, and is co-author of the O'Reilly title HBase Design Patterns. He earned a Bachelor of Science in Computer Engineering from the University of Melbourne and holds certifications in both Hadoop and Spark.
- Introduction
  - Course Intro And What To Expect 00:01:29
  - About The Author 00:00:39
- Getting Started
  - Big Data Ecosystem Overview 00:05:09
  - What Is Spark 00:04:49
  - Spark Vs. Hadoop 00:06:41
  - Setting Up The Environment 00:06:10
  - Setting Up Data In Hadoop Exercise Lab 00:11:40
- Spark
  - Spark And Spark Shell Overview 00:03:32
  - Spark Shell Labs 00:07:10
  - RDD Overview 00:08:47
  - RDD Labs 00:06:00
  - DataFrames 00:06:25
  - DataFrames Lab Part 1 00:08:03
  - DataFrames Lab Part 2 00:03:47
- Spark And Hive
  - Hive Lab Part 1 00:03:09
  - Hive Lab Part 2 00:03:44
- Spark And YARN
  - Spark And YARN Lab 00:05:15
  - Spark Applications 00:03:53
  - Spark Applications Lab 1 Part 1 00:05:39
  - Spark Applications Lab 1 Part 2 00:02:41
  - Spark Applications Lab 2 00:05:44
- Conclusion
  - Wrap Up And Thank You 00:02:07