O'Reilly - Big Data Analytics Projects with Apache Spark
by Tomasz Lelek | Released June 2018 | ISBN: 9781789132373
Perform real-life data operations with Apache Spark.

About This Video
- Explore and analyze large volumes of data effectively by combining the power of Big Data processing tools such as Hadoop and Spark.
- Work with different kinds of data and implement various probabilistic models.
- Learn the best use cases, identify the problem areas, and solve them with the help of the right data science techniques and methods for your projects.

In Detail
Ready to use statistical and machine-learning techniques across large data sets? This course shows you how Apache Spark and the Hadoop MapReduce ecosystem are well suited to the job.

This course contains various projects built on real-world examples. The first project finds the top-selling products for an e-commerce business by efficiently joining data sets in the Map/Reduce paradigm. Next, a Market Basket Analysis helps you identify items likely to be purchased together and find correlations between items in a set of transactions.

Moving on, you'll learn about probabilistic logistic regression by finding the author of a post. Next, you'll build a content-based recommendation system for movies to predict whether an action will happen, which we'll do by building a trained model. Finally, we'll use a Map/Reduce Spark program to calculate mutual friends on a social network.

By the end of this course, you'll have been exposed to a wide variety of mathematical techniques that can be used to train models with Spark and Hadoop, and you'll know how to solve common problems.
- Chapter 1 : Finding Top Selling Product
- The Course Overview 00:02:12
- Explaining Ways of Joining Datasets 00:07:56
- Developing Spark Algorithm for Joining/Windowing Datasets 00:11:05
- Testing Logic in MapReduce Spark — Finding Top Sellers 00:03:56
- Drawing Conclusions from Top Sellers Data 00:06:42
- Chapter 2 : Market Basket Analysis
- Market Basket Analysis Goals 00:04:25
- Where MBA Algorithms Are Useful? 00:03:46
- Implementing MBA MapReduce Algorithm in Spark 00:08:15
- Finding Association Rules Between Products 00:06:55
- Chapter 3 : Finding an Author Using Probabilistic Logistic Regression
- Analyzing Post for an Author 00:02:38
- Extracting Information from Unstructured Text 00:04:36
- Extracting Information via Spark DataFrame 00:05:20
- Sentiment Analysis of Posts Using Logistic Regression 00:05:24
- Finding an Author of a Post 00:03:03
- Chapter 4 : Content-Based Recommendation System: Movies
- Content-Based Recommendation Systems Explanation 00:04:42
- Finding Correlation Between Movies and Users 00:04:14
- Testing Logic in MapReduce Spark 00:07:55
- Finding Recommendation for Given User 00:05:21
- Chapter 5 : Social Network Friend Recommendation
- Finding Common Friends Problem — Graph Approach 00:03:52
- Creating a Graph Using GraphX and Property Graph 00:09:32
- Solution — Examining Available Methods 00:04:16
- Finding Best Friend for Given User Using Page Rank 00:07:38