Big Data Processing And Machine Learning With Apache Spark

Last updated 4/2019

MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz

Language: English | Size: 4.36 GB | Duration: 8h 54m

Leverage the power of Apache Spark to perform data processing, analytics, and machine learning on your data in real time

What you'll learn

Query your structured data using Spark SQL and work with the DataSets API

Uncover what RDDs (Resilient Distributed Datasets) are and how to perform operations on them

Train machine learning models with streaming data, and use them for making real-time predictions

Implement high-velocity streaming and data processing use cases while working with the streaming API

Dive into MLlib, the machine learning library in Spark with highly scalable algorithms

See analytical use case implementations using MLlib, GraphX, and Spark Streaming

Examine a number of real-world use cases with hands-on projects

Build Hadoop and Apache Spark jobs that process data quickly and effectively

Requirements

Knowledge of Python programming is assumed but prior experience of working with Apache Spark is not required.

Description

Apache Spark is highly configurable and is rapidly gaining popularity in the Big Data market because its in-memory processing makes it a high-speed data processing engine. It also has well-built libraries for machine learning and graph analytics algorithms. This makes Apache Spark well suited to solving scalable machine learning problems and to working with high-velocity, real-time streaming data. If you want to get the most out of the trending Big Data framework for all your data processing and machine learning needs, then this course is for you.

This course focuses on performing data streaming, data analytics, and machine learning with Apache Spark. You will learn to load data from a variety of structured sources such as JSON, Hive, and Parquet using Spark SQL and schema RDDs. You will also build streaming applications and learn best practices for managing high-velocity streaming and external data sources. Next, you will explore Spark machine learning libraries and GraphX, where you will perform graph processing and analysis. Finally, you will build projects that help you put what you have learned into practice and get a firm grasp of the topic.

Contents and Overview

This training program includes 4 complete courses, carefully chosen to give you the most comprehensive training possible.

The first course, Apache Spark in 7 Days, is designed to give you a fundamental understanding of and hands-on experience in writing basic code as well as running applications on a Spark cluster. You will work on interesting examples and assignments that demonstrate and help you understand basic operations, querying, machine learning, and streaming.

In the second course, Big Data Processing using Apache Spark, you will learn how to leverage Apache Spark to process big data quickly. You will learn the basics of the Spark API and its architecture in detail. You will then learn about data mining and data cleaning, where you will look at the input data structure and how input data is loaded. You will also write actual jobs that analyze data.

The third course, Big Data Analytics Projects with Apache Spark, contains various projects built on real-world examples. The first project is to find top-selling products for an e-commerce business by efficiently joining data sets in the MapReduce paradigm. Next, a Market Basket Analysis will help you identify items likely to be purchased together and find correlations between items in a set of transactions. Moving on, you will learn about probabilistic logistic regression by finding an author for a post. Next, you will build a content-based recommendation system for movies to predict whether an action will happen, which you will do by building a trained model. Finally, you will use a MapReduce Spark program to calculate mutual friends on a social network.

In the fourth course, Hands-On Machine Learning with Scala and Spark, you will go through day-to-day challenges that programmers face while implementing ML pipelines and consider different approaches and models to solve complex problems. You will learn about the most effective machine learning techniques and apply them to your advantage. You will also implement algorithms in practical hands-on projects in which you will build data models and understand how they work by using different types of algorithms.

By the end of this course, you will be able to process large datasets, extract features from them, and apply a machine learning model that is well suited to your problem.

Meet Your Expert(s):

We have the best work of the following esteemed author(s) to ensure that your learning journey is smooth:

Karen Yang has been a passionate self-learner in computer science for over 6 years. She has programming, big data processing, and engineering experience. Her recent interests include cloud computing. She previously taught for 5 years in a college evening adult program.

Tomasz Lelek is a Software Engineer and Co-Founder of InitLearn. He mostly does programming in Java and Scala. He dedicates his time and effort to getting better at everything. He is currently diving into Big Data technologies. Tomasz is very passionate about everything associated with software development. He has been a speaker at a few conferences in Poland (Confitura and JDD) and at the Krakow Scala User Group. He has also conducted a live coding session at the Geecon Conference. He was also a speaker at an international event in Dhaka. He is very enthusiastic and loves to share his knowledge.
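As a rough illustration of the Market Basket Analysis project mentioned in the description, the sketch below counts how often pairs of items appear together in a set of transactions. It is plain Python rather than Spark, it is not taken from the course material, and the `pair_counts` helper and the sample transactions are hypothetical names and data invented for this example.

```python
from collections import Counter
from itertools import combinations


def pair_counts(transactions):
    """Count how often each unordered pair of items shares a transaction."""
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so ("bread", "milk") and ("milk", "bread")
        # are counted under a single key.
        items = sorted(set(basket))
        counts.update(combinations(items, 2))
    return counts


# Hypothetical transactions, made up for illustration only.
transactions = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "butter"],
]

counts = pair_counts(transactions)

# Pairs that co-occur most often are candidates for "bought together" rules.
for pair, n in counts.most_common():
    print(pair, n)
```

On a cluster the same idea would typically be expressed as a map step that emits item pairs and a reduce step that sums them; the single-machine version here only shows the counting logic.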

Overview

Section 1: Apache Spark in 7 Days

Lecture 1 The Course Overview

Lecture 2 Setting Up an AWS Account

Lecture 3 Launching a Spark Cluster on EC2

Lecture 4 Setting Up Your Environment

Lecture 5 Running a Test Application

Lecture 6 Creating RDDs

Lecture 7 Actions

Lecture 8 Transformations

Lecture 9 Joins, Set, and Numeric Operations

Lecture 10 Shared Variables

Lecture 11 Installing Jupyter Notebook

Lecture 12 RDDs and DataFrames

Lecture 13 DataFrame Row Operations

Lecture 14 DataFrame Column Operations

Lecture 15 DataFrame Manipulation

Lecture 16 Views

Lecture 17 Schemas

Lecture 18 SQL Operations

Lecture 19 I/O Options

Lecture 20 HIVE

Lecture 21 Basic Statistics

Lecture 22 Pipelines

Lecture 23 Feature Extractors

Lecture 24 Feature Transformers

Lecture 25 Feature Selectors

Lecture 26 Classification

Lecture 27 Regression

Lecture 28 Clustering

Lecture 29 Collaborative Filtering

Lecture 30 Model Selection and Tuning

Lecture 31 DStreams

Lecture 32 DStream Window Operations

Lecture 33 Structured Streaming

Lecture 34 Window Operations

Lecture 35 Joining Batch and Streaming Data

Section 2: Big Data Processing using Apache Spark

Lecture 36 The Course Overview

Lecture 37 Overview of the Apache Spark and Its Architecture

Lecture 38 Start a Project Using Apache Spark, Look at build.sbt

Lecture 39 Creating the Spark Context

Lecture 40 Looking at API of Spark

Lecture 41 Looking at the Input Data Structure

Lecture 42 Using RDD API in the Data Mining Process

Lecture 43 Loading Input Data

Lecture 44 Cleaning Input Data

Lecture 45 Logic for Counting Words

Lecture 46 Using RDD API Transformations and Actions to Solve a Problem

Lecture 47 Testing Spark Job

Lecture 48 Summary of Data Processing

Section 3: Big Data Analytics Projects with Apache Spark

Lecture 49 The Course Overview

Lecture 50 Explaining Ways of Joining Datasets

Lecture 51 Developing Spark Algorithm for Joining/Windowing Datasets

Lecture 52 Testing Logic in MapReduce Spark — Finding Top Sellers

Lecture 53 Drawing Conclusions from Top Sellers Data

Lecture 54 Market Basket Analysis Goals

Lecture 55 Where MBA Algorithms Are Useful?

Lecture 56 Implementing MBA MapReduce Algorithm in Spark

Lecture 57 Finding Association Rules Between Products

Lecture 58 Analyzing Post for an Author

Lecture 59 Extracting Information from Unstructured Text

Lecture 60 Extracting Information via Spark DataFrame

Lecture 61 Sentiment Analysis of Posts Using Logistic Regression

Lecture 62 Finding an Author of a Post

Lecture 63 Content-Based Recommendation Systems Explanation

Lecture 64 Finding Correlation Between Movies and Users

Lecture 65 Testing Logic in MapReduce Spark

Lecture 66 Finding Recommendation for Given User

Lecture 67 Finding Common Friends Problem — Graph Approach

Lecture 68 Creating a Graph Using GraphX and Property Graph

Lecture 69 Solution — Examining Available Methods

Lecture 70 Finding Closest Friend for Given User Using Page Rank

Section 4: Hands-On Machine Learning with Scala and Spark

Lecture 71 The Course Overview

Lecture 72 Analyzing Text Input Data

Lecture 73 Feature Generation from Text – Count Vectorizer, TFIDF, LDA

Lecture 74 Extracting Features from Data – Transforming Text into Vector of Numbers

Lecture 75 Bag-of-Words and Skip Gram

Lecture 76 Training Classification Models – Implementing Word2Vect Using Apache Spark

Lecture 77 Logistic Regression Explanation

Lecture 78 Writing a Logistic Regression Model Per Author in Apache Spark

Lecture 79 Training Regression Model

Lecture 80 Key Concepts, Machine Learning Pipelines, and Operations

Lecture 81 Learn How to Validate Models Using Cross-Validation

Lecture 82 Analyzing of Post Using Clustering – (GMM Explanation)

Lecture 83 Implementing GMM in Apache Spark

Lecture 84 K-Means Clustering Explanation and Use Cases

Lecture 85 Implementing K-Means Clustering in Apache Spark

Lecture 86 Measure Accuracy Using Area Under ROC

Lecture 87 Dimensionality Reduction Using Singular Value Decomposition (SVD)

Lecture 88 Building Recommendation Engine in Spark Using Collaborative Filtering

Lecture 89 Using Recommendation Engine to Get Top Recommendations

Lecture 90 Dense and Sparse Vectors

Lecture 91 LabeledPoints, Rating, and Other Data Types

Lecture 92 The Spark versus Deep Learning Use Case

Lecture 93 Spark for Parallelizing Deep Learning Evaluation

Lecture 94 Deep Learning As a Feature Generator for Existing Spark ML Algorithms

Lecture 95 Spark/Deep Learning Made Simple

This course will be particularly useful if you are a developer, data analyst, data engineer, or data scientist. However, anyone interested in learning how to use Spark will also benefit from this course.

HomePage:

https://www.udemy.com/course/big-data-processing-and-machine-learning-with-apache-spark/
