O'Reilly - Hands-On PySpark for Big Data Analysis - 9781789530056
by Rudy Lai | Released December 2018 | ISBN: 9781789530056


Use PySpark to productionize analytics over Big Data and easily crush messy data at scale.

About This Video
    • Work with large amounts of data with agility using distributed datasets and in-memory caching
    • Source data from all popular data hosting platforms, including HDFS, Hive, JSON, and S3
    • Deploy Big Data analytics to production using PySpark's easy-to-use API

In Detail
Data is an incredible asset, especially when there is a lot of it. Exploratory data analysis, business intelligence, and machine learning all depend on processing and analyzing Big Data at scale. How do you go from working on prototypes on your local machine to handling messy data in production and at scale? This is a practical, hands-on course that shows you how to use Spark and its Python API to create performant analytics with large-scale data. Don't reinvent the wheel, and wow your clients by building robust and responsible applications on Big Data.

All the code and supporting files for this course are available on GitHub at https://github.com/PacktPublishing/Hands-On-Pyspark-for-Big-Data-Analysis
  1. Chapter 1 : Install PySpark and Setup Your Development Environment
    • The Course Overview 00:03:03
    • Core Concepts in Spark and PySpark 00:09:06
    • Setting Up Spark on Windows and PySpark 00:07:51
    • SparkContext, SparkConf and Spark Shell 00:09:59
  2. Chapter 2 : Getting Your Big Data into the Spark Environment Using RDDs
    • Loading Data onto Spark RDDs 00:05:02
    • Parallelization with Spark RDDs 00:06:34
    • RDD Operation Basics 00:08:17
  3. Chapter 3 : Big Data Cleaning and Wrangling with Spark Notebooks
    • Using Spark Notebooks for Quick Iteration of Ideas 00:06:45
    • Sampling/Filtering RDDs to Pick-Out Relevant Data Points 00:07:01
    • Splitting Datasets and Creating New Combinations with Set Operations 00:05:10
  4. Chapter 4 : Aggregating and Summarizing Data into Useful Reports
    • Calculating Averages with Map and Reduce 00:05:27
    • Faster Average Computation with Aggregate 00:06:22
    • Pivot Tabling with Key-Value Paired Data Points 00:05:21
  5. Chapter 5 : Powerful Exploratory Data Analysis with MLlib
    • Computing Summary Statistics with MLlib 00:05:56
    • Using Pearson and Spearman to Discover Correlations 00:06:14
    • Testing Your Hypotheses on Large Datasets 00:05:22
  6. Chapter 6 : Putting Structure on Your Big Data with SparkSQL
    • Manipulating DataFrames with SparkSQL Schemas 00:05:04
    • Using the Spark DSL to Build Queries for Structured Data Operations 00:04:18