O'Reilly - Learning PySpark - 9781788396592
by Tomasz Drabas | Released February 2018 | ISBN: 9781788396592


Building and deploying data-intensive applications at scale using Python and Apache Spark

About This Video
    • Practical techniques to help you combine the power of Python and Apache Spark to process your data efficiently
    • Overcome the challenges of developing and deploying efficient, scalable, real-time Spark solutions
    • Take your understanding of using Spark with Python to the next level with this hands-on video

In Detail
Apache Spark is an open-source distributed engine for querying and processing data. This tutorial gives a brief overview of Spark and its stack, then presents effective, time-saving techniques for putting Python to work in the Spark ecosystem. You will start by getting a firm understanding of the Apache Spark architecture and how to set up a Python environment for Spark.

You'll learn about different techniques for collecting data and how to distinguish between the techniques for processing it. Next comes an in-depth review of RDDs and how they contrast with DataFrames. Examples show how to read data from files and from HDFS, and how to specify schemas using reflection or programmatically (in the case of DataFrames). The concept of lazy execution is described, along with various transformations and actions specific to RDDs and DataFrames.

Finally, the tutorial shows how to use SQL to interact with DataFrames. By the end, you will have learned how to process data using Spark DataFrames and mastered data collection techniques for distributed data processing.
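The lazy execution idea mentioned above can be illustrated without a Spark installation. The sketch below is a toy, pure-Python stand-in and not the PySpark API: the `LazyDataset` class and the `square` helper are hypothetical names invented for this illustration. It only mirrors the general pattern in which transformations such as `.map(…)` and `.filter(…)` are recorded, and work happens only when an action such as `.take(…)`, `.collect(…)`, or `.count()` is called.

```python
from itertools import islice


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not part of PySpark)."""

    def __init__(self, source, steps=()):
        self._source = source        # any re-iterable, e.g. a list or range
        self._steps = tuple(steps)   # pending (kind, function) pairs

    # Transformations: record the step and return a new dataset; no work yet.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, fn):
        return LazyDataset(self._source, self._steps + (("filter", fn),))

    def _run(self):
        # Chain the recorded steps into one lazy iterator pipeline.
        stream = iter(self._source)
        for kind, fn in self._steps:
            stream = map(fn, stream) if kind == "map" else filter(fn, stream)
        return stream

    # Actions: these are what actually pull data through the pipeline.
    def collect(self):
        return list(self._run())

    def take(self, n):
        return list(islice(self._run(), n))

    def count(self):
        return sum(1 for _ in self._run())


calls = []  # records every element that square() actually processes


def square(x):
    calls.append(x)
    return x * x


numbers = LazyDataset(range(1, 11))
evens_squared = numbers.map(square).filter(lambda v: v % 2 == 0)

calls_before_action = len(calls)   # 0: defining transformations ran nothing
first_two = evens_squared.take(2)  # [4, 16]
calls_after_take = len(calls)      # 4: only elements 1..4 were processed
```

In this toy, `take(2)` stops as soon as two results are available, so only part of the input is touched. Real Spark plans and distributes the work very differently, but the split between deferred transformations and triggering actions follows the same general idea.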
  1. Chapter 1 : A Brief Primer on PySpark
    • The Course Overview 00:05:53
    • Brief Introduction to Spark 00:02:05
    • Apache Spark Stack 00:01:39
    • Spark Execution Process 00:01:26
    • Newest Capabilities of PySpark 2.0+ 00:01:56
    • Cloning GitHub Repository 00:01:56
  2. Chapter 2 : Resilient Distributed Datasets
    • Brief Introduction to RDDs 00:01:49
    • Creating RDDs 00:04:38
    • Schema of an RDD 00:02:17
    • Understanding Lazy Execution 00:02:11
    • Introducing Transformations – .map(…) 00:03:58
    • Introducing Transformations – .filter(…) 00:02:23
    • Introducing Transformations – .flatMap(…) 00:06:14
    • Introducing Transformations – .distinct(…) 00:03:27
    • Introducing Transformations – .sample(…) 00:03:15
    • Introducing Transformations – .join(…) 00:04:18
    • Introducing Transformations – .repartition(…) 00:04:17
  3. Chapter 3 : ChapterName
    • Introducing Actions – .take(…) 00:05:43
    • Introducing Actions – .collect(…) 00:02:15
    • Introducing Actions – .reduce(…) and .reduceByKey(…) 00:03:00
    • Introducing Actions – .count() 00:02:37
    • Introducing Actions – .foreach(…) 00:01:51
    • Introducing Actions – .aggregate(…) and .aggregateByKey(…) 00:04:55
    • Introducing Actions – .coalesce(…) 00:02:06
    • Introducing Actions – .combineByKey(…) 00:03:11
    • Introducing Actions – .histogram(…) 00:01:50
    • Introducing Actions – .sortBy(…) 00:02:39
    • Introducing Actions – Saving Data 00:03:11
    • Introducing Actions – Descriptive Statistics 00:02:14
  4. Chapter 4 : ChapterName
    • Introduction 00:01:42
    • Creating DataFrames 00:04:09
    • Specifying Schema of a DataFrame 00:06:00
    • Interacting with DataFrames 00:01:36
    • The .agg(…) Transformation 00:03:20
    • The .sql(…) Transformation 00:03:57
    • Creating Temporary Tables 00:02:31
    • Joining Two DataFrames 00:03:54
    • Performing Statistical Transformations 00:03:55
    • The .distinct(…) Transformation 00:01:30
  5. Chapter 5 : ChapterName
    • Schema Changes 00:06:29
    • Filtering Data 00:01:31
    • Aggregating Data 00:02:34
    • Selecting Data 00:02:24
    • Transforming Data 00:01:41
    • Presenting Data 00:01:34
    • Sorting DataFrames 00:01:01
    • Saving DataFrames 00:04:28
    • Pitfalls of UDFs 00:03:39
    • Repartitioning Data 00:01:59