O'Reilly - Using R for Big Data with Spark - 9781491973035
by Manuel Amunategui | Released October 2016 | ISBN: 9781491973028


Data analysts familiar with R will learn to leverage Spark, distributed computing, and cloud storage in this course, which shows you how to use your R skills in a big data environment. You'll learn to create Spark clusters on the Amazon Web Services (AWS) platform; perform cluster-based data modeling using Gaussian generalized linear models, binomial generalized linear models, Naive Bayes, and K-means; access data from S3 Spark DataFrames and other sources such as CSV, JSON, and HDFS; and perform cluster-based data manipulation with tools like SparkR and SparkSQL. By course end, you'll be capable of working with data sets too large to handle on a single computer. This hands-on class requires each learner to set up their own extremely low-cost, easily terminated AWS account.

  • Discover how to use your R skills in a distributed, cloud-based big data cluster environment
  • Gain hands-on experience setting up Spark clusters on Amazon's AWS cloud services platform
  • Understand how to control a cloud instance on AWS using SSH or PuTTY
  • Explore basic distributed modeling techniques like GLM, Naive Bayes, and K-means
  • Learn to do cloud-based data manipulation and processing using SparkR and SparkSQL
  • Understand how to access data from CSV, JSON, HDFS, and S3 sources

Manuel Amunategui is a data science practitioner, consultant, teacher, and author with 16+ years of data science experience. A former quantitative analyst for a Wall Street brokerage firm, he now serves as the lead data scientist for Providence Health & Services in Portland, Oregon. In his free time, Manuel does competitive data modeling on Kaggle.com, CrowdANALYTIX.com, Datascience.net, and DrivenData.org.

  1. Introduction
    • Welcome to the Course 00:04:21
    • About the Author 00:01:09
  2. Creating Clusters on Amazon Web Services
    • Creating an AWS Launching Instance 00:09:40
    • Connecting to AWS Instance using SSH 00:06:19
    • Connecting to AWS Instance using PuTTY 00:08:37
    • Starting Spark Clusters Part 1 00:09:02
    • Starting Spark Clusters Part 2 00:09:55
    • Terminate Your Clusters 00:00:58
  3. Data and Modeling Basics
    • Data Basics 00:08:34
    • Modeling with Gaussian Generalized Linear Models 00:11:19
    • Modeling with Binomial Generalized Linear Models 00:09:34
    • Naive Bayes and K-Means Modeling 00:09:14
  4. Data Sources and Data Manipulation
    • Bigger Data and S3 00:07:27
    • Accessing S3 Spark Dataframes 00:04:57
    • SparkR Dataframe Operations 00:11:01
    • SparkSQL 00:05:16
  5. Various
    • Brief Look at HDFS 00:11:00
    • Brief Look at Databricks Community Edition 00:08:20
  6. Conclusion
    • Wrap Up and Thank You 00:02:02