Oreilly - Data Science with Spark - 9781786467935
by Eric Charles | Released January 2017 | ISBN: 9781786467935


Get started with Spark for data science using this unique video tutorial.

About This Video
  • Explore various facets of data science with Spark using this example-rich video
  • Learn how to tell a compelling story in data science using Spark's ecosystem
  • Get up and running with Apache Spark and clean, analyze, and visualize data with ease

In Detail
The real power and value proposition of Apache Spark is its speed and its platform for executing data science tasks. Spark's unique strength is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualization, allowing data scientists to tackle the complexities that come with raw, unstructured data sets. Spark embraces this approach and aims to ease the transition from working on a single machine to working on a cluster, which makes data science tasks a lot more agile.

In this course, you'll get a hands-on technical resource that will enable you to become comfortable and confident working with Spark for data science. We won't just explore Spark's data science libraries; we'll dive deeper and expand on the topics.

This course starts by taking you through Spark and the steps needed to build machine learning applications. You will learn to collect, clean, and visualize data coming from Twitter with Spark Streaming. Then, you will get acquainted with Spark's machine learning algorithms and different machine learning techniques. You will also learn to apply statistical analysis and mining operations to our tweet dataset. Finally, the course ends by giving you some ideas on how to perform awesome analysis, including graph processing. By the end of the course, you will be able to do your data scientist job in a very visual way that is comprehensive and appealing for business and other stakeholders.
  1. Chapter 1 : Your Spark and Visualization Toolkit
    • The Course Overview 00:03:55
    • Spark: Origins and Ecosystem for Big Data Scientists, the Scala, Python, and R flavors 00:04:41
    • Install Spark on Your Laptop with Docker, or Scale Fast in the Cloud 00:04:41
    • Apache Zeppelin, a Web-Based Notebook for Spark with matplotlib and ggplot2 00:03:08
  2. Chapter 2 : First Steps with Spark Visualization
    • Manipulating Data with the Core RDD API 00:08:16
    • Using Dataframe, Dataset, and SQL – Natural and Easy! 00:06:36
    • Manipulating Rows and Columns 00:04:50
    • Dealing with File Format 00:02:17
    • Visualizing More – ggplot2, matplotlib, and Angular.js at the Rescue 00:03:32
  3. Chapter 3 : The Spark Machine Learning Algorithms
    • Discovering spark.ml and spark.mllib - and Other Libraries 00:08:02
    • Wrapping Up Basic Statistics and Linear Algebra 00:09:58
    • Cleansing Data and Engineering the Features 00:05:04
    • Reducing the Dimensionality 00:04:09
    • Pipeline for a Life 00:03:58
  4. Chapter 4 : Collecting and Cleansing the Dirty Tweets
    • Streaming Tweets to Disk 00:05:37
    • Streaming Tweets on a Map 00:04:05
    • Cleansing and Building Your Reference Dataset 00:05:13
    • Querying and Visualizing Tweets with SQL 00:04:16
  5. Chapter 5 : Statistical Analysis on Tweets
    • Indicators, Correlations, and Sampling 00:07:17
    • Validating Statistical Relevance 00:03:32
    • Running SVD and PCA 00:04:04
    • Extending the Basic Statistics for Your Needs 00:04:19
  6. Chapter 6 : Extracting Features from the Tweets
    • Analyzing Free Text from the Tweets 00:07:23
    • Dealing with Stemming, Syntax, Idioms and Hashtags 00:05:24
    • Detecting Tweet Sentiment 00:03:28
    • Identifying Topics with LDA 00:03:06
  7. Chapter 7 : Mine Data and Share Results
    • Word Cloudify Your Dataset 00:05:31
    • Locating Users and Displaying Heatmaps with GeoHash 00:04:15
    • Collaborating on the Same Note with Peers 00:04:57
    • Create Visual Dashboards for Your Business Stakeholders 00:03:56
  8. Chapter 8 : Classifying the Tweets
    • Building the Training and Test Datasets 00:07:25
    • Training a Logistic Regression Model 00:03:55
    • Evaluating Your Classifier 00:05:32
    • Selecting Your Model 00:05:19
  9. Chapter 9 : Clustering Users
    • Clustering Users by Followers and Friends 00:05:12
    • Clustering Users by Location 00:02:48
    • Running KMeans on a Stream 00:02:30
  10. Chapter 10 : Your Next Data Challenges
    • Recommending Similar Users 00:05:11
    • Analyzing Mentions with GraphX 00:06:22
    • Where to Go from Here 00:06:21



