MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 kHz
Language: English | Size: 385.61 MB | Duration: 0h 56m
Hands-on training to analyze and prepare data for Machine Learning using Pandas, PySpark and SQL
What you'll learn
Get hands-on experience with data preprocessing
Understand the practical differences between tools such as Pandas and PySpark
Understand when to use Pandas vs PySpark
Understand the exploration steps required for Data Science and Machine Learning
Requirements
A laptop or other system on which to develop and run code
Introductory knowledge of programming and Python
Description
Exploring and preparing data is a huge step in the Machine Learning and Data Science lifecycle, as I've already mentioned in my other course "Applied ML: The Big Picture". Because it is such a crucial foundational step, it's important to learn all the tools at your disposal and get a practical understanding of when to choose which tool.
This course teaches hands-on techniques for several stages of data processing, exploration and transformation, alongside visualization. It also exposes the learner to various scenarios, helping them differentiate and choose between the tools in real-world projects.
Within each tool, we will cover a variety of techniques and their specific purpose in data analysis and manipulation on real datasets. Those who wish to learn by practice will need a system with a Python development environment to get hands-on training. For someone who has already had the practice, this course can serve as a refresher on the various tools and techniques, to make sure you are using the right combination for the problem at hand. It can likewise be used for interview preparation, to refresh your memory on best data practices for ML and Data Science in Python.
Overview
Section 1: Introduction
Lecture 1 Introduction to Instructor and Course
Lecture 2 Scope of the Course and Development Environment
Section 2: Loading and inspecting data
Lecture 3 Pandas and PySpark libraries
Lecture 4 Load and inspect data
Section 3: Cleaning data
Lecture 5 Filter out null and duplicate values
Lecture 6 Filter out malformed entries
Section 4: Discovering stories to be told
Lecture 7 Importance of human and domain knowledge
Lecture 8 Analyze domain specific themes
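As a rough illustration of the kind of steps named in Sections 2 and 3 of the outline (inspecting data, filtering out null and duplicate values, filtering out malformed entries), here is a minimal Pandas sketch. The toy data, the column names user_id and age, and the rule that a non-numeric age counts as malformed are all assumptions made up for this example; they are not taken from the course material.

```python
import pandas as pd

# Hypothetical toy data (not from the course): one duplicate row,
# one null age, and one malformed (non-numeric) age.
raw = pd.DataFrame(
    {
        "user_id": [1, 2, 2, 3, 4],
        "age": ["34", "27", "27", None, "abc"],
    }
)

# Inspect: column types and a count of missing values per column.
print(raw.dtypes)
print(raw.isna().sum())

# Filter out rows containing nulls, then exact duplicate rows.
cleaned = raw.dropna().drop_duplicates()

# Filter out malformed entries: coerce age to a number so that
# non-numeric strings become NaN, then drop those rows.
cleaned = cleaned.assign(age=pd.to_numeric(cleaned["age"], errors="coerce"))
cleaned = cleaned.dropna(subset=["age"]).reset_index(drop=True)

print(cleaned)
```

PySpark DataFrames offer similarly named methods (dropna and dropDuplicates), so the same sequence of steps can be expressed there; which tool is the better fit for a given dataset is the kind of question the course description says it addresses.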
Who this course is for
Developers and Analysts curious about the various data manipulation, transformation and analytics tools available in Python