O'Reilly - Data Wrangling and Analysis with Python
by Katharine Jarmul | Released June 2016 | ISBN: 9781491960813
Discover the data analysis capabilities of the Python Pandas software library in this introduction to data wrangling and data analytics. Designed for learners with some core knowledge of Python, the course covers the basics of importing, exporting, parsing, cleaning, analyzing, and visualizing data.

This course is an introduction to Pandas, where you'll learn to filter, group, match, and join data and then move on to advanced functions like analyzing trends and normalizing your data. There is also an introduction to some nifty skills like web scraping, working with API data, fuzzy matching, multiprocessing, and analyzing code performance.

- Master the basics of Python data wrangling and data analysis
- Discover the Pandas software library and its use as a data analysis tool
- Learn to pull data from disparate sources (Excel, CSV, PDF, APIs, etc.)
- Explore web scraping and how to handle encoding and decoding
- Understand how to identify and clean data using RegEx and fuzzy matching
- Sample other data analysis tools like natural language processing and NumPy
- Learn the data visualization capabilities of Matplotlib and Bokeh

Katharine Jarmul runs kjamistan UG, a Python consulting, training and competitive analysis company based in Berlin, Germany. She learned Python in 2008 while working at the Washington Post and is co-author of the O'Reilly title Data Wrangling with Python: Tips and Tools to Make Your Life Easier. Originally from Los Angeles, Jarmul earned an M.A. from American University and an M.S. from Pace University.
- Introduction
  - Welcome To The Course 00:02:13
  - About The Author 00:01:06
  - Local Setup, What We'll Be Using 00:03:27
- Getting The Data
  - Basic Files 00:04:56
  - Excel Files 00:05:47
  - PDF Files 00:04:00
  - Using PDF Tables 00:06:17
  - Streaming And REST APIs: Twitter 00:10:21
  - Using APIs Without Libraries 00:04:41
  - Introduction To Web Scraping 00:03:36
  - Building Your Own Web Scraper 00:06:44
  - Python 2 vs Python 3 Encoding 00:06:17
  - A Word On Encoding 00:06:33
- Data Analysis With Pandas
  - Pandas Data Structures 00:08:06
  - Pandas Data Types 00:04:23
  - Filtering With Pandas 00:08:31
  - Combining Datasets 00:06:25
  - Joining Datasets 00:08:23
  - Split-Apply-Combine 00:06:53
  - Simple Statistics With Pandas 00:07:05
  - Standardizing Your Data 00:06:58
  - Normalizing Your Data 00:04:12
- Cleaning Your Data
  - Identifying "Bad" Data 00:08:17
  - Simple String Parsing With Regex 00:08:46
  - Fuzzy Matching 00:04:55
  - Storing Your Data (Local And Cloud) 00:06:51
- Pandas: More Advanced Functionality
  - Identifying Trends 00:04:54
  - Identifying Outliers 00:05:34
  - Monitoring Speed/Performance 00:06:05
  - Parallelizing 00:05:39
- Other Advanced Data Libraries
  - Natural Language Processing 00:05:02
  - Introduction To NumPy And SciPy 00:04:35
  - Visualization With Matplotlib And Bokeh 00:05:16
- Conclusion
  - Where To Go Next 00:03:28