O'Reilly - Web Scraping Using Python
by Abhishek Chhibber | Released April 2018 | ISBN: None
This video series covers numerous methods of extracting information from web pages using Python. There are 12 videos in the series:
About the Course. This first video in the series is an introduction that outlines the topics covered in the course.
About Web Scraping. This second video in the series lays the foundation of web scraping by defining it and distinguishing it from related terms such as web crawling and internet bots. It also discusses checking and understanding websites' terms of use, as well as robots.txt files, in order to scrape the web in a legal and ethical way.
Web Page Structure. This third video in the series explains the layout of HTML pages. It starts with finding out more about a website's ownership details and technology stack. It then covers the Document Object Model, the parent and child elements of HTML, and CSS classes. The video also covers using XPaths and CSS selectors to find elements in an HTML file.
About Beautiful Soup. This fourth video in the series discusses two main Python libraries for web scraping: Requests and Beautiful Soup. The video explains in detail how to fetch HTML pages with these libraries, and then clean and slice them to scrape data. The video also discusses the installation of Python as well as the Anaconda Notebook.
Scraping Blog Posts. This fifth video in the series shows how to scrape blog posts and product review sites. It covers scraping data from a list of posts, analyzing it as a table, and then exporting it as a CSV file. The video also discusses the use of regular expressions in web scraping, and introduces several Python libraries, such as pandas, csv, and re.
Scraping Ecommerce Sites. This sixth video in the series shows how to scrape a web page of an ecommerce website. The code reads all of the products on a web page, extracts the relevant information, and then exports it to a CSV file.
Recursive Web Scraping. This seventh video in the series covers recursive web scraping, where several pages of a website have to be scraped. It discusses using the time and random libraries to pace the scraping. The video also covers using a while loop to scrape websites where the number of pages to scrape is not known in advance.
Web Scraping with Selenium. This eighth video in the series focuses on automating browser-controlled web scraping with Selenium, a web browser automation tool. The video covers controlling web browser behavior, filling in forms, and extracting data using Selenium.
Scraping XML Files. This ninth video in the series covers reading and scraping content from XML files. The video explains how to scrape sitemaps and other XML files, run queries on them, and export their data.
Scraping Map Data. This tenth video in the series covers reading data from a map exported as an XML file. The video covers reading the map data, and then exploring it further to extract and analyze different amenities in the area.
Scraping Data through APIs. This eleventh video in the series covers fetching data through APIs. The video discusses two different APIs used to fetch and analyze data. It also covers scraping and analyzing data using RSS feeds.
Fetching Emails through Python. This twelfth video in the series covers fetching email from Gmail using Python. The video discusses using Google's Gmail API to access the inbox, get message details, and read the messages. It also covers changing messages from unread to read by modifying their label IDs.
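As an illustration of the approach described for the Recursive Web Scraping video, here is a minimal sketch of a while loop that keeps requesting pages until one comes back empty, pausing for a random interval between requests. It is not the course's own code: `scrape_all_pages`, `fetch_page`, `fake_fetch`, and `FAKE_SITE` are hypothetical names, and the stop condition (an empty page) is an assumption. A real scraper would replace `fake_fetch` with a function that downloads and parses each page.

```python
import random
import time


def scrape_all_pages(fetch_page, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Collect items page by page until a page comes back empty.

    fetch_page is a hypothetical callable: it takes a 1-based page number
    and returns a list of scraped items (empty when there are no more pages).
    """
    items = []
    page = 1
    while True:
        page_items = fetch_page(page)
        if not page_items:
            # Assumed stop condition: an empty page means we ran out of pages.
            break
        items.extend(page_items)
        page += 1
        # Pause for a random interval so requests are not sent back to back.
        sleep(random.uniform(min_delay, max_delay))
    return items


# Stand-in for a real site (hypothetical data): three pages, then nothing.
FAKE_SITE = {1: ["post-1", "post-2"], 2: ["post-3"], 3: ["post-4", "post-5"]}


def fake_fetch(page_number):
    """Return the items on the given page, or an empty list past the end."""
    return FAKE_SITE.get(page_number, [])


if __name__ == "__main__":
    # Zero delays here so the demonstration finishes immediately.
    print(scrape_all_pages(fake_fetch, min_delay=0.0, max_delay=0.0))
```

Passing the sleep function as a parameter is a design choice made only for this sketch, so the pacing can be checked without actually waiting.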
- About the Course 00:06:49
- About Web Scraping 00:09:14
- Web Page Structure 00:14:40
- About Beautiful Soup 00:28:09
- Scraping Blog Posts 00:17:53
- Scraping Ecommerce Sites 00:14:14
- Recursive Web Scraping 00:15:03
- Web Scraping with Selenium 00:28:11
- Scraping XML Files 00:14:05
- Scraping Map Data 00:16:26
- Scraping Data through APIs 00:16:09
- Fetching Emails through Python 00:22:57