O'Reilly - Hands-on NLP with NLTK and Scikit-learn
by Colibri Ltd | Released July 2018 | ISBN: 9781789345612
A complete Python guide to Natural Language Processing to build spam filters, topic classifiers, and sentiment analyzers

About This Video
- Build actual solutions backed by machine learning and Natural Language Processing models, instead of meandering in theory and mathematical symbols.
- Single-handedly build three models: one for spam filtering, one for sentiment analysis, and one for text classification.
- Get the right foundation from which to do applied Natural Language Processing. We show you how to get open source data, wrangle text into Python data structures with NLTK, and predict different classes of natural language with scikit-learn.

In Detail
There is an overflow of text data online nowadays. As a Python developer, you need to create a new solution using Natural Language Processing for your next project. Your colleagues depend on you to monetize gigabytes of unstructured text data. What do you do?

Hands-on NLP with NLTK and scikit-learn is the answer. This course puts you right on the spot, starting off with building a spam classifier in our first video. At the end of the course, you will walk away with three NLP applications: a spam filter, a topic classifier, and a sentiment analyzer. There is no need for fancy mathematical theory, just plain English explanations of core NLP concepts and how to apply them using Python libraries.

Taking this course will help you create new applications with Python and NLP. You will be able to build actual solutions backed by machine learning and NLP models with ease.
- Chapter 1: Working with Natural Language Data
  - The Course Overview 00:02:10
  - Use Python, NLTK, spaCy, and Scikit-learn to Build Your NLP Toolset 00:06:41
  - Reading a Simple Natural Language File into Memory 00:05:57
  - Split the Text into Individual Words with Regular Expression 00:06:26
  - Converting Words into Lists of Lower Case Tokens 00:04:01
  - Removing Uncommon Words and Stop Words 00:06:35
- Chapter 2: Spam Classification with an Email Dataset
  - Use an Open Source Dataset, and What Is the Enron Dataset 00:05:08
  - Loading the Enron Dataset into Memory 00:04:37
  - Tokenization, Lemmatization, and Stop Word Removal 00:05:13
  - Bag-of-Words Feature Extraction Process with Scikit-learn 00:04:46
  - Basic Spam Classification with NLTK's Naive Bayes 00:06:51
- Chapter 3: Sentiment Analysis with a Movie Review Dataset
  - Understanding the Origin and Features of the Movie Review Dataset 00:06:02
  - Loading and Cleaning the Review Data 00:05:18
  - Preprocessing the Dataset to Remove Unwanted Words and Characters 00:06:27
  - Creating TF-IDF Weighted Natural Language Features 00:05:02
  - Basic Sentiment Analysis with Logistic Regression Model 00:06:23
- Chapter 4: Boosting the Performance of Your Models with N-grams
  - Deep Dive into Raw Tokens from the Movie Reviews 00:07:20
  - Advanced Cleaning of Tokens Using Python String Functions and Regex 00:06:49
  - Creating N-gram Features Using Scikit-learn 00:05:41
  - Experimenting with Advanced Scikit-learn Models Using the NLTK Wrapper 00:04:34
  - Building a Voting Model with Scikit-learn 00:04:04
- Chapter 5: Document Classification with a Newsgroup Dataset
  - Understanding the Origin and Features of the 20 Newsgroups Dataset 00:04:57
  - Loading the Newsgroup Data and Extracting Features 00:04:46
  - Building a Document Classification Pipeline 00:04:00
  - Creating a Performance Report of the Model on the Test Set 00:05:11
  - Finding Optimal Hyper-parameters Using Grid Search 00:06:12
- Chapter 6: Advanced Topic Modelling with TF-IDF, LSA, and SVMs
  - Building a Text Preprocessing Pipeline with NLTK 00:06:16
  - Creating Hashing Based Features from Natural Language 00:06:53
  - Classify Documents into 20 Topics with LSA 00:06:01
  - Document Classification with TF-IDF and SVMs 00:06:11