O'Reilly - Text Processing using NLTK in Python
by Krishna Bhavsar, Naresh Kumar, Pratap Dangeti | Released April 2018 | ISBN: 9781789348989
Learn the tricks and tips that will help you design Text Analytics solutions.

About This Video
- Independent solutions that will teach you how to efficiently perform Natural Language Processing in Python
- Use dictionaries to create your own named entities using this easy-to-follow guide
- Learn how to implement NLTK for various scenarios with the help of example-rich solutions to take you beyond basic Natural Language Processing

In Detail
Natural Language Processing (NLP) is a field of Artificial Intelligence concerned with the interactions between computers and human (natural) languages. This course includes unique videos that will teach you various aspects of performing Natural Language Processing with NLTK, the leading Python platform for the task.

In this course, you will learn what WordNet is and explore its features and usage. The course teaches how to extract raw text from web sources and introduces some critical pre-processing steps. You will also become familiar with pattern matching as a way to do text analysis.

By the end of the course, you will be confident working with a range of solutions covering natural language understanding, Natural Language Processing, and syntactic analysis.

All the code and supporting files for this course are available on GitHub at https://github.com/PacktPublishing/Text-Processing-using-NLTK-in-Python

Contents
- Chapter 1 : Corpus and WordNet
- The Course Overview 00:03:23
- Accessing In-Built Corpora 00:04:07
- Downloading an External Corpus 00:03:33
- Counting All the wh-words 00:03:43
- Frequency Distribution Operations 00:02:40
- WordNet 00:03:10
- The Concepts of Hyponyms and Hypernyms Using WordNet 00:03:40
- Compute the Average Polysemy According to WordNet 00:03:29
- Chapter 2 : Raw Text, Sourcing, and Normalization
- The Importance of String Operations 00:03:10
- Getting Deeper with String Operations 00:02:58
- Reading a PDF File in Python 00:02:54
- Reading Word Documents in Python 00:03:56
- Creating a User-Defined Corpus 00:04:30
- Reading Contents from an RSS Feed 00:02:50
- HTML Parsing Using BeautifulSoup 00:03:50
- Chapter 3 : Pre-Processing
- Tokenization – Learning to Use the Inbuilt Tokenizers of NLTK 00:02:52
- Stemming – Learning to Use the Inbuilt Stemmers of NLTK 00:02:28
- Lemmatization – Learning to Use the WordNetLemmatizer of NLTK 00:02:20
- Stopwords – Learning to Use the Stopwords Corpus 00:03:14
- Edit Distance – Writing Your Own Algorithm to Find Edit Distance Between Two Strings 00:02:49
- Processing Two Short Stories and Extracting the Common Vocabulary 00:02:38
- Chapter 4 : Regular Expressions
- Regular Expression – Learning to Use *, +, and ? 00:03:24
- Regular Expression – Learning to Use Non-Start and Non-End of Word 00:03:20
- Searching Multiple Literal Strings and Substrings Occurrences 00:01:54
- Creating Date Regex 00:02:41
- Making Abbreviations 00:01:19
- Learning to Write Your Own Regex Tokenizer 00:01:22
- Learning to Write Your Own Regex Stemmer 00:02:14
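To give a flavor of the Chapter 3 lesson on writing your own edit distance algorithm, here is a minimal sketch of the standard Levenshtein distance using dynamic programming. It is not taken from the course materials; the function name `edit_distance` and the unit cost for insertions, deletions, and substitutions are assumptions made for illustration. NLTK also ships its own `nltk.edit_distance`, which is usually the better choice in practice.

```python
def edit_distance(source: str, target: str) -> int:
    """Return the Levenshtein distance between two strings.

    Illustrative sketch: every insertion, deletion, and substitution
    is assumed to cost 1.
    """
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    previous = list(range(len(target) + 1))

    for i, source_char in enumerate(source, start=1):
        # Turning i source characters into an empty string takes i deletions.
        current = [i]
        for j, target_char in enumerate(target, start=1):
            substitution_cost = 0 if source_char == target_char else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete from source
                    current[j - 1] + 1,                   # insert into source
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current

    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # prints 3
```

Keeping only two rows of the table at a time holds memory proportional to the length of `target` rather than to the product of both lengths.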