O'Reilly - Natural Language Processing (NLP)
by Bruno Goncalves | Released October 2018 | ISBN: 0135258847
2+ Hours of Video Instruction

Overview
Natural Language Processing LiveLessons covers the fundamentals of natural language processing (NLP). It introduces you to the basic concepts, ideas, and algorithms necessary to develop your own NLP applications in a step-by-step and intuitive fashion. The lessons follow a gradual progression, from the more specific to the more abstract, taking you from the very basics to some of the most recent and sophisticated algorithms.

About the Instructor
Bruno Goncalves is currently a Senior Data Scientist working at the intersection of Data Science and Finance. Previously, he was a Data Science fellow at NYU's Center for Data Science while on leave from a tenured faculty position at Aix-Marseille Universite. Since completing his PhD in the Physics of Complex Systems in 2008, he has been pursuing the use of Data Science and Machine Learning to study Human Behavior. Using large datasets from Twitter, Wikipedia, web access logs, and Yahoo! Meme, he studied how we can observe both large-scale and individual human behavior in an unobtrusive and widespread manner. The main applications have been to the study of Computational Linguistics, Information Diffusion, Behavioral Change, and Epidemic Spreading. In 2015 he was awarded the Complex Systems Society's Junior Scientific Award for “outstanding contributions in Complex Systems Science” and in 2018 he was named a Science Fellow of the Institute for Scientific Interchange in Turin, Italy.

Skill Level
Intermediate

Learn How To
- Represent text
- Model topics
- Conduct sentiment analysis
- Understand word2vec word embeddings
- Define GloVe
- Apply language detection

Who Should Take This Course
- Data scientists with an interest in natural language processing

Course Requirements
- Basic algebra
- Calculus and statistics
- Programming experience

Lesson Descriptions

Lesson 1: Text Representations
The first step in any NLP application is to establish the representations of text and numbers. One-hot encodings provide us with a sparse approach to representing words and n-grams, while bag-of-words improves memory efficiency even further. Naturally, not all words are meaningful, so the next steps are to remove meaningless stop words and to identify the most relevant words for our application using term frequency/inverse document frequency (TF/IDF). Finally, the lesson covers how to identify the stems of words so you can meaningfully reduce the size of your vocabulary.

Lesson 2: Topic Modeling
Lesson 2 builds on the text representations of Lesson 1 to develop ways of identifying the main subject or subjects of a text. Bruno starts by defining topics and how they can be identified. Next, you learn how to perform explicit semantic analysis to find documents mentioning a specific topic and how to cluster documents according to topics. Latent semantic analysis provides yet another powerful way to extract meaning from raw text, while non-negative matrix factorization enables you to identify latent dimensions in the text, perform recommendations, and measure similarities.

Lesson 3: Sentiment Analysis
After covering how to represent text in a meaningful way and how to identify the topics covered in a document, we now focus on how to extract sentiment information. In other words, what kind of sentiments are being expressed? Are the words used positive or negative? The next step is to consider corpus-based approaches to defining the valence of each word and, finally, how to handle negations and modifiers.

Lesson 4: Applications
The first three lessons covered the fundamental tools of NLP, and now you are ready to consider specific applications and advanced topics. Perhaps one of the most important developments in NLP in recent years is the popularization of word embeddings in general and word2vec in particular. This enables you to delve deeper into vector representations of words and concepts, and to understand how semantic relations can be expressed through vector algebra. GloVe is the main competitor to word2vec, and this lesson also explores its advantages and disadvantages. As the final application of NLP and the last section in our course, we consider the question of language detection.

About Pearson Video Training
Pearson publishes expert-led video tutorials covering a wide selection of technology topics designed to teach you the skills you need to succeed. These professional and personal technology videos feature world-leading author instructors published by your trusted technology brands: Addison-Wesley, Cisco Press, Pearson IT Certification, Prentice Hall, Sams, and Que. Topics include: IT Certification, Network Security, Cisco Technology, Programming, Web Development, Mobile Development, and more. Learn more about Pearson Video training at http://www.informit.com/video.
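As a rough illustration of the Lesson 1 material described above, the sketch below builds a bag-of-words representation, removes stop words, and computes TF/IDF weights in plain Python. It is not taken from the course: the toy corpus, the stop-word list, and the function names are all hypothetical, and the weighting used (term count divided by document length, times the log of the inverse document frequency) is only one of several common TF/IDF variants.

```python
import math
from collections import Counter

# Hypothetical toy corpus and stop-word list, invented for illustration only.
DOCS = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats make good pets",
]
STOP_WORDS = {"the", "on", "and"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def bag_of_words(text):
    """Map each remaining word to its count in the text."""
    return Counter(tokenize(text))


def tf_idf(docs):
    """Return one {word: weight} dict per document.

    Assumes tf = count / document length and
    idf = log(number of documents / document frequency).
    """
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(bags)
    # Number of documents in which each word appears at least once.
    doc_freq = Counter(word for bag in bags for word in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in bag.items()
        })
    return weights


if __name__ == "__main__":
    for doc, scores in zip(DOCS, tf_idf(DOCS)):
        print(doc, "->", sorted(scores, key=scores.get, reverse=True))
```

In this sketch a word such as "cat", which appears in two of the three documents, receives a lower weight than a word that appears in only one, which is the intuition behind using TF/IDF to surface the most relevant words.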
- Introduction
  - Natural Language Processing: Introduction 00:01:04
- Lesson 1: Text Representations
  - Learning objectives 00:00:40
  - 1.1 Represent words and numbers 00:06:19
  - 1.2 Use one-hot encoding 00:02:53
  - 1.3 Implement bag-of-words 00:05:28
  - 1.4 Remove stop words 00:05:16
  - 1.5 Understand TF/IDF 00:08:36
  - 1.6 Understand stemming 00:13:14
- Lesson 2: Topic Modeling
  - Learning objectives 00:00:43
  - 2.1 Find topics in documents 00:07:03
  - 2.2 Perform explicit semantic analysis 00:05:11
  - 2.3 Understand document clustering 00:02:50
  - 2.4 Implement latent semantic analysis 00:07:28
  - 2.5 Understand non-negative matrix factorization 00:07:49
- Lesson 3: Sentiment Analysis
  - Learning objectives 00:00:31
  - 3.1 Quantify words and feelings 00:05:50
  - 3.2 Use negations and modifiers 00:12:08
  - 3.3 Use corpus-based approaches 00:05:09
- Lesson 4: Applications
  - Learning objectives 00:00:47
  - 4.1 Understand word2vec word embeddings 00:18:03
  - 4.2 Define GloVe 00:08:36
  - 4.3 Apply language detection 00:14:03
- Summary
  - Natural Language Processing: Summary 00:01:19