Deep Learning: Natural Language Processing in Python with Word2Vec: Word2Vec and Word Embeddings in Python and Theano (Deep Learning and Natural Language Processing Book 1) by LazyProgrammer
English | 19 Aug 2016 | ASIN: B01KQ0ZN0A | 47 Pages | AZW3 | 206.18 KB
Word2Vec
Word2Vec is a set of neural network algorithms that have gotten a lot of attention in recent years as part of the re-emergence of deep learning in AI.
The idea that one can represent words and concepts as vectors is not new. The ability to do it effectively and generate noteworthy results is.
Word2Vec algorithms are especially interesting because they allow us to perform arithmetic on the word vectors that yields both surprising and satisfying results. We call these “word analogies”.
Some popular word analogies Word2Vec is capable of finding:
“King” is to “Man” as “Queen” is to “Woman”.
“France” is to “Paris” as “Italy” is to “Rome”.
“December” is to “November” as “July” is to “June”.
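To make the vector arithmetic behind these analogies concrete, here is a minimal sketch in Numpy. The vectors are hand-made toy values (their axes loosely mean royalty, gender, and fruit), not embeddings learned by Word2Vec, and the names toy_vectors and analogy are hypothetical helpers chosen for illustration.

```python
import numpy as np

# Hand-made toy vectors (axes: royalty, gender, fruit); NOT trained embeddings.
toy_vectors = {
    "king":  np.array([1.0,  1.0, 0.0]),
    "queen": np.array([1.0, -1.0, 0.0]),
    "man":   np.array([0.0,  1.0, 0.0]),
    "woman": np.array([0.0, -1.0, 0.0]),
    "apple": np.array([0.0,  0.0, 1.0]),
}

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c) by cosine similarity."""
    target = vectors[a] - vectors[b] + vectors[c]
    best_word, best_sim = None, -np.inf
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue  # skip the query words themselves
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

# "king" - "man" + "woman" lands on "queen" in this toy space.
result = analogy("king", "man", "woman", toy_vectors)
print(result)
```

With real embeddings the target vector rarely matches any word exactly, which is why the sketch picks the nearest word by cosine similarity instead of testing for equality.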
Not only can we cluster similar words together, but we can also give all these clusters the same “structure”, all by using Word2Vec.
Word2Vec was created by a team led by Tomas Mikolov at Google and has many advantages over earlier algorithms that attempt to do similar things, like Latent Semantic Analysis (LSA) or Latent Semantic Indexing (LSI).
In this book we cover various popular flavors of the Word2Vec algorithm, including CBOW (continuous bag-of-words), skip-gram, and negative sampling.
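As a rough illustration of how CBOW and skip-gram differ, the sketch below builds training examples from a short sentence. Skip-gram pairs each center word with one neighbor at a time, while CBOW predicts the center word from all of its neighbors together. The function names, the window size, and the example sentence are assumptions made for this sketch and are not taken from the book.

```python
def skipgram_pairs(tokens, window=1):
    """Skip-gram: one (center, context) pair per neighbor within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=1):
    """CBOW: one (context words, center) example per position."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples

tokens = ["the", "quick", "brown", "fox"]  # hypothetical example sentence
sg = skipgram_pairs(tokens)
cb = cbow_examples(tokens)
print(sg)
print(cb)
```

Negative sampling is not shown here. It changes how each example is scored during training, not how the examples are extracted from the text.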
I show you both their derivations in math (you’ll see that if you are already familiar with deep learning concepts, there is no new math to be learned) and how to implement them in code.
Whereas implementation in Numpy is just the straightforward application of the equations in code, Theano is a bit more complex because it requires new array-slicing techniques, namely running gradient descent on only a part of a matrix. It’s not straightforward, but I walk you through all the bits and pieces required to understand the full implementation.
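The idea of running gradient descent on only part of a matrix can be sketched in Numpy as follows. This is not the book's Theano code; it only shows a single update step in which just the rows for the words in the current batch change. The matrix size, row indices, learning rate, and placeholder gradient are all made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 5, 3                   # hypothetical sizes
W = rng.normal(size=(vocab_size, dim))   # one embedding row per word
W_before = W.copy()

rows = np.array([1, 3])                  # hypothetical word indices in this batch
grad_rows = np.ones((len(rows), dim))    # placeholder gradient for those rows only
learning_rate = 0.1

# Update only the selected rows. np.subtract.at also accumulates correctly
# when the same index appears more than once in `rows`.
np.subtract.at(W, rows, learning_rate * grad_rows)

changed = np.any(W != W_before, axis=1)
print(changed)  # True only for the rows listed in `rows`
```

Touching only the affected rows avoids computing and applying a gradient for the whole vocabulary at every step, which matters when the embedding matrix has one row per word.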