Intel - Reimagining What's Next

Figure 3: Source: Deep Learning Research Review Week 3: Natural Language Processing - Matrix Initialization

Word Vectors

Words need to be represented as input to machine-learning models, and one mathematical way to do this is with vectors. The English language contains an estimated 13 million words, but many of them are related. The goal is to find an N-dimensional vector space (where N << 13 million) that is sufficient to encode all of the semantics of the language. To do this, there needs to be an understanding of the similarities and differences between words. The concept of vectors and of the distances between them (cosine, Euclidean, etc.) can be exploited to find those similarities and differences.

Figure 2: Source: Deep Learning Research Review Week 3: Natural Language Processing - Word Vectors

How Do We Represent the Meaning of Words?

If a separate vector is used for each of the 13 million-plus words in the English vocabulary, several problems arise. First, the vectors are very long and consist almost entirely of zeroes, with a single "one" whose position identifies the word; this is known as one-hot encoding. Second, when searching for a phrase such as "hotels in New Jersey" in Google, the expectation is that results pertaining to "motel", "lodging", and "accommodation" in New Jersey are also returned, yet one-hot vectors carry no natural notion of similarity between such words. Ideally, the dot product (because we are dealing with vectors) of the vectors for synonymous or similar words would be close to one, so that related results score as relevant.

Word2vec is a group of models that helps derive the relations between a word and its contextual words. Beginning with a small, random initialization of the word vectors, the predictive model learns the vectors by minimizing a loss function; in Word2vec, this happens with a feed-forward neural network and optimization techniques such as the stochastic gradient descent (SGD) algorithm. Count-based models instead build a co-occurrence count matrix of the words in the corpus: a large matrix with a row for each "word" and a column for each "context". The number of contexts is, of course, large, because it is essentially combinatorial in size. To overcome the size issue, singular value decomposition (SVD) can be applied to the matrix, reducing its dimensions while retaining maximum information (a short code sketch of this count-based approach appears at the end of this section).

Software and Hardware

The programming language used is Python 3.5.2, with Intel® Optimization for TensorFlow as the framework. For training and computation, the Intel® AI DevCloud powered by Intel® Xeon® Scalable processors was used. Intel AI DevCloud can provide a significant performance bump over the host CPU for the right application and use case because it offers 50-plus cores along with its own memory, interconnect, and operating system.

Training Models for NLP: Langmod_nn and Memn2n-master

Langmod_nn Model

The Langmod_nn model [6] builds a three-layer Forward Bigram Model neural network consisting of an embedding layer, a hidden layer, and a final softmax layer, where the goal is to use a given word in the corpus to predict the next word.
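To make that architecture concrete, the following is a minimal sketch of a forward bigram network written with TensorFlow's Keras API; the vocabulary size, layer widths, activation, and optimizer settings are illustrative assumptions and are not taken from the actual Langmod_nn code.

import tensorflow as tf

VOCAB_SIZE = 10000   # assumed vocabulary size
EMBED_DIM = 128      # assumed embedding width
HIDDEN_DIM = 256     # assumed hidden-layer width

# One word ID in, a probability distribution over the next word out.
inputs = tf.keras.Input(shape=(1,), dtype="int32")
x = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)            # embedding layer
x = tf.keras.layers.Flatten()(x)                                        # (batch, EMBED_DIM)
x = tf.keras.layers.Dense(HIDDEN_DIM, activation="relu")(x)             # hidden layer
outputs = tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax")(x)    # softmax layer

model = tf.keras.Model(inputs, outputs)
# Cross-entropy loss minimized with SGD, matching the loss-minimization idea described above.
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

Training such a model on (current word, next word) ID pairs drawn from the corpus drives down the cross-entropy loss, and the embedding layer learns word vectors as a by-product of the next-word prediction task.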
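Returning to the count-based approach described earlier, the sketch below builds a word-by-word co-occurrence matrix from a toy corpus and reduces it with SVD to obtain low-dimensional word vectors; the corpus, the one-word context window, and the target dimensionality k are all illustrative assumptions.

import numpy as np

# Toy corpus; in practice this would be a large text collection.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
tokens = [sentence.split() for sentence in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Co-occurrence counts within a +/-1 word window (the "context").
cooc = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, word in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if j != i:
                cooc[index[word], index[sent[j]]] += 1

# SVD reduces the large, sparse matrix to a small number of dimensions
# while retaining as much information as possible.
U, S, Vt = np.linalg.svd(cooc)
k = 2                                # illustrative target dimensionality
word_vectors = U[:, :k] * S[:k]      # one k-dimensional vector per vocabulary word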
