Intel - Reimagining What's Next

Input
A word in the corpus, passed in as a one-hot encoded vector of dimension 5,000. Because the vocabulary size can get very large, we limit the vocabulary to the top 5,000 words in the corpus and replace the remaining words with the UNK symbol. Each sentence in the corpus is also double-padded with stop symbols. (A sketch of this preprocessing appears at the end of this section.)

Output
The following word in the corpus, also encoded one-hot in a vector the size of the vocabulary.

Layers
The model consists of the following three layers:

Embedding Layer: Each word corresponds to a unique embedding vector, a representation of the word in some embedding space. Here, the embeddings all have dimension 50. We find the embedding for a given word by doing a matrix multiply (essentially a table lookup) with an embedding matrix that is trained during regular backpropagation.

Hidden Layer: A fully connected feed-forward layer with hidden layer size 100 and rectified linear unit (ReLU) activation.

Softmax Layer: A fully connected feed-forward layer with layer size equal to the vocabulary size, where each element of the output vector (logits) corresponds to the probability of that word in the vocabulary being the next word.

Loss
The normal cross-entropy loss between the logits and the true labels serves as the model's cost.

Optimizer
A normal SGD optimizer with learning rate 0.05. (A sketch of the full three-layer model also appears at the end of this section.)

Each epoch (around 480,000 examples) takes about 10 minutes to train on the CPU. The test log likelihood after epoch five is -846493.44.

Figure 4: Training loss in Langmod_nn model.
Figure 5: Test loss in Langmod_nn model.

Memn2n-master
Memn2n-master is a neural network with a recurrent attention model over a possibly large external memory. The architecture is a form of memory network but, unlike the model in that work, it is trained end-to-end and hence requires significantly less supervision during training, making it more generally applicable in realistic settings. The way to get good and accurate answers from a memory network is to remember the initial information provided to it; detailed information on how memory networks work can be found in this blog. (A minimal single-hop sketch appears at the end of this section.)

Input data
This directory includes the first set of 20 tasks for testing text understanding and reasoning in the bAbI project. The motivation behind these 20 tasks is that each one tests a unique aspect of text understanding and reasoning, and hence probes a different ability of the trained model.
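For concreteness, here is a minimal sketch of the input preprocessing described above, written in plain Python. The function names and special tokens (build_vocab, encode, <unk>, <s>) are illustrative assumptions; the exact tokens and helpers used by the original example are not shown on this page.

```python
# Keep the top 5,000 words, map everything else to UNK, and double-pad
# each sentence with stop symbols before converting words to ids.
from collections import Counter

VOCAB_SIZE = 5000           # vocabulary cap described in the text
UNK, STOP = "<unk>", "<s>"  # assumed token spellings

def build_vocab(sentences):
    """Map the most frequent words to integer ids; all other words share UNK."""
    counts = Counter(word for sent in sentences for word in sent)
    most_common = [w for w, _ in counts.most_common(VOCAB_SIZE - 2)]  # reserve UNK/STOP
    vocab = {STOP: 0, UNK: 1}
    vocab.update({w: i + 2 for i, w in enumerate(most_common)})
    return vocab

def encode(sentence, vocab):
    """Double-pad with stop symbols and convert words to ids (UNK for rare words)."""
    padded = [STOP, STOP] + sentence + [STOP, STOP]
    return [vocab.get(w, vocab[UNK]) for w in padded]

# Consecutive (input word, next word) pairs form the bigram training data.
sentences = [["the", "cat", "sat"], ["the", "dog", "ran"]]
vocab = build_vocab(sentences)
ids = encode(sentences[0], vocab)
pairs = list(zip(ids[:-1], ids[1:]))
```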
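The three layers, the cross-entropy loss, and the SGD optimizer described above can be sketched with tf.keras roughly as follows. This is an illustrative reconstruction rather than the Langmod_nn source: layer and variable names are assumptions, and integer word ids stand in for the one-hot vectors (an embedding lookup is equivalent to multiplying a one-hot vector by the embedding matrix).

```python
import tensorflow as tf

VOCAB_SIZE = 5000   # vocabulary capped at the top 5,000 words
EMBED_DIM = 50      # embedding dimension from the text
HIDDEN_DIM = 100    # hidden layer size from the text

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),                             # one word id per example
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),       # embedding layer (dim 50)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(HIDDEN_DIM, activation="relu"),   # hidden layer (ReLU, size 100)
    tf.keras.layers.Dense(VOCAB_SIZE),                      # softmax layer (logits over vocab)
])

# Cross-entropy between logits and the true next-word labels, SGD with learning rate 0.05.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.05),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```

Training such a model on (input word, next word) pairs and recording the loss per epoch yields curves of the kind shown in Figures 4 and 5.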
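As a rough illustration of "a recurrent attention model over a possibly large external memory", the NumPy sketch below performs a single memory hop: embed the stored sentences and the question, attend over the memories with a softmax, and combine the weighted readout with the question to score candidate answers. All names and the bag-of-words encoding are assumptions made for illustration; the actual Memn2n-master code uses multiple hops and further details not covered here.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, embed_dim, num_memories = 50, 20, 10

A = rng.normal(size=(vocab, embed_dim))  # input memory embedding
C = rng.normal(size=(vocab, embed_dim))  # output memory embedding
B = rng.normal(size=(vocab, embed_dim))  # question embedding
W = rng.normal(size=(embed_dim, vocab))  # final answer projection

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Bag-of-words representations: one row of word counts per memory sentence.
memories = rng.integers(0, 2, size=(num_memories, vocab)).astype(float)
question = rng.integers(0, 2, size=(vocab,)).astype(float)

m = memories @ A             # embedded input memories
c = memories @ C             # embedded output memories
u = question @ B             # embedded question
p = softmax(m @ u)           # attention weights over memories
o = p @ c                    # weighted memory readout
answer_logits = (o + u) @ W  # scores over the answer vocabulary
```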
