Mikolov et al., 2013 (CBOW & Skip-Gram)

March 31, 2022

 

Efficient estimation of word representations in vector space (Mikolov et al., 2013)

Introduction

Baseline: the traditional n-gram language model
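As a refresher on the baseline, an n-gram model estimates the probability of a word from counts of its preceding context. A minimal bigram (n = 2) sketch over a toy corpus (the corpus and counts are invented for illustration, not from the paper):

```python
from collections import Counter

# Toy corpus, invented for illustration.
corpus = "the cat sat on the mat the cat ran".split()

# Count bigrams (pairs of adjacent words) and unigrams.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

# Maximum-likelihood estimate:
# P(next = "sat" | prev = "cat") = count("cat sat") / count("cat")
p = bigrams[("cat", "sat")] / unigrams["cat"]
print(p)  # 0.5
```

In practice n-gram models need smoothing for unseen sequences, and their parameter count grows sharply with n, which is part of the motivation for the vector-based models in this paper.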

Distributed representations of words as (continuous) vectors (vs. one-hot encoding)

e.g., queen → | 0.313 | 0.123 | 0.326 | 0.128 | 0.610 | 0.415 | 0.120 |

Learning cost: proportional to the vocabulary size V times the embedding dimension m (here m = 7), i.e., V × m parameters to train
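A quick sketch of this parameter count, contrasting one-hot encoding with a distributed (dense) representation; the vocabulary size below is a made-up number for illustration:

```python
# Hypothetical vocabulary size; m = 7 matches the "queen" example above.
V = 10_000   # vocabulary size (assumed)
m = 7        # embedding dimension

# One-hot: each word is a length-V vector with a single 1; nothing is learned
# and no structure is shared between words.
one_hot_size = V

# Distributed representation: each word gets m learned real-valued parameters,
# giving the V * m total mentioned above.
embedding_params = V * m

print(one_hot_size)       # 10000
print(embedding_params)   # 70000
```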

Model architectures

Some terms

Non-linear NN models

The number of output units that must be evaluated per prediction can be reduced from |V| to about log2|V| by replacing the flat softmax output layer with a hierarchical softmax
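To make the saving concrete: a flat softmax scores all |V| output units, while a hierarchical softmax over a balanced binary tree only makes about log2|V| binary decisions along a root-to-leaf path (the vocabulary size here is a hypothetical example):

```python
import math

V = 1_000_000  # hypothetical vocabulary size

# Flat softmax: one score per vocabulary word.
flat_cost = V

# Hierarchical softmax: one binary decision per level of a balanced
# binary tree whose leaves are the vocabulary words.
hierarchical_cost = math.ceil(math.log2(V))

print(flat_cost)          # 1000000
print(hierarchical_cost)  # 20
```

This is why hierarchical softmax makes training with very large vocabularies tractable.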

Results

Task description

i. A comprehensive test set containing 5 types of semantic questions and 9 types of syntactic questions (8,869 semantic and 10,675 syntactic questions overall)
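The test questions are analogies of the form "a is to b as c is to ?", answered by vector arithmetic and nearest-neighbor search. A minimal sketch with made-up 3-dimensional vectors (real word2vec embeddings are learned and much higher-dimensional):

```python
import math

# Hypothetical toy embeddings, invented for illustration.
emb = {
    "king":  [0.8, 0.2, 0.9],
    "man":   [0.9, 0.1, 0.2],
    "woman": [0.1, 0.9, 0.2],
    "queen": [0.1, 0.95, 0.85],
    "apple": [0.5, 0.5, 0.1],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# "man is to king as woman is to ?" →
# target = vector(king) - vector(man) + vector(woman)
target = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]

# Pick the closest word (excluding the question words) by cosine similarity.
answer = max((w for w in emb if w not in {"king", "man", "woman"}),
             key=lambda w: cosine(emb[w], target))
print(answer)  # queen
```

A question counts as correct only when the closest word exactly matches the expected answer, which is the scoring rule used for the test set.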

Categories: Paper Review, NLP

Original post: https://cheonkamjeong.blogspot.com/2022/03/paper-review-mikolov-et-al-2013-cbow.html