Phrase2vec in Practice - Aerin Kim

Transcript of Phrase2vec in Practice - Aerin Kim

Page 1: Phrase2vec in Practice - Aerin Kim

Phrase2vec in Practice

Aerin Kim

Page 2: Phrase2vec in Practice - Aerin Kim

At the end of this talk

Page 3: Phrase2vec in Practice - Aerin Kim

Download & Install

•  https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit

•  easy_install -U gensim
•  easy_install numpy
•  easy_install scipy
•  pip install nltk

Page 4: Phrase2vec in Practice - Aerin Kim

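Frustration

•  Binary vector (discrete representation)

•  Apple = [0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] AND
   Fruit = [0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0] = 0
   (the two vectors share no nonzero position, so they encode no similarity between the words)

•  Dimensionality? Very sparse representation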
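The point of this slide can be checked in a few lines. This is a minimal sketch in plain Python; the helper names `one_hot` and `dot` are illustrative, and the positions of the 1s are copied from the slide's 18-dimensional example.

```python
VOCAB_SIZE = 18  # length of the vectors shown on the slide

def one_hot(index, size=VOCAB_SIZE):
    """Return a binary vector with a single 1 at the given position."""
    vec = [0] * size
    vec[index] = 1
    return vec

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

apple = one_hot(2)  # 1 at position 2, as on the slide
fruit = one_hot(7)  # 1 at position 7, as on the slide

print(dot(apple, fruit))  # 0: distinct words never overlap
print(dot(apple, apple))  # 1: a word only matches itself
```

Any two distinct words give 0, so the representation cannot express that "apple" is closer to "fruit" than to any other word.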

Page 5: Phrase2vec in Practice - Aerin Kim

Solution

•  Statistical NLP → Co-occurrence matrix

            buildings  factories  has  owns  Donald  Trump  many  hands  in  Mr.  .  China  small  relatively
buildings           0          0    0     0       0      0     1      0   0    0  1      0      0           0
factories           0          0    0     0       0      0     1      0   1    0  0      0      0           0
has                 0          0    0     0       0      2     1      0   0    0  0      0      0           1
owns                0          0    0     0       0      1     1      0   0    0  0      0      0           0
Donald              0          0    0     0       0      2     0      0   0    0  0      0      0           0
Trump               0          0    2     1       2      0     0      0   0    1  0      0      0           0
many                1          1    1     1       0      0     0      0   0    0  0      0      0           0
hands               0          0    0     0       0      0     0      0   0    0  1      0      1           0
in                  0          1    0     0       0      0     0      0   0    0  0      1      0           0
Mr.                 0          0    0     0       0      1     0      0   0    0  0      0      0           0
.                   1          0    0     0       0      0     0      1   0    0  0      1      0           0
China               0          0    0     0       0      0     0      0   1    0  1      0      0           0
small               0          0    0     0       0      0     0      1   0    0  0      0      0           1
relatively          0          0    1     0       0      0     0      0   0    0  0      0      1           0
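A co-occurrence matrix like the one above can be built by counting neighbours inside a fixed window. The sketch below is an illustration, not the talk's code: the three sentences are inferred from the counts in the matrix (the slide does not show its corpus), and the window size of 1 and the function name `cooccurrence` are assumptions.

```python
from collections import defaultdict

def cooccurrence(sentences, window=1):
    """Count how often each ordered pair of words appears within `window` positions."""
    counts = defaultdict(int)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts

# Assumed corpus, reconstructed from the matrix counts (not shown on the slide).
corpus = [
    ["Donald", "Trump", "has", "many", "buildings", "."],
    ["Donald", "Trump", "owns", "many", "factories", "in", "China", "."],
    ["Mr.", "Trump", "has", "relatively", "small", "hands", "."],
]

counts = cooccurrence(corpus, window=1)
print(counts[("Trump", "has")])       # 2
print(counts[("many", "buildings")])  # 1
```

Each row of the matrix is then a word's vector; the counts are symmetric because every neighbour relation is counted from both sides.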

Page 6: Phrase2vec in Practice - Aerin Kim

Better Solution

•  Predict surrounding words of every word

•  You shall be judged by the company you keep
   Since he announced his candidacy for the presidency, Trump has filed a number of lawsuits
   Would a Trump presidency undo the UN climate change agreement?

These surrounding words will represent “presidency”
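To make "surrounding words" concrete, the sketch below pulls the context of "presidency" out of the two example sentences. The window size of 2, the lowercase letters-only tokenizer, and the function name `context_words` are assumptions chosen for illustration.

```python
import re

def context_words(sentence, target, window=2):
    """Return the words within `window` positions of each occurrence of `target`."""
    tokens = re.findall(r"[A-Za-z]+", sentence.lower())
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            contexts.extend(left + right)
    return contexts

sentences = [
    "Since he announced his candidacy for the presidency, Trump has filed a number of lawsuits",
    "Would a Trump presidency undo the UN climate change agreement?",
]

for s in sentences:
    print(context_words(s, "presidency"))
# ['for', 'the', 'trump', 'has']
# ['a', 'trump', 'undo', 'the']
```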

Page 7: Phrase2vec in Practice - Aerin Kim

Word Embedding (Word2Vec)

•  Objective function:

MAXIMIZE the log probability of any context word given the current center word.
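Written out, this objective is usually given in the standard skip-gram form below. The symbols are assumptions not shown on the slide: T is the number of words in the corpus, m the window size, W the vocabulary size, v_c the vector of the center word c, and u_o the vector of an outside (context) word o.

```latex
J(\theta) = \frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-m \le j \le m \\ j \ne 0}} \log p(w_{t+j} \mid w_t; \theta),
\qquad
p(o \mid c) = \frac{\exp\left(u_o^{\top} v_c\right)}{\sum_{w=1}^{W} \exp\left(u_w^{\top} v_c\right)}
```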

Page 8: Phrase2vec in Practice - Aerin Kim

One (very) Big Vector Θ

•  Θ is the set of ALL parameters in one vector
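One way to picture Θ is as the center-word vectors and the outside-word vectors of every vocabulary word laid end to end. The layout below (all center vectors first, then all outside vectors) and the helper names are assumptions made for this sketch; with vocabulary size |V| and dimension d, the vector has 2·d·|V| entries.

```python
import random

def init_theta(vocab, dim, seed=0):
    """Create one flat parameter vector holding 2 * dim * len(vocab) small random values."""
    rng = random.Random(seed)
    return [rng.uniform(-0.5, 0.5) for _ in range(2 * dim * len(vocab))]

def center_vector(theta, vocab, dim, word):
    """Slice the center-word vector of `word` out of the first half of theta."""
    i = vocab.index(word)
    return theta[i * dim:(i + 1) * dim]

def outside_vector(theta, vocab, dim, word):
    """Slice the outside-word vector of `word` out of the second half of theta."""
    offset = dim * len(vocab)
    i = vocab.index(word)
    return theta[offset + i * dim:offset + (i + 1) * dim]

vocab = ["his", "candidacy", "for"]  # toy vocabulary
dim = 4                              # toy embedding size
theta = init_theta(vocab, dim)

print(len(theta))                                        # 24
print(len(center_vector(theta, vocab, dim, "candidacy")))  # 4
```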

Page 9: Phrase2vec in Practice - Aerin Kim

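OBJ function of single window

•  Trump announced his candidacy for president as a Republican
   (his: U, candidacy: V, for: U, where V is the center-word vector and U an outside-word vector)

•  Assuming window size = 1
•  First element: exp(U(his)^T ⋅ V(candidacy))
•  Second element: exp(U(for)^T ⋅ V(candidacy))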
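The two elements above become probabilities once each is divided by the same sum taken over the whole vocabulary. The sketch below computes the log probability of this one window; the random vectors, the dimension of 5, and the function names are assumptions for illustration only, so the printed numbers carry no meaning beyond showing the mechanics.

```python
import math
import random

rng = random.Random(0)
vocab = ["trump", "announced", "his", "candidacy", "for",
         "president", "as", "a", "republican"]
DIM = 5  # toy embedding size

# Untrained, random outside (U) and center (V) vectors for every word.
U = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in vocab}
V = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in vocab}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def prob_outside_given_center(outside, center):
    """Softmax over the vocabulary of exp(U(w)^T . V(center))."""
    scores = {w: math.exp(dot(U[w], V[center])) for w in vocab}
    return scores[outside] / sum(scores.values())

center = "candidacy"
window = ["his", "for"]  # window size = 1 on each side

log_prob = sum(math.log(prob_outside_given_center(o, center)) for o in window)
print(log_prob)  # the quantity training tries to increase
```

Training adjusts U and V so that this log probability, summed over every window in the corpus, goes up.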

Page 10: Phrase2vec in Practice - Aerin Kim

Result after the Optimization

•  Semantically
   Famous example: Vec(King) – Vec(man) ≈ Vec(Queen) – Vec(woman)
   Vec(CSS) – Vec(Front-end) ≈ Vec(Django) – Vec(Back-end)

•  Syntactically
   Vec(apple) – Vec(apples) ≈ Vec(car) – Vec(cars)
   Vec(built) – Vec(build) ≈ Vec(developed) – Vec(develop)
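An analogy such as King – man ≈ Queen – woman is usually queried by rearranging it to Queen ≈ King – man + woman and looking for the nearest vector by cosine similarity. The sketch below uses hand-made two-dimensional vectors, which are purely hypothetical; real word2vec vectors are learned and have hundreds of dimensions.

```python
import math

# Hypothetical toy vectors: (royalty, gender). Not learned embeddings.
vectors = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word d whose vector is closest to Vec(a) - Vec(b) + Vec(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))  # queen
```

With vectors loaded into gensim, the same kind of query is typically made with `most_similar(positive=[...], negative=[...])`.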

Page 11: Phrase2vec in Practice - Aerin Kim

Classifying Data Science Keywords

Page 12: Phrase2vec in Practice - Aerin Kim

More sophisticated relationships

Page 13: Phrase2vec in Practice - Aerin Kim

Let’s make the Phrase Vectors!

•  https://bitbucket.org/yunazzang/aiwiththebest_byor
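The slides do not show how the phrases are formed, so the following is only one common approach and not necessarily what the repository above does. It scores each adjacent word pair with (count(a b) − δ) / (count(a) × count(b)) and joins high-scoring pairs into a single token that can then be embedded like any other word. The corpus, δ, the threshold, and the function names are all assumptions for this sketch.

```python
from collections import Counter

def find_phrases(sentences, delta=1, threshold=0.1):
    """Return the set of adjacent word pairs whose score exceeds `threshold`."""
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))
    return {
        pair for pair, n in bigrams.items()
        if (n - delta) / (unigrams[pair[0]] * unigrams[pair[1]]) > threshold
    }

def merge_phrases(tokens, phrases):
    """Rewrite a token list, joining each detected pair with an underscore."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

# Hypothetical toy corpus.
corpus = [
    ["data", "science", "is", "fun"],
    ["she", "studies", "data", "science"],
    ["data", "science", "needs", "data"],
    ["math", "is", "hard"],
]

phrases = find_phrases(corpus)
print(phrases)                            # {('data', 'science')}
print(merge_phrases(corpus[1], phrases))  # ['she', 'studies', 'data_science']
```

Training word2vec on the merged token lists then yields one vector per phrase, for example one for `data_science`. gensim's `Phrases` class offers comparable phrase detection.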