I have discovered a fascinating new (to me) approach to this language problem, adorably named "ELMo" by its authors. There's another NLP model called "BERT", so I guess they are on a Sesame Street tear.
Anyway, what I really like about ELMo is that you can encode any sentence into a fixed-length vector, which makes it perfect for input into a neural network. ELMo actually generates a vector for every single word, but they are engineered in such a way (I think) that you can average a sentence's word vectors together to get a sort of "sentiment vector" for the whole sentence. That last bit could be total garbage, but the guy in the tutorial I found did it, so I'm running with it for now.
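Here is roughly what I mean, as far as I understand it. This is a minimal sketch that assumes the TensorFlow Hub ELMo module and old TF1-style session code, because that's what the tutorials I've seen use; the helper name is my own, and the straight average over the word axis will also average in any zero padding on shorter sentences.

```python
# Sketch: run ELMo over a batch of sentences and average the per-word
# vectors into one fixed-length vector per sentence. Requires TF 1.x.
import tensorflow as tf
import tensorflow_hub as hub

elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=False)

def elmo_sentence_vectors(sentences):
    # "elmo" gives one 1024-dim vector per word: shape (batch, max_len, 1024).
    embeddings = elmo(sentences, signature="default", as_dict=True)["elmo"]
    # Averaging over the word axis collapses each sentence to a single
    # 1024-dim vector (the "sentiment vector" I'm hoping for).
    sentence_vectors = tf.reduce_mean(embeddings, axis=1)
    with tf.Session() as sess:
        sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
        return sess.run(sentence_vectors)

vecs = elmo_sentence_vectors(["This movie was wonderful.", "This movie was not."])
print(vecs.shape)  # (2, 1024)
```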
There's some serious magic going on in these vectors that ELMo generates, and I won't pretend to understand it quite yet. But they are big: the ones I used have 1024 dimensions, so that's quite a bit of room for features. It is easy to imagine that a neural network could pick up on the features you're interested in and ignore the rest. And we can let go of all our worries about sentence length and parts of speech and so on, because magic!
My code still needs a little work. The response time is very slow compared to the LSTM, on the order of a minute. That is not ELMo's fault; I am redoing a lot of work on every pass that I should just be doing once and loading from a file.
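The fix I have in mind looks something like this: compute the ELMo vectors once, save them to disk, and just load the file on later runs. The file name and the elmo_sentence_vectors() helper from the sketch above are my own placeholders.

```python
import os
import numpy as np

CACHE_PATH = "elmo_train_vectors.npy"

def get_vectors(sentences):
    if os.path.exists(CACHE_PATH):
        return np.load(CACHE_PATH)           # fast path: just read from disk
    vecs = elmo_sentence_vectors(sentences)   # slow path: run ELMo once
    np.save(CACHE_PATH, vecs)
    return vecs
```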
Also, the classifier used in the tutorial seems to spit out 1's and 0's instead of a floating point number, which is kind of boring. The solution may be to feed the ELMo vectors into the LSTM classifier that I used in the first pass. The guy in the tutorial did say that this was a basic implementation; obviously he didn't want to ruin his contest by making it completely awesome. So I think there is room for improvement!
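There might also be a simpler fix before going that far. I don't know exactly which classifier the tutorial used, but if it's something like scikit-learn's LogisticRegression, the 1-or-0 output just comes from predict(), and predict_proba() would give the floating point score I'm after. A sketch, with placeholder variable names for the ELMo vectors and labels:

```python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(max_iter=1000)
clf.fit(train_vectors, train_labels)        # 1024-dim ELMo vectors + 0/1 labels

hard_labels = clf.predict(test_vectors)                  # boring 1's and 0's
probabilities = clf.predict_proba(test_vectors)[:, 1]    # probability of the positive class
```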