Making a Reverse Dictionary


Is it worth it? Let me work it. I put my training data down, flip it, and reverse it.

In this article, we're going to see how you can use Word2Vec to create a reverse dictionary. We'll use Word2Vec, but the same results can be achieved with any word embeddings model. Don't worry if you don't know what any of this means; we're going to explain it. A reverse dictionary is simply a dictionary in which you enter a definition and get back the word that matches that definition.

You can find the code in the companion repository.

Looking for a different starting point with neural networks? Read An Introduction to Implementing Neural Networks Using TensorFlow

Natural Language Processing in Practice

Natural Language Processing is a great field: We find it very interesting, and our clients want to use it in their applications. We wrote a broad explanatory article about it: Analyze and Understand Text: Guide to Natural Language Processing. Now we want to write more practical articles to help you use it in your projects. We think that's useful because it can be quite hard to get into the field: You never know whether a problem can be solved in one day with a ready-to-use library, or whether you'll need two years and a research team to get decent results. To put it simply: The hard part isn't the technical side, but understanding how to apply it successfully.

That's even more true if a problem can be solved with machine learning. You need a bit of background to understand machine learning. And even when the solution works, you might still need weeks of tweaking weights and parameters to get it right.

So, in this article, we're going to look at just one technology and a simple application: a reverse dictionary. That is to say, how to get a word from its definition. It's a neat application, and one you can't really build by traditional means. There is no official reverse dictionary book that you can buy, and you can't code one with deterministic algorithms.

Representing Words in Machine Learning

The first step in a machine learning problem is understanding how to represent the data you have to work with. The most basic way is one-hot encoding. With this method, you:

  • collect a list of all possible values (e.g., 10,000 possible values)
  • represent each possible value with a vector that has as many components as there are possible values (e.g., each value is represented by a vector with 10,000 components)
  • to represent each value, assign 0 to all components except one, which gets 1 (i.e., every component is 0, apart from one that is 1)

For example, applied to words this would mean that:

  • you have a vocabulary of 10,000 words
  • each word is represented by a vector, and the vector has 10,000 components
  • the word dog is represented by [1, 0, 0 …], the word cat is represented by [0, 1, 0 …], etc.
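To make this concrete, here is a minimal sketch of one-hot encoding on a toy three-word vocabulary (the vocabulary and the helper function are ours, just for illustration):

```python
# A toy vocabulary; real models use tens of thousands of words.
vocabulary = ["dog", "cat", "house"]

def one_hot(word, vocabulary):
    # every component is 0, except the one at the word's index
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector

print(one_hot("dog", vocabulary))  # [1, 0, 0]
print(one_hot("cat", vocabulary))  # [0, 1, 0]
```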

This method is capable of representing all words, but it has a couple of big drawbacks:

  • each vector is very sparse; most components are at 0
  • the representation doesn't hold any semantic value; father and mother are close in meaning, but you'll never see that using one-hot encoding

Word Embeddings to the Rescue

To overcome both these limitations, word embeddings were invented. The key intuition of this type of word representation is that words with similar meanings have a similar representation. This allows much denser vectors and captures the meaning of words, at least in relation to other words. The second statement simply means that word embeddings don't really capture what father means, but its representation will be similar to that of mother.

This is a very powerful feature and allows all sorts of cool applications. For instance, it means that you can solve problems like this one:

What word is to father as mother is to daughter?

The first model for word embeddings was Word2Vec (from Google), the one we're going to use to build a reverse dictionary. This model revolutionized the field and inspired many other models such as fastText (from Facebook) or GloVe (Stanford). There are small differences between these models: for instance, GloVe and Word2Vec train on words, while fastText trains on character n-grams. However, the principles, applications, and results are very similar.

How Word2Vec Works

The effectiveness of Word2Vec relies on a sort of trick: We're going to train a neural network for one task, but then we're going to use it for something else. This isn't unique to Word2Vec; it's one of the common approaches at your disposal in machine learning. Basically, we're going to train a neural network to produce a certain output, but then we'll drop the output layer and just keep the weights of the hidden layer of the neural network.

The training process works as usual: We give the neural network both the input and the output expected for that input. This way, the neural network can slowly learn how to generate the correct output.

This training task is to calculate the probability that a certain word appears in context, given our input word. For example, if we have the word programmer, what is the probability that we'll see the word computer near it in a sentence?

Word2Vec Training Strategies

There are two typical ways to train Word2Vec: CBOW and skip-gram; one is the inverse of the other. With Continuous Bag of Words (CBOW), we give as input the context of a word and, as output, we should produce the word we care about. With skip-gram, we do the opposite: from a word, we predict the context in which it appears.

The term context, at the most basic level, simply means the words around the target word, such as the words before and after it. However, you could also use context in a syntactic sense (e.g., the subject, if the target word is a verb). Here, we'll stick to the simplest meaning of context.

For instance, consider the sentence:

the gentle giant graciously spoke

With CBOW, we input [gentle, graciously] to get giant in the output. With skip-gram, we do the opposite: we put giant in the input and [gentle, graciously] in the output. Training is done with individual words, so in practice, for CBOW:

  • the first time, we input gentle and expect giant in the output.
  • the second time, we give graciously, expecting giant in the output.

For skip-gram, we would invert input and output.
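As a sketch, the training pairs above could be generated like this, assuming a context window of one word on each side (the function and its names are ours, just for illustration):

```python
# Generate (input, output) training pairs for a target word,
# using a window of one word before and one word after it.
sentence = "the gentle giant graciously spoke".split()

def training_pairs(words, target_index, mode="cbow"):
    context = []
    if target_index > 0:
        context.append(words[target_index - 1])
    if target_index < len(words) - 1:
        context.append(words[target_index + 1])
    target = words[target_index]
    if mode == "cbow":
        # CBOW: a context word in input, the target word in output
        return [(c, target) for c in context]
    # skip-gram: the target word in input, a context word in output
    return [(target, c) for c in context]

print(training_pairs(sentence, 2, "cbow"))
# [('gentle', 'giant'), ('graciously', 'giant')]
print(training_pairs(sentence, 2, "skipgram"))
# [('giant', 'gentle'), ('giant', 'graciously')]
```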

What We Get from the Neural Network

As we said, at the end of the training, we drop the output layer because we don't really care about the probabilities that a word will appear near our input word. For example, we don't really care how likely it is that USA will appear close by when we have the word flag in the input. However, we keep the weights of the hidden layer of our neural network and use them to represent the words. How does that work?

It works because of the structure of our network. The input of our network is the one-hot encoding of a word. So, for instance, dog is represented by [1, 0, 0 …]. During training, the output is also a one-hot encoding of a word (e.g., using CBOW, for the sentence the dog barks, we can give dog in the input and put barks in the output). In the end, the output layer holds a series of probabilities. For example, given the input word cat, the output layer has one probability that the word dog will appear near cat, another probability that the word puppy will appear close by, and so on.

Neural network for Word2Vec

Each neuron has a weight for each word, so at the end of the training, we have N neurons, each with a weight for every word of the vocabulary. Also, remember that the input vector, representing a word, is all zeros, save for one position, where it is 1.

So, given the mathematical rules of matrix multiplication, if we multiply an input vector by the matrix of neurons, the 0s of the input vector nullify most weights in the neurons, and what remains is one weight, in each neuron, linked to the input word. The series of non-null weights in each neuron is what represents the word in Word2Vec.
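A toy example of this multiplication, with an invented three-word vocabulary and three hidden neurons, shows how the one-hot vector simply selects the row of weights belonging to the input word (the weight values are made up for illustration):

```python
# Hidden-layer weight matrix: one row per vocabulary word,
# one column per hidden neuron. Values are invented.
weights = [
    [0.2, 0.8, 0.1],  # weights for "dog"
    [0.3, 0.7, 0.2],  # weights for "cat"
    [0.9, 0.1, 0.5],  # weights for "house"
]
one_hot_dog = [1, 0, 0]  # one-hot encoding of "dog"

# Multiply the one-hot vector by the matrix: the zeros cancel
# every row except the one for "dog".
embedding = [
    sum(one_hot_dog[i] * weights[i][j] for i in range(len(weights)))
    for j in range(len(weights[0]))
]
print(embedding)  # [0.2, 0.8, 0.1] — exactly the row for "dog"
```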

Words that have similar contexts have similar outputs, so they have similar probabilities of being found next to a specific word. In other words, dog and cat produce similar outputs. Therefore, they also end up with similar weights. This is the reason words that are close in meaning are represented with vectors that are also close in Word2Vec.

The intuition behind Word2Vec is quite simple: similar words appear in similar sentences. However, it is also quite effective, provided, of course, that you have a large enough dataset for the training. We'll deal with that issue later.

The Meaning in Word2Vec

Now that we know how Word2Vec works, we can look at how this leads to a reverse dictionary. A reverse dictionary is a dictionary that finds the word from an input definition. So, ideally, if you enter group of relatives, the program should give you family.

Let's start with the obvious: a natural use of word embeddings would be a dictionary of synonyms. That's because, just as we said, with this approach, similar words have a similar representation. So if you ask the system to find a word close to the input, it will find a word with a similar meaning. For example, if you give it happiness, you would expect to get joy.

From this, you might think that doing the opposite is also possible, like finding antonyms of input words. Unfortunately, this isn't directly possible, because the vectors representing the words cannot capture so precise a relation between words. Basically, it's not true that the vector for the word sad is in the mirror position of the one for happy.

Why It Works

Look at the following simplified representation of the word vectors to understand why it works.

Graphical representation of relationships between words in Word2Vec

The system can find the word indicated by ? simply because it can add, to the vector for father, the difference between the two given word vectors (mother and daughter). The system does capture the relationship between words, but it doesn't capture it in a way that is easily understandable. Put another way: the position of the vector is meaningful, but its meaning is not defined in an absolute way (e.g., the opposite of), only in a relative one (e.g., a word that is like A minus B).
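We can sketch this vector arithmetic with invented 2-D vectors. Real Word2Vec vectors have hundreds of dimensions; these toy values, and the helper functions, exist only to illustrate the idea:

```python
from math import sqrt

# Toy 2-D "embeddings": one axis roughly for gender, one for
# generation, plus an unrelated word. All values are invented.
vectors = {
    "father": (1.0, 1.0), "mother": (-1.0, 1.0),
    "son": (1.0, -1.0), "daughter": (-1.0, -1.0),
    "dog": (0.9, 0.2),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    # solve "a is to b as c is to ?" via the vector b - a + c
    target = tuple(vectors[b][i] - vectors[a][i] + vectors[c][i] for i in range(2))
    candidates = set(vectors) - {a, b, c}
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("mother", "father", "daughter"))  # son
```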

This also explains why you can't find antonyms directly: in the Word2Vec representation, there is no mathematical operation that can be used to describe that relation.

How the Reverse Dictionary Works

Now that you understand the power of word vectors, you can see how to use them to create a reverse dictionary. Basically, we're going to use them to find the word that is most similar to the combination of input words, that is to say, the definition. This works because the system uses vector arithmetic to find the word that is closest to the set of words given as input.

For example, if you enter group of relatives, it should find family. We can also use negated words in the definition to help identify a word. For example, group of -relatives resolves to group. We'll see the precise meaning of negating a word later.
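The idea can be sketched with a toy vocabulary of invented 2-D vectors: sum the vectors of the positive words, subtract those of the negative ones, and return the closest remaining word. Real vectors are far larger, and gensim's most_similar does the equivalent work for us later; this sketch only illustrates the arithmetic:

```python
from math import sqrt

# Invented 2-D vectors for a tiny vocabulary, for illustration only.
vectors = {
    "group": (1.0, 0.0), "relatives": (0.0, 1.0),
    "family": (0.8, 0.8), "crowd": (0.9, -0.1),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def reverse_lookup(positive, negative):
    # add positive vectors, subtract negative ones
    query = [0.0, 0.0]
    for w in positive:
        query[0] += vectors[w][0]; query[1] += vectors[w][1]
    for w in negative:
        query[0] -= vectors[w][0]; query[1] -= vectors[w][1]
    # return the closest word that wasn't part of the input
    candidates = set(vectors) - set(positive) - set(negative)
    return max(candidates, key=lambda w: cosine(vectors[w], query))

print(reverse_lookup(["group", "relatives"], []))  # family
print(reverse_lookup(["group"], ["relatives"]))    # crowd
```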

The Data for Our Word2Vec Model

Now that the theory is all clear, we can look at the code and build this thing.

The first step would be building the dictionary. This wouldn't be hard per se, but it would take a long time. More importantly, the more content we can use, the better, and for the average user it's not easy to download and store a large amount of data. The dump of the English Wikipedia alone, when extracted, can take more than 50 GB (text only). And the Common Crawl data (freely available crawled pages) can take petabytes of storage.

For practical reasons, it's better to use the pre-trained model shared by Google, based on Google News data: GoogleNews-vectors-negative300.bin.gz. If you search for this file, you can find it in many places. Previously, the official source was a Google Code project, but now the best source seems to be on Google Drive. It's 1.6 GB compressed, and you don't have to uncompress it.

Once we have downloaded the data, we can put it in the directory models under our project. There are libraries for Word2Vec in many languages, but we'll go with Python, given its popularity in machine learning. We're going to use the Word2Vec implementation of the library gensim, since it's the most optimized. For the web interface, we'll use Flask.

Loading the Data

To start, we need to load the Word2Vec model into memory with a few simple lines.

from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format("./models/GoogleNews-vectors-negative300.bin.gz", binary=True)
model.init_sims(replace=True)
model.syn0norm = model.syn0  # prevent recalc of normed vectors

The load_word2vec_format call is the only one really necessary to load the data. The other lines perform some preliminary calculations up front, so that there is no need to repeat them later for each request.

However, these lines can take 2-3 minutes to execute on an average computer (yes, even with an SSD). You only have to wait once, at startup, so it's not catastrophic, but it's also not ideal. The alternative is to perform the calculations and optimizations once and for all, save the result, and load it from disk at each start.

We add this function to create an optimized version of our data.

def generate_optimized_version():
    model = KeyedVectors.load_word2vec_format("./models/GoogleNews-vectors-negative300.bin.gz", binary=True)
    model.init_sims(replace=True)
    model.save('./models/GoogleNews-vectors-gensim-normed.bin')

And in the main function, we use this code to load the Word2Vec data each time.

from pathlib import Path

optimized_file = Path('./models/GoogleNews-vectors-gensim-normed.bin')
if optimized_file.is_file():
    model = KeyedVectors.load("./models/GoogleNews-vectors-gensim-normed.bin", mmap='r')
else:
    generate_optimized_version()
    model = KeyedVectors.load("./models/GoogleNews-vectors-gensim-normed.bin", mmap='r')

# keep everything ready
model.syn0norm = model.syn0  # prevent recalc of normed vectors

This shortens the load time from a couple of minutes to a few seconds.

Cleaning the Dictionary

There's still one thing we have to do: clean the data. On one hand, the Google News model is great; it was generated from a very large dataset, so it's quite accurate. It allows us to get much better results than we would get by building our own model. However, since it's based on news, it also contains a lot of misspellings and, more importantly, entries for entities that we don't need.

In other words, it doesn't contain just individual words, but also things like the names of buildings and institutions mentioned in the news. And since we want to build a reverse dictionary, this could hinder us. What might happen, for example, if we give as input a definition like a tragic event? The system might find that the most similar item for that group of words is a place where a tragic event happened. And we don't want that.

So, we have to filter everything output by our model to make sure that only commonly used words, the kind you could actually find in a dictionary, are shown to the user. Our list of such words comes from SCOWL (Spell Checker Oriented Word Lists). Using a tool on the linked website, we created a custom dictionary and put it in our models folder.

# read dictionary words
dict_words = []
f = open("./models/words.txt", "r")
for line in f:
    dict_words.append(line.strip())
f.close()

# remove copyright notice
dict_words = dict_words[44:]

Now we can simply use our list of words to filter the items returned by our Word2Vec model.

The Reverse Dictionary

The code for the reverse dictionary functionality is very simple.

def find_words(definition, negative_definition):
    positive_words = determine_words(definition)
    negative_words = determine_words(negative_definition)

    similar_words = [i[0] for i in model.most_similar(positive=positive_words, negative=negative_words, topn=30)]

    words = []
    for word in similar_words:
        if word in dict_words:
            words.append(word)

    if len(words) > 20:
        words = words[0:20]

    return words

From the user, we receive as input positive words that describe the wanted word. We also receive negative words; mathematically, these are words whose vectors must be subtracted from the others.

It's harder to grasp the meaning of this operation: It isn't as if we were including the opposite of the word. Rather, we're saying that we want to remove the meaning of the negative word. For example, imagine that the definition is group of -relatives, with relatives the negated word and group and of the positive words. We're saying that we want the word whose meaning is closest to the one identified by the set of group and of, but from that meaning, we remove any sense that is specifically added by relatives.

This happens on line 5, where we call the method that finds the 30 words most similar to the meaning identified by the combination of positive and negative words. The method returns each word with an associated score; since we're only interested in the words themselves, we ignore the scores.

The rest of the code is easy to understand. From the words returned previously, we select only those that are real words, rather than places or events. We do that by comparing the words with the database of dictionary words we loaded earlier. We also reduce the list to at most 20 elements.

Cleaning Words from the Input

In the find_words method, on lines 2 and 3, we call the function determine_words. This function basically generates a list of words from the input string. If a word is prepended by a minus sign, it is considered a negative word.

def determine_words(definition):
    possible_words = definition.split()
    for i in range(len(possible_words) - 1, -1, -1):
        if possible_words[i] not in model.vocab:
            del possible_words[i]

    possible_expressions = []
    for w in [possible_words[i:i+3] for i in range(len(possible_words) - 3 + 1)]:
        possible_expressions.append('_'.join(w))

    ex_to_remove = []
    for i in range(len(possible_expressions)):
        if possible_expressions[i] in model.vocab:
            ex_to_remove.append(i)

    words_to_remove = []
    for i in ex_to_remove:
        words_to_remove += [i, i+1, i+2]
    words_to_remove = sorted(set(words_to_remove))

    words = [possible_expressions[i] for i in ex_to_remove]
    for i in range(len(possible_words)):
        if i not in words_to_remove:
            words.append(possible_words[i])

    return words

This function starts by generating a list of words, but then it also adds expressions. That's because the Google News model has vector representations of expressions in addition to those of simple words. We try to find expressions simply by putting together 3-grams of words (i.e., a sliding set of three words). If an expression is found in the Google News model, we add the expression as a whole and remove the individual words.
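The sliding 3-gram construction can be sketched on its own like this (the sample words are ours, for illustration):

```python
# Build every candidate expression of three consecutive words,
# joined by underscores as in the Google News model's vocabulary.
words = ["new", "york", "times", "reported"]
trigrams = ["_".join(words[i:i+3]) for i in range(len(words) - 3 + 1)]
print(trigrams)  # ['new_york_times', 'york_times_reported']
```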

The Web App

For ease of use, we create a simple Flask app: a basic web interface that lets the user provide a definition and read the list of words.

from flask import Flask, jsonify, render_template, request

def create_app():
    app = Flask(__name__)
    return app

app = create_app()

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        words = request.form['definition'].split()
        negative_words = ''
        positive_words = ''
        for i in range(len(words)):
            if words[i][0] == '-':
                negative_words += words[i][1:] + ' '
            else:
                positive_words += words[i] + ' '
        return jsonify(find_words(positive_words, negative_words))
    else:
        return render_template('index.html')

The app answers on the root route: it shows the page with the form for providing the definition, and returns the list of corresponding words when it receives a definition. Thanks to Flask, we can create it in a few lines.

The reverse dictionary in action.

How well does it work? Well, this system works quite well for descriptive definitions, i.e., when we use a definition that describes the word we're looking for. By that, we mean that you'll probably find the right word in the list returned by the app. It might not be the top word, but it should be there. It doesn't work that well for logical definitions, e.g., female spouse. So, it's certainly not a perfect solution, but it works quite well for something so simple.


In this article, we created an effective reverse dictionary in a few lines, thanks to the power of Word2Vec and ready-to-use libraries. A pretty good result for a small tutorial. But you should keep one thing in mind when it comes to machine learning: the success of your app doesn't depend on your code alone. As with many machine learning applications, success depends greatly on the data you have, both in the training data you use and in how you feed data to the program.

For instance, we left out a few trials and errors in the choice of the dataset. We already expected some problems with the Google News dataset because we knew it contained news events. So we tried to build our own dataset based on the English Wikipedia. And we failed quite miserably: The results were worse than the ones we got using the Google News dataset. Part of the issue was that our data was smaller, but the real problem was that we're less competent than the original Google engineers. Picking the right parameters for training requires a sort of black magic that you can learn only with a lot of experience. In the end, the best solution for us was to use the Google News dataset and filter the output to eliminate unwanted results for our application. A bit humbling, but something useful to remember.

Further Reading

Going Beyond Only Using Word2vec for Words

GloVe and fastText — Two Popular Word Vector Models in NLP
