How to Return the Most Frequent Bigrams from Text Using NLTK
Python
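In this snippet we return one bigram from the string variable text, chosen from the bigrams that appear at least twice. The surviving candidates are ranked by pointwise mutual information (PMI), not by raw count.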
    from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

    bigram_assoc_measures = BigramAssocMeasures()

    text = 'One Two One Two Three Four Five Six'

    # 1. Split text into words
    words = text.split()

    # 2. Set the minimum number of times a bigram must occur,
    #    and how many of the surviving bigrams to return
    minimum_number_of_bigrams = 2
    top_bigrams_to_return = 1

    # 3. Get the bigrams contained in the list of words
    finder = BigramCollocationFinder.from_words(words)

    # 4. Filter bigrams to those that appear at least twice
    finder.apply_freq_filter(minimum_number_of_bigrams)

    # 5. Return the top bigram, ranked by PMI
    print(finder.nbest(bigram_assoc_measures.pmi, top_bigrams_to_return))
    # [('One', 'Two')]
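To see what the counting and frequency-filter steps amount to, here is a minimal sketch that uses only the standard library. It is not how NLTK implements its finder, and it ranks by raw count instead of PMI. The function name `most_frequent_bigrams` and its parameters are made up for this illustration.

```python
from collections import Counter


def most_frequent_bigrams(text, min_count=2, top_n=1):
    """Return up to top_n (bigram, count) pairs seen at least min_count times.

    Hypothetical helper for illustration; not part of NLTK.
    """
    words = text.split()
    # Pair each word with its successor to form the adjacent bigrams.
    counts = Counter(zip(words, words[1:]))
    # most_common() sorts by count, highest first; keep only frequent bigrams.
    frequent = [(bigram, n) for bigram, n in counts.most_common() if n >= min_count]
    return frequent[:top_n]


print(most_frequent_bigrams('One Two One Two Three Four Five Six'))
# [(('One', 'Two'), 2)]
```

On this input both approaches pick ('One', 'Two'), because it is the only bigram that occurs twice. On longer texts the two rankings can differ, since PMI favors pairs of words that rarely occur apart, while raw counts favor pairs of common words.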