Never Forget Another Line of Code

Datasnips is a free code snippet hosting platform for Data Science & AI. It enables your code snippets to be organized, searchable & shareable.

How to Return the Most Frequent Bigrams from Text Using NLTK

Python

This snippet returns the single highest-scoring bigram, ranked by pointwise mutual information (PMI), among the bigrams that appear at least twice in the string variable text.

from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

bigram_assoc_measures = BigramAssocMeasures()

text = 'One Two One Two Three Four Five Six'

# 1. Split text into words
text = text.split()

# 2. Set the minimum number of times a bigram must appear
#    and how many of the remaining bigrams to return
minimum_number_of_bigrams = 2
top_bigrams_to_return = 1

# 3. Get bigrams contained in text variable
finder = BigramCollocationFinder.from_words(text)

# 4. Filter bigrams to those that appear at least twice
finder.apply_freq_filter(minimum_number_of_bigrams)

# 5. Return the top bigram, ranked by PMI score
top_bigrams = finder.nbest(bigram_assoc_measures.pmi, top_bigrams_to_return)
print(top_bigrams)  # [('One', 'Two')]
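
The snippet above ranks the filtered bigrams by PMI, not by raw count. As a rough illustration of what the counting and filtering steps amount to, here is a minimal sketch that uses only the standard library and ranks strictly by frequency. The function name most_frequent_bigrams and its parameters are hypothetical, chosen for this example. It is not how NLTK implements its finder.

```python
from collections import Counter


def most_frequent_bigrams(words, min_count=2, top_n=1):
    """Return up to top_n (bigram, count) pairs seen at least min_count times.

    Hypothetical helper for illustration; ranks by raw count, not PMI.
    """
    # Pair each word with its successor to form adjacent bigrams
    counts = Counter(zip(words, words[1:]))
    # Keep only bigrams that meet the minimum count
    frequent = Counter({bg: c for bg, c in counts.items() if c >= min_count})
    # most_common sorts by count, highest first
    return frequent.most_common(top_n)


words = 'One Two One Two Three Four Five Six'.split()
result = most_frequent_bigrams(words)
print(result)  # [(('One', 'Two'), 2)]
```

On this sample text both approaches agree, because only one bigram survives the frequency filter. On larger texts PMI tends to favor rare but strongly associated word pairs, so the two rankings can differ.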