Create Bag of Words DataFrame Using Count Vectorizer

Python

Transforms a DataFrame text column into a new "bag of words" DataFrame using the scikit-learn CountVectorizer. The vectorizer is first initialised, then fitted to the "text" column of the DataFrame "df" and used to transform it, producing a sparse matrix of token counts. That matrix is converted to a dense array and passed to the DataFrame constructor, with the vectorizer's feature names supplied as the column names.

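import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Fit the vectorizer on the "text" column; the result is a sparse matrix
count_vectorizer = CountVectorizer()
bag_of_words = count_vectorizer.fit_transform(df['text'])
# get_feature_names_out() replaces get_feature_names(), removed in scikit-learn 1.2
bag_of_words = pd.DataFrame(bag_of_words.toarray(),
                            columns=count_vectorizer.get_feature_names_out())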