Create Bag of Words DataFrame Using Count Vectorizer
Python
Transforms a text column of a dataframe into a new "bag of words" dataframe using scikit-learn's CountVectorizer. The count vectorizer is first initialised and then fitted to the "text" column of the dataframe "df", which produces a sparse matrix of token counts. That matrix is converted to a dense array and passed to the DataFrame constructor, with the vectorizer's feature names supplied as the column names.
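import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Fit the vectorizer and build the sparse document-term matrix
count_vectorizer = CountVectorizer()
bag_of_words = count_vectorizer.fit_transform(df['text'])

# Convert to a dense DataFrame; get_feature_names() was removed in scikit-learn 1.2
bag_of_words = pd.DataFrame(bag_of_words.toarray(),
                            columns=count_vectorizer.get_feature_names_out())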