Remove Usernames & HTTP Links From Tweet Data
Python
Here we have tweet data in a dataframe column. We declare a function that uses regex to remove any run of non-space characters that begins with '@' (usernames) or 'http' (links). We then use the Pandas apply method to pass each tweet in the column to the function and write the cleaned text back to the same column.
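```python
import re

def remove_usernames_links(tweet):
    # Drop any run of non-whitespace characters that begins with '@' (usernames)
    tweet = re.sub(r'@[^\s]+', '', tweet)
    # Drop any run of non-whitespace characters that begins with 'http' (links)
    tweet = re.sub(r'http[^\s]+', '', tweet)
    return tweet

# df is an existing pandas DataFrame with a 'tweet' column of strings
df['tweet'] = df['tweet'].apply(remove_usernames_links)
```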
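The function above also strips text after an '@' in the middle of a word (an email like "bob@example.com" becomes "bob") and leaves doubled or trailing spaces where tokens were removed. Below is a minimal sketch of a stricter variant, assuming a "username" or "link" is only a whitespace-delimited token that *starts* with '@' or 'http'. The name clean_tweet and the sample tweets are hypothetical, made up for illustration. It uses only the standard re module, so it runs on plain strings; it can be passed to df['tweet'].apply(...) in the same way as the original function.

```python
import re

# Assumption: only tokens that START with '@' or 'http' count as usernames/links.
# (?<!\S) means "not preceded by a non-space character", i.e. start of a token.
USERNAME_RE = re.compile(r"(?<!\S)@\S+")
LINK_RE = re.compile(r"(?<!\S)http\S+")
SPACES_RE = re.compile(r"\s+")

def clean_tweet(tweet: str) -> str:
    """Hypothetical helper: remove @usernames and http(s) links, tidy spaces."""
    tweet = USERNAME_RE.sub("", tweet)
    tweet = LINK_RE.sub("", tweet)
    # Removing tokens leaves doubled or trailing spaces; collapse and trim them.
    return SPACES_RE.sub(" ", tweet).strip()

tweets = [
    "@alice check this out https://example.com/post",
    "email me at bob@example.com",
    "no mentions here",
]
cleaned = [clean_tweet(t) for t in tweets]
print(cleaned)
# ['check this out', 'email me at bob@example.com', 'no mentions here']
```

Precompiling the patterns avoids re-parsing the regex on every row when the function is applied across a large column.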