Scrape Reddit Posts Using PMAW & Python

Python

This snippet uses the PMAW library to scrape up to 60,000 posts submitted to the technology subreddit between 1 October 2019 and 1 October 2021. The results are loaded into a pandas DataFrame, which is then written to a CSV file.

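import datetime as dt
import os
import os.path as path

import pandas as pd
from pmaw import PushshiftAPI

api = PushshiftAPI()

# Search window as Unix epoch seconds. These datetimes are naive,
# so .timestamp() interprets them in the machine's local timezone.
before = int(dt.datetime(2021, 10, 1, 0, 0).timestamp())
after = int(dt.datetime(2019, 10, 1, 0, 0).timestamp())

subreddit = "technology"
limit = 60000  # maximum number of posts to retrieve

posts = api.search_submissions(subreddit=subreddit, limit=limit, before=before, after=after)
posts_df = pd.DataFrame(posts)

# Resolves to <parent of this script's directory>/data/raw/technology_posts.csv
filepath = path.abspath(path.join(__file__, '../..', 'data/raw/technology_posts.csv'))
# Create the output directory if it does not exist yet, so to_csv does not fail
os.makedirs(path.dirname(filepath), exist_ok=True)
posts_df.to_csv(filepath, index=False)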
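
Two details of the snippet depend on where it runs: the naive datetimes are converted using the machine's local timezone, and the output path is built relative to the script's location. The following is a minimal sketch of both steps in isolation, assuming the date window is meant in UTC. The helper names `utc_epoch` and `output_path` are hypothetical and are not part of PMAW or pandas.

```python
import datetime as dt
from pathlib import Path


def utc_epoch(year: int, month: int, day: int) -> int:
    """Return Unix epoch seconds for midnight UTC on the given date."""
    moment = dt.datetime(year, month, day, tzinfo=dt.timezone.utc)
    return int(moment.timestamp())


def output_path(script_file: str) -> Path:
    """Return <parent of the script's directory>/data/raw/technology_posts.csv."""
    # parents[0] is the script's directory, parents[1] is one level above it.
    project_root = Path(script_file).resolve().parents[1]
    return project_root / "data" / "raw" / "technology_posts.csv"


# Same window as the snippet, but independent of the local timezone.
before = utc_epoch(2021, 10, 1)
after = utc_epoch(2019, 10, 1)
```

With these helpers, `before` and `after` can be passed to `api.search_submissions` unchanged, and `output_path(__file__)` points at the same file as the `path.join` expression in the snippet.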