Scrape Reddit Posts Using PMAW & Python

Python

In this snippet, the PMAW library (Pushshift Multithread API Wrapper) is used to scrape up to 60,000 posts from the r/technology subreddit created between 1 October 2019 and 1 October 2021. The results are loaded into a pandas DataFrame, which is then written to a CSV file.

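import datetime as dt
import os

import pandas as pd
from pmaw import PushshiftAPI

api = PushshiftAPI()

# Unix timestamps (UTC) bounding the search window: 1 Oct 2019 to 1 Oct 2021
before = int(dt.datetime(2021, 10, 1, tzinfo=dt.timezone.utc).timestamp())
after = int(dt.datetime(2019, 10, 1, tzinfo=dt.timezone.utc).timestamp())

subreddit = "technology"
limit = 60000  # maximum number of submissions to retrieve; fewer may be returned

# search_submissions returns an iterable of submission records (dicts)
posts = api.search_submissions(subreddit=subreddit, limit=limit, before=before, after=after)
posts_df = pd.DataFrame(list(posts))

# Save to <project root>/data/raw/, assuming this script sits one level below the project root
filepath = os.path.abspath(
    os.path.join(os.path.dirname(__file__), "..", "data", "raw", "technology_posts.csv")
)
os.makedirs(os.path.dirname(filepath), exist_ok=True)  # create the output folder if missing
posts_df.to_csv(filepath, index=False)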
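After the CSV is written, a quick sanity check is to load it back and confirm how many posts were saved and that they fall inside the requested window. The sketch below assumes each record has a `created_utc` field holding Unix seconds (Pushshift submissions normally include it, but check your own output); `summarize_posts` is a hypothetical helper, not part of PMAW.

```python
import pandas as pd


def summarize_posts(posts_df: pd.DataFrame) -> dict:
    """Return the post count and the earliest/latest creation times in UTC.

    Assumes a `created_utc` column of Unix timestamps in seconds.
    """
    # Convert epoch seconds to timezone-aware UTC datetimes
    created = pd.to_datetime(posts_df["created_utc"], unit="s", utc=True)
    return {
        "count": len(posts_df),
        "earliest": created.min(),
        "latest": created.max(),
    }


if __name__ == "__main__":
    # Hypothetical rows standing in for the scraped data
    sample = pd.DataFrame({"title": ["a", "b"], "created_utc": [1569888000, 1633046399]})
    print(summarize_posts(sample))
```

For example, running `summarize_posts(pd.read_csv(filepath))` after the scraping script reports the saved post count and its date span; the count can come in below the 60,000 limit if fewer matching submissions are available.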