Reading CSV Files from Amazon S3 into Pandas DataFrames

Python

In this example, we first set our AWS credentials and region, along with the S3 bucket name and the key (file path) of the CSV file we want to read. We then create a session and an S3 client using the boto3 library.

Next, we call get_object on the S3 client to retrieve the CSV file from the specified bucket and key. We read the object's body, decode it into a UTF-8 string, and wrap that string in StringIO so it behaves like a file. Finally, we pass the file-like object to pandas.read_csv to load the data into a DataFrame.

import pandas as pd
import boto3
from io import StringIO

# Set your AWS credentials and region
aws_access_key_id = 'access_key_id'
aws_secret_access_key = 'secret_access_key'
aws_region = 'aws_region'

# Set the S3 bucket and file path (object key)
s3_bucket = 's3_bucket_name'
s3_file_path = 'data.csv'

# Create a session with boto3
session = boto3.Session(
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=aws_region
)

# Create an S3 client
s3 = session.client('s3')

# Read the CSV file from S3 into a pandas DataFrame
s3_object = s3.get_object(Bucket=s3_bucket, Key=s3_file_path)
s3_data = s3_object['Body'].read().decode('utf-8')
df = pd.read_csv(StringIO(s3_data))

# Print the first rows of the DataFrame
print(df.head())
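
As a variation, recent pandas versions can also read directly from an s3:// URL when the optional s3fs package is installed, which avoids creating a boto3 client by hand. The sketch below is a minimal example of that approach; it assumes s3fs is available and reuses the bucket, key, and credential variables defined in the snippet above, with storage_options passed through to s3fs.

import pandas as pd

# Minimal sketch: requires the s3fs package (pip install s3fs).
# Reuses aws_access_key_id, aws_secret_access_key, aws_region,
# s3_bucket, and s3_file_path from the snippet above.
df = pd.read_csv(
    f's3://{s3_bucket}/{s3_file_path}',
    storage_options={
        'key': aws_access_key_id,
        'secret': aws_secret_access_key,
        'client_kwargs': {'region_name': aws_region},
    },
)

print(df.head())

Which approach to prefer is largely a matter of context: the boto3 version gives you explicit control over the client and works well alongside other AWS calls, while the s3fs shortcut keeps simple read-only scripts shorter.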