Reading CSV Files from Amazon S3 into Pandas Dataframes
In this example, we first set our AWS credentials and region, along with the S3 bucket and file path of the CSV file we want to read. We then create a session and an S3 client using the boto3 library.
Next, we use the S3 client to retrieve the CSV file from the specified bucket and key. We read the body of the S3 object into a string, wrap that string in a StringIO file-like object, and finally use pandas to read the CSV data from it into a dataframe.
import pandas as pd
import boto3
from io import StringIO

# Set your AWS credentials and region
aws_access_key_id = 'access_key_id'
aws_secret_access_key = 'secret_access_key'
aws_region = 'aws_region'

# Set the S3 bucket and file path
s3_bucket = 's3_bucket_name'
s3_file_path = 'data.csv'

# Create a session with boto3
session = boto3.Session(
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=aws_region
)

# Create an S3 client
s3 = session.client('s3')

# Read the CSV file from S3 into a pandas dataframe
s3_object = s3.get_object(Bucket=s3_bucket, Key=s3_file_path)
s3_data = s3_object['Body'].read().decode('utf-8')
df = pd.read_csv(StringIO(s3_data))

# Print the dataframe
print(df.head())
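If the optional s3fs package is installed, pandas can also read the object directly from an s3:// URL without an explicit boto3 client. The following is a minimal sketch that reuses the same placeholder bucket, key, and credential values as above.

import pandas as pd

# Sketch: requires the optional s3fs package (pip install s3fs).
# The bucket name, key, and credential values below are placeholders.
df = pd.read_csv(
    "s3://s3_bucket_name/data.csv",
    storage_options={
        "key": "access_key_id",
        "secret": "secret_access_key",
        "client_kwargs": {"region_name": "aws_region"},
    },
)
print(df.head())

Passing credentials through storage_options is optional; if it is omitted, s3fs falls back to the default AWS credential chain (environment variables, the shared credentials file, or an attached IAM role).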