Data Preprocessing with Pandas
A collection of code snippets for preparing tabular data with Pandas for analysis, machine learning, or visualisation.
IMPORT DATA
CSV
- Selecting the Row That Contains the Column Header
- Reading a Subset of Columns
- Skipping Rows
- Reading a Sample of Rows
- Parsing a Date Column
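
The CSV topics above can be sketched with `pandas.read_csv`. This is a minimal illustration, not a definitive recipe: the inline CSV text and its column names (`id`, `name`, `signup_date`, `score`) are made up for the example, and an in-memory buffer stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical CSV content: one junk line sits above the real header row.
raw = """exported by some tool
id,name,signup_date,score
1,Ann,2021-01-05,9.5
2,Bob,2021-02-11,7.0
3,Cy,2021-03-20,8.2
4,Dee,2021-04-02,6.1
"""

# header=1 uses the second line (0-indexed) as the column header,
# usecols reads only a subset of columns,
# parse_dates converts the named column to datetime64.
df = pd.read_csv(
    io.StringIO(raw),
    header=1,
    usecols=["id", "signup_date", "score"],
    parse_dates=["signup_date"],
)

# skiprows takes 0-indexed file line numbers: skip the junk line (0)
# and the first data row (2); the header is then inferred from line 1.
skipped = pd.read_csv(io.StringIO(raw), skiprows=[0, 2])

# nrows reads only the first N data rows, a cheap way to sample a large file.
head = pd.read_csv(io.StringIO(raw), header=1, nrows=2)
```

With a real file, the `io.StringIO(raw)` argument would be replaced by the file path.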
EXCEL
- Importing an Excel Sheet
- Selecting the Header Row
- Selecting Columns to Import
- Skipping Rows
- Selecting the Number of Rows
- Importing Columns from Excel as Dates
SCRAPE DATA
- Initialise a Web Scraper with Beautiful Soup
- Find All Instances of a Tag with Beautiful Soup
- Find the Nth Tag on a Web Page with Beautiful Soup
- Get an Inner Tag Using Beautiful Soup
- Get the Text Within a Tag Using Beautiful Soup
- Using Selectors in Beautiful Soup to Scrape Using Classes & IDs
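
The Beautiful Soup topics above might look roughly like this. The sketch assumes the `beautifulsoup4` package is installed and parses a small made-up HTML string with Python's built-in `html.parser`; a real scraper would first download the page, which is left out here.

```python
from bs4 import BeautifulSoup

# Hypothetical page content used only for illustration.
html = """
<html><body>
  <div id="main">
    <p class="intro">Hello <b>world</b></p>
    <p>Second paragraph</p>
    <p class="intro">Third paragraph</p>
  </div>
</body></html>
"""

# Initialise the scraper from an HTML string.
soup = BeautifulSoup(html, "html.parser")

# Find all instances of a tag.
paragraphs = soup.find_all("p")

# The Nth tag is an index into that list (here the second <p>).
second = paragraphs[1]

# Get an inner tag by chaining find calls.
inner = soup.find("p").find("b")

# Get the text within a tag.
text = inner.get_text()

# CSS selectors: "p.intro" matches by class, "#main" matches by ID.
intro = soup.select("p.intro")
main = soup.select_one("#main")
```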
PREPARE DATA
CONVERTING
CLEANING NANS
CLEANING STRINGS
SELECTING & RENAMING
- Selecting Columns Using loc and iloc
- Dropping Columns
- Renaming Columns Using Dictionaries & Lists
- Selecting Rows Using iloc
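
A short sketch of the selecting and renaming topics above, using a made-up three-column DataFrame (the column names `a`, `b`, `c` and the new names are placeholders):

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [10, 20, 30, 40],
    "c": ["w", "x", "y", "z"],
})

# loc selects columns by label, iloc by integer position.
by_label = df.loc[:, ["a", "c"]]
by_position = df.iloc[:, 0:2]  # first two columns: a and b

# Drop a column (returns a new DataFrame, df is unchanged).
dropped = df.drop(columns=["b"])

# Rename with a dictionary: only the listed columns change.
renamed_dict = df.rename(columns={"a": "alpha"})

# Rename with a list: every column must be given, in order.
renamed_list = df.copy()
renamed_list.columns = ["col_1", "col_2", "col_3"]

# Select rows by position with iloc (end of the slice is exclusive).
first_rows = df.iloc[0:2]
```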
SAMPLING
FILTERING
- Filtering a Dataframe Using AND
- Filtering a Dataframe Using OR
- Drop Duplicate Values
- Filter a DataFrame Using a List
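
The filtering topics above can be illustrated with boolean masks. The `city` and `sales` columns below are invented for the example; note that each condition needs its own parentheses because `&` and `|` bind more tightly than comparisons.

```python
import pandas as pd

# Hypothetical data; the first and third rows are exact duplicates.
df = pd.DataFrame({
    "city": ["Leeds", "York", "Leeds", "Bath", "York"],
    "sales": [100, 250, 100, 80, 300],
})

# AND: both conditions must hold.
both = df[(df["city"] == "York") & (df["sales"] > 260)]

# OR: either condition may hold.
either = df[(df["city"] == "Bath") | (df["sales"] >= 250)]

# Drop rows that duplicate an earlier row across all columns.
deduped = df.drop_duplicates()

# Keep rows whose value appears in a list.
in_list = df[df["city"].isin(["Leeds", "Bath"])]
```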
TRANSFORM
SUMMARISE
JOIN
MERGE
UNION
FEATURE ENGINEERING
WINDOW & LAG FEATURES
STRINGS
DATES
- Extract Date & Time Features
- Weekend Flag
- Days Between Dates
- Finding the Dates a Number of Days Before & After
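
One possible sketch of the date features above, built on the `.dt` accessor and `pd.Timedelta`. The `start` and `end` columns, the derived column names, and the 7-day offset are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical start/end dates.
df = pd.DataFrame({
    "start": pd.to_datetime(["2023-01-06", "2023-01-07", "2023-01-09"]),
    "end": pd.to_datetime(["2023-01-10", "2023-01-07", "2023-02-08"]),
})

# Extract date features with the .dt accessor.
df["year"] = df["start"].dt.year
df["month"] = df["start"].dt.month
df["day_of_week"] = df["start"].dt.dayofweek  # Monday=0 ... Sunday=6

# Weekend flag: Saturday (5) or Sunday (6).
df["is_weekend"] = df["start"].dt.dayofweek >= 5

# Days between two dates: subtracting gives a Timedelta column.
df["days_between"] = (df["end"] - df["start"]).dt.days

# Dates a fixed number of days before and after.
df["week_before"] = df["start"] - pd.Timedelta(days=7)
df["week_after"] = df["start"] + pd.Timedelta(days=7)
```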
STATISTICS & FORMULAS