
How to Use Python and Matplotlib to Create Data Visualizations: Part 1 - Line Charts

In this tutorial series, we will look at how to get started with using Python and Matplotlib to visualise our data. In this first part we’ll take you through the basics of how to create a simple line chart, how to customise and format it while working with instances of Matplotlib’s figure and axes objects. In future parts we will focus on plotting different chart types and then concentrate on some intermediate methods such as adding secondary axes and creating subplots.

Before we begin, let’s quickly look at the data we’ll be using for the first part of this tutorial. We’ll be using a simple dataframe called sales_by_week that gives us net sales values and profit by week for an unnamed company. We’ll be using Matplotlib to visualise this time series data.
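
The sales_by_week dataframe itself is not shown in this post. If you want to follow along without the original data, a stand-in with the same column names can be built with pandas. The figures below are made up purely for illustration, and we are assuming that Week holds simple week numbers; only the column names Week, Net_Sale_Value and Profit come from the tutorial.

```python
import pandas as pd

# Hypothetical stand-in data: the values are invented for illustration only.
# Only the column names (Week, Net_Sale_Value, Profit) match the tutorial.
sales_by_week = pd.DataFrame({
    'Week': [1, 2, 3, 4, 5, 6],
    'Net_Sale_Value': [1200.0, 1350.0, 1100.0, 1800.0, 1500.0, 1650.0],
    'Profit': [300.0, 340.0, 260.0, 380.0, 370.0, 410.0],
})

# Show the first few rows to check the shape of the data.
print(sales_by_week.head())
```

Any dataframe with these three columns should work with the plotting code in the rest of this post.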

SETUP

First let’s import Pyplot from Matplotlib and also activate Matplotlib inline as we’re using a Jupyter notebook. This allows any visualisations we create to render in our notebook.

import matplotlib.pyplot as plt
%matplotlib inline

 

FIGURES & AXES

Next we are going to create a figure instance and an axes instance to build our plot. When we create an axes instance we are essentially creating an empty chart that we will plot our data on. The axes instance comes with predefined functions that we can use to define what data we want to plot and how we want it plotted. Axes instances must be contained within a figure instance.

The figure instance acts as a container for axes instances and can be split up into a grid to accommodate multiple axes instances. We can also resize the figure instance and the axes instance(s) inside the figure will be resized accordingly to fit inside.

We are going to create our figure and axes instances using the subplots function. We pass subplots the number of rows and columns we want in our figure’s grid, in other words how many charts we are going to create and how we would like them arranged. We’ll also pass figsize, which is the width and height of our figure in inches. We do this in the following way:

plt.subplots(rows, columns, figsize=(width,height))

As is standard, we are going to call our figure instance fig and our axes instance ax. As we are currently only creating one axes instance, we will set the grid to be one row and one column, and we’ll make the figure 12 inches wide and 7 inches high.

fig, ax = plt.subplots(1, 1, figsize=(12, 7))
plt.show()

Calling plt.show() is what triggers our plot to render.

(Note: if we don’t pass any arguments for rows and columns, the function uses 1 and 1 as the defaults anyway, so plt.subplots(figsize=(12,7)) gives the same result.)
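
To make the grid idea a little more concrete, here is a small sketch that asks subplots for two rows and two columns instead of one. It is only meant to show what comes back from the call; the names axes and top_left are our own choice for this example and are not used elsewhere in the tutorial.

```python
import matplotlib.pyplot as plt

# Ask for a 2x2 grid of axes inside one 12x7 inch figure.
fig, axes = plt.subplots(2, 2, figsize=(12, 7))

# With more than one row and column, subplots returns a 2D NumPy array of axes.
print(axes.shape)

# Individual axes are picked out by [row, column] index.
top_left = axes[0, 0]
top_left.set_title('Top left')

# The figure size is reported in inches as width and height.
print(fig.get_size_inches())
```

With a single row and column, as in the rest of this post, subplots hands back one axes object rather than an array, which is why we can call plot on ax directly.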

 

PLOTTING

Now it’s time to actually add some data to our plot. We already have the sales_by_week dataframe that we saw earlier, so we are going to use the plot function of our ax instance and pass in the Week and Net_Sale_Value columns as the x-axis and y-axis values respectively.

fig, ax = plt.subplots(figsize=(12,7)) 
ax.plot(sales_by_week['Week'], sales_by_week['Net_Sale_Value'])
plt.show()

We’ve created our first plot, which shows how the sales for this company changed over time during the year in question. Let’s see whether profit followed the same trend as sales by adding a second line to our axes.

fig, ax = plt.subplots(figsize=(12,7)) 
ax.plot(sales_by_week['Week'], sales_by_week['Net_Sale_Value'])
ax.plot(sales_by_week['Week'], sales_by_week['Profit'])
plt.show()

We can see from the orange line for profit that profit often runs in line with sales, but occasionally we have a sales spike that is not followed by an equivalent profit spike.
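
The line colours here are not something we chose: when no color argument is passed, each call to plot takes the next colour from Matplotlib’s default colour cycle. A quick way to see this is to ask each line for its colour. The sketch below uses made-up numbers rather than the real sales_by_week data, and the variable names are our own.

```python
import matplotlib.pyplot as plt

# Made-up values standing in for the Week, Net_Sale_Value and Profit columns.
weeks = [1, 2, 3, 4]
net_sales = [1200, 1350, 1100, 1800]
profit = [300, 340, 260, 380]

fig, ax = plt.subplots(figsize=(12, 7))

# plot returns a list of line objects; the trailing comma unpacks the single line.
sales_line, = ax.plot(weeks, net_sales)
profit_line, = ax.plot(weeks, profit)

# Each line remembers the colour it was given from the cycle.
print(sales_line.get_color())
print(profit_line.get_color())
```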

Now the plot we’ve created isn’t very clear; there’s no indication of what each line or axis represents. We can fix this by using labels.

 

LABELLING

To start with let’s add a legend to our plot. We do this by passing a label argument to each of our plot functions and then also calling the legend function before we render.

fig, ax = plt.subplots(figsize=(12,7)) 
ax.plot(sales_by_week['Week'], sales_by_week['Net_Sale_Value'], label='Net Sales') 
ax.plot(sales_by_week['Week'], sales_by_week['Profit'], label='Profit')
ax.legend()
plt.show()

Let’s also add a title for our plot and label the x and y axes. To do this we use the set_title, set_xlabel and set_ylabel functions respectively and pass to them the strings we want to use as labels.

fig, ax = plt.subplots(figsize=(12,7)) 
ax.plot(sales_by_week['Week'], sales_by_week['Net_Sale_Value'], label='Net Sales')
ax.plot(sales_by_week['Week'], sales_by_week['Profit'], label='Profit')
ax.legend()
ax.set_title('Net Sales & Profit By Week')
ax.set_xlabel('Week')
ax.set_ylabel('$')
plt.show()
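
As an aside, the three label calls can also be written as a single call to the axes’ set function, which accepts the same properties as keyword arguments. This is just an alternative spelling, sketched here with made-up numbers in place of the sales_by_week columns.

```python
import matplotlib.pyplot as plt

# Made-up values standing in for the Week, Net_Sale_Value and Profit columns.
weeks = [1, 2, 3, 4]
net_sales = [1200, 1350, 1100, 1800]
profit = [300, 340, 260, 380]

fig, ax = plt.subplots(figsize=(12, 7))
ax.plot(weeks, net_sales, label='Net Sales')
ax.plot(weeks, profit, label='Profit')
ax.legend()

# One call sets the title and both axis labels.
ax.set(title='Net Sales & Profit By Week', xlabel='Week', ylabel='$')

# The getters confirm what was set.
print(ax.get_title(), ax.get_xlabel(), ax.get_ylabel())
```

Which form to use is a matter of taste; the separate set_title, set_xlabel and set_ylabel calls are easier to extend later if we want to pass extra options to just one of them.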

Now that we have a much more readable chart, let’s move on to formatting our visualisation.

 

FORMATTING

Matplotlib has ways for us to change most parts of our plots including the data lines, the plot space and the axes. On our plot we are going to change the formatting of the lines by changing the colour, line type and thickness while also adding some markers for readability purposes. We do this by passing arguments for color, linestyle, linewidth and marker into the plot function.

fig, ax = plt.subplots(figsize=(12,7)) 

ax.plot(sales_by_week['Week'], sales_by_week['Net_Sale_Value'], label='Net Sales',
        linestyle='-', marker='o', color='#4285F4', linewidth=3)

ax.plot(sales_by_week['Week'], sales_by_week['Profit'], label='Profit',
        linestyle='--', color='#DB4437', linewidth=3)

ax.set_title('Net Sales & Profit By Week')
ax.set_xlabel('Week')
ax.set_ylabel('$')
ax.legend()
plt.show()

As you can see, we’ve changed our sales line to be a thicker blue line with circular markers at the data points, while our profit line is now a thicker red dashed line.

(For more info on colours and markers refer to the Matplotlib documentation:

https://matplotlib.org/3.1.1/tutorials/colors/colors.html

https://matplotlib.org/3.1.1/api/markers_api.html)
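
If you are unsure whether a colour string will be accepted, Matplotlib’s colors module has a few helpers that can be tried outside of a plot. The sketch below checks the two hex codes used in the plot calls above alongside a named colour; the string 'not-a-colour' is a deliberately invalid value we made up for the comparison.

```python
from matplotlib import colors as mcolors

# The two hex codes used in the tutorial's plot calls.
sales_blue = '#4285F4'
profit_red = '#DB4437'

# to_rgba converts a colour spec to (red, green, blue, alpha) floats between 0 and 1.
print(mcolors.to_rgba(sales_blue))

# to_hex goes the other way, here from a named colour to its hex code.
print(mcolors.to_hex('gray'))

# is_color_like reports whether Matplotlib can interpret a value as a colour.
for spec in [sales_blue, profit_red, 'gray', 'not-a-colour']:
    print(spec, mcolors.is_color_like(spec))
```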

We can also customise the plot area itself. Let’s add a grid to make our chart more readable. We do this by calling the grid function and passing arguments for the grid colour and line width.

fig, ax = plt.subplots(figsize=(12,7)) 

ax.plot(sales_by_week['Week'], sales_by_week['Net_Sale_Value'], label='Net Sales',
        linestyle='-', marker='o', color='#4285F4', linewidth=3)

ax.plot(sales_by_week['Week'], sales_by_week['Profit'], label='Profit',
        linestyle='--', color='#DB4437', linewidth=3)

ax.set_title('Net Sales & Profit By Week')
ax.set_xlabel('Week')
ax.set_ylabel('$')
ax.legend()
ax.grid(linewidth=0.3, color='gray')
plt.show()
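
The grid function takes a few more options than the two used above. As a rough sketch of what is possible, the example below draws only the horizontal grid lines, makes them dotted, and keeps them behind the data. The numbers are made up and stand in for the sales_by_week columns; whether this looks better than the full grid is a judgement call.

```python
import matplotlib.pyplot as plt

# Made-up values standing in for the Week and Net_Sale_Value columns.
weeks = [1, 2, 3, 4]
net_sales = [1200, 1350, 1100, 1800]

fig, ax = plt.subplots(figsize=(12, 7))
ax.plot(weeks, net_sales, label='Net Sales', marker='o',
        color='#4285F4', linewidth=3)

# axis='y' draws only the grid lines attached to the y-axis ticks (horizontal lines).
ax.grid(axis='y', linestyle=':', linewidth=0.3, color='gray')

# Ask Matplotlib to draw ticks and grid lines below the plotted data.
ax.set_axisbelow(True)
```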

Now we have a simple but readable line chart. In part 2 we’ll look at how to create different chart types.