Data Visualization Using Matplotlib

Numerical data can be comprehended much more easily via data visualization than through a simple table of numbers.

Nov 19, 2022

Charting libraries are used primarily to obtain instant insights into data and identify patterns, trends, and outliers.

A stock price chart is the first step toward determining which algorithmic trading strategy fits the stock: some strategies are more suitable for trending stocks, others for mean-reverting stocks, and so on. Numerical statistics are no substitute for a good-quality chart.

In this article, we will learn about Matplotlib, a Python visualization library that builds on NumPy to create static, animated, and interactive visualizations. The pandas library can chart DataFrames directly through Matplotlib.

Topics covered in this article include:

Creating figures and subplots.
Adding colors, markers, and line styles to plots.
Enriching axes with ticks, labels, and legends.
Annotating data points to enhance their value.
Saving plots to files.
Charting pandas DataFrames with Matplotlib.

Creating figures and subplots

The Matplotlib drawing canvas supports plotting multiple charts (subplots) on a single figure.

Subplot definitions

A figure object can be created with the matplotlib.pyplot.figure(…) function:
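The original listing is not reproduced here. A minimal sketch might look like the following, assuming a 12x6 inch figure at 200 DPI; these values are an assumption, chosen only because they yield the 2400x1200 pixel size reported in the output:

```python
import matplotlib.pyplot as plt

# figsize is in inches; 12x6 inches at 200 DPI gives 2400x1200 pixels.
# Both values are assumptions made to match the reported figure size.
fig = plt.figure(figsize=(12, 6), dpi=200)

# The figure has no axes yet, so there is nothing to draw on.
print(fig)
```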

The result is a figure object with no axes (0 Axes):

<Figure size 2400x1200 with 0 Axes>

We need to create space for subplots on this figure before we plot anything. We can do this by specifying the grid layout and the position of each subplot with the matplotlib.figure.Figure.add_subplot(…) method.

On the left, a subplot is added at the first position of a 1x2 grid. On the top right, a subplot is added at the second position of a 2x2 grid. Finally, a subplot is added at the fourth position of a 2x2 grid, on the bottom right:
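The layout just described can be sketched as follows. The variable names ax1, ax2, and ax3 follow the ones used later in the text, and the figure size is an assumption:

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12, 6))  # size is an assumption

# add_subplot(nrows, ncols, index): index counts from 1, row by row.
ax1 = fig.add_subplot(1, 2, 1)  # left half: position 1 of a 1x2 grid
ax2 = fig.add_subplot(2, 2, 2)  # top right: position 2 of a 2x2 grid
ax3 = fig.add_subplot(2, 2, 4)  # bottom right: position 4 of a 2x2 grid
```

Mixing a 1x2 grid with a 2x2 grid works because each call only describes where that one subplot sits on the figure.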

We just added the following subplots to the figure object:

Figure 1. A figure containing three empty subplots

Once the subplots have been created, we can add visualizations to them. Because physical space on the page is at a premium in any report, it is good practice to pack several charts into one figure as shown above.

Plotting in subplots

We use numpy.linspace(…) to generate evenly spaced values for the x axis, then apply numpy.square(…), numpy.sin(…), and numpy.cos(…) to them to obtain the y values.

In order to plot these functions, we will use the variables ax1, ax2, and ax3 we obtained from adding subplots:
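A minimal sketch of these two steps is shown below. The x range, the number of points, and the assignment of functions to subplots are assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12, 6))
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 4)

# Evenly spaced x values; the range and count are assumptions.
x = np.linspace(0, 1, num=20)
y1 = np.square(x)
y2 = np.sin(x)
y3 = np.cos(x)

# Each Axes object has its own plot(...) method.
ax1.plot(x, y1)
ax2.plot(x, y2)
ax3.plot(x, y3)
```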

We now plot the values in the following figure:

Figure 2. A figure with three subplots, one each for the square, sine, and cosine functions

In order to specify the same x axis for all subplots, the sharex= parameter can be used when creating subplots.

The following example demonstrates this functionality: we plot the square of x, then use numpy.power(…) to raise x to the tenth power and plot it against the same x axis:
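One way to sketch this is with matplotlib.pyplot.subplots(…), which accepts sharex= directly. The stacked two-row layout and the x range are assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

# Two stacked subplots that share one x axis (layout is an assumption).
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)

x = np.linspace(0, 1, num=20)
ax1.plot(x, np.square(x))     # x squared
ax2.plot(x, np.power(x, 10))  # x to the tenth power
```

With a shared axis, changing the x limits of one subplot changes them on the other as well.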

Using a common x-axis and plotting different functions on each graph, we get the following diagram:

Figure 3. Subplots sharing an x axis, showing the square and tenth-power functions

As of yet, the charts generated do not have self-explanatory units, nor do they indicate what each chart represents. The charts should be enhanced with colors, markers, and line styles, and the axes should be enriched with ticks, legends, and labels, as well as annotations for selected data points.

Adding colors, markers, and line styles to plots

Charts are easier to understand when their lines differ in color, marker, and style.

The following code block plots four different functions using these parameters:

In order to assign colors, use the parameter color=.
To change the width/thickness of the lines, use the linewidth= parameter.
Data points are marked with different shapes using the marker= parameter.
These markers can be resized using the markersize= parameter.
Transparency can be modified using the alpha= parameter.
By using the drawstyle= parameter, the default line connectivity between data points is replaced by step connectivity.

It looks like this:
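The original listing is not reproduced here, so the following is only a sketch of how the parameters above combine. The four functions and every styling value are assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, num=20)
fig, ax = plt.subplots()

# Thick black line with circle markers.
ax.plot(x, np.sqrt(x), color='black', linewidth=2, marker='o', markersize=6)
# Semi-transparent dashed gray line.
ax.plot(x, np.log1p(x), color='gray', linestyle='--', alpha=0.5)
# Step connectivity instead of straight segments between points.
ax.plot(x, np.sin(x), color='dimgray', drawstyle='steps-post')
# Large triangle markers on a thin line.
ax.plot(x, np.cos(x), color='darkgray', linewidth=1, marker='^', markersize=10)
```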

The output shows the four functions, each drawn with different attributes:

Figure 4. A plot showing the different color options, line styles, marker styles, transparency options, and sizes

Multi-time series charts can be generated with different colors, line styles, marker styles, transparency, and size options. It is important to choose the colors carefully, since some laptop screens and printed paper may not render them well.

Making outstanding charts requires enriching axes.

Enhancing axes with ticks, labels, and legends

Ticks, limits, and labels can be added to further customize the charts.

The x axis range is set with the matplotlib.pyplot.xlim(…) function.

The tick positions on the x axis are set explicitly with the matplotlib.pyplot.xticks(…) function:
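A short sketch of both calls follows. The plotted function, the limits, and the tick values are assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, num=50)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Both functions act on the current axes, which is ax here.
plt.xlim(2, 8)             # show only x values between 2 and 8
plt.xticks([2, 4, 6, 8])   # place ticks at exactly these positions
```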

By doing this, the x axis is modified so it falls within the specified limits, and the ticks are placed at the explicitly specified values:

Figure 5. A plot with explicit limits and ticks on the x axis

By using matplotlib.axes.Axes.set_yscale(…), we can also change the scale of an axis to a non-linear one.

The tick labels on the x axis can be changed with matplotlib.axes.Axes.set_xticklabels(…):
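A minimal sketch of both methods is given below. The plotted function, tick positions, and label strings are assumptions. The tick positions are fixed with set_xticks(…) first so that each custom label is tied to a known position:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(1, 10, num=50)
fig, ax = plt.subplots()
ax.plot(x, np.exp(x))

# Switch the y axis to a logarithmic scale.
ax.set_yscale('log')

# Fix the tick positions, then replace their labels with custom text.
ax.set_xticks([2, 4, 6, 8, 10])
labels = ax.set_xticklabels(['two', 'four', 'six', 'eight', 'ten'])
```

On a logarithmic y axis, the exponential function appears as a straight line.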

The output of that code block shows the y axis scale, which is now logarithmic, and the custom tick labels on the x axis:

Figure 6. A logarithmic y-axis scale and custom tick labels on the x-axis are used in this plot

If we are communicating percentage changes or multiplicative factors, logarithmic scales are useful in charts.

The x and y axis labels can be set with the matplotlib.axes.Axes.set_xlabel(…) and matplotlib.axes.Axes.set_ylabel(…) methods.

The matplotlib.axes.Axes.legend(…) method adds a legend to make plots more readable. When loc='best' is specified, Matplotlib decides where the legend should appear on the plot:
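These calls can be sketched as follows. The plotted functions and all label strings are assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, num=100)
fig, ax = plt.subplots()

# The label= text of each line is what the legend displays.
ax.plot(x, np.sin(x), color='black', label='sin(x)')
ax.plot(x, np.cos(x), color='gray', linestyle='--', label='cos(x)')

ax.set_title('Sine and cosine')
ax.set_xlabel('x (radians)')
ax.set_ylabel('f(x)')

# loc='best' lets Matplotlib pick the position that overlaps the data least.
legend = ax.legend(loc='best')
```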

The title, x- and y-axis labels, and legend are shown in the following plot:

Figure 7. Labels on the x- and y-axes, as well as a legend are shown in this plot

Charts that render each time series differently and explain the units and labels of the axes are usually sufficient for understanding. However, a few special data points are always worth pointing out.

Annotating data points to enrich them

We can add a text box to our plots using matplotlib.axes.Axes.text(…):
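A minimal sketch, in which the plotted function, the text position, and the wording are assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, num=100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), color='black')

# text(x, y, s) places the string s at data coordinates (x, y).
txt = ax.text(1.0, 0.5, 'Rising part of the wave')
```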

As a result, we get:

Figure 8. Annotations in Matplotlib displayed on a plot

Annotations can be controlled more precisely with matplotlib.axes.Axes.annotate(…).

The following code block controls the annotation with these parameters:

Location of the data point is specified by the xy= parameter.
A text box’s location is specified by the xytext= parameter.
A dictionary of parameters can be specified in the arrowprops= parameter to control the arrow between the text box and the data point.
Within that dictionary, facecolor specifies the arrow's color, and shrink specifies the fraction of the arrow's total length by which it is shortened at both ends.
Orienting the text box relative to the data point is controlled by the horizontalalignment= and verticalalignment= parameters.

Following is the code:
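The original listing is not reproduced here. The sketch below shows how the parameters listed above fit together, with the annotated point, the text position, and the styling values all being assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, num=100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), color='black')

ann = ax.annotate(
    'Maximum',
    xy=(np.pi / 2, 1.0),            # the data point being annotated
    xytext=(np.pi / 2 + 1.5, 0.6),  # where the text box is placed
    arrowprops=dict(facecolor='black', shrink=0.05),
    horizontalalignment='left',
    verticalalignment='top',
)
```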

Here are the results:

Figure 9. Data points are annotated with text and arrows on a plot

Drawing attention to the key data points helps readers focus on the message of the chart.

Shape annotations can be added with the matplotlib.axes.Axes.add_patch(…) method.

In the code block that follows, we add a matplotlib.pyplot.Circle object with the following parameters:

To specify the location of the center, use the xy= parameter.
To specify the radius of the circle, use the radius= parameter.
To specify the circle's color, use the color= parameter.

Following is the code:
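A minimal sketch of adding a circle patch, in which the plotted function, the circle's position, its radius, and its color are assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, num=100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), color='black')

# A circle centered on the maximum of the sine wave, in data coordinates.
circle = plt.Circle(xy=(np.pi / 2, 1.0), radius=0.2, color='gray', alpha=0.5)
ax.add_patch(circle)
```

Because the radius is in data coordinates, the circle looks like an ellipse unless both axes use the same scale, for example after calling ax.set_aspect('equal').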

Data points are surrounded by circles in the following plot:

Figure 10. Adding a patch generates a plot with circles annotating data points

Once we have created beautiful, professional charts, we need to learn how to share them.

Saving plots to files

The figure object can save plots to disk through its savefig(…) method, which offers a number of size and resolution options, including the dpi= parameter:
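A minimal sketch of saving a figure follows. The plot contents and the DPI value are assumptions, and the file is written to a temporary directory here only to keep the example free of side effects:

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, num=100)
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, np.sin(x), color='black')

# The output format is inferred from the file extension.
out_path = os.path.join(tempfile.mkdtemp(), 'fig.png')
fig.savefig(out_path, dpi=200)  # 8x4 inches at 200 DPI -> 1600x800 pixels
```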

In the fig.png file, the following plot will be written:

Figure 11. A Matplotlib plot saved to a file on disk and opened in an external viewer

Images of trading strategy performance are frequently exported for HTML or email reports. For printing, set the DPI of the charts to match the DPI of your printer.

DataFrame charting with Matplotlib and Pandas

Series and DataFrame objects can be plotted with the pandas library using Matplotlib.

Let's create a DataFrame in which the Cont value column contains continuous values that mimic prices, the Delta1 and Delta2 values mimic price changes, and the Cat value column takes one of five categories:
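The original listing is not reproduced here. One plausible construction is sketched below, in which the column names 'Delta1 value' and 'Delta2 value', the five category labels, the row count, and the binning into five discrete levels are all assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded for reproducibility
n = 1000                        # number of rows (assumption)
categories = ['Very high', 'High', 'Medium', 'Low', 'Very low']

df = pd.DataFrame({
    # A cumulative sum of random steps mimics a price series.
    'Cont value': rng.standard_normal(n).cumsum(),
    # Independent random draws mimic price changes.
    'Delta1 value': rng.standard_normal(n),
    'Delta2 value': rng.standard_normal(n),
    # Each of the five categories appears n/5 times, in random order.
    'Cat value': rng.permutation(categories * (n // 5)),
})

# Bucket each delta column into five equal-width bins labeled -2..2.
for name in ('Delta1', 'Delta2'):
    df[f'{name} discrete'] = pd.cut(
        df[f'{name} value'], bins=5, labels=[-2, -1, 0, 1, 2]
    ).astype(np.int64)

print(df.head())
```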

The DataFrame generated this way is as follows:

This DataFrame can be visualized in a number of ways.

DataFrame column line plots

The pandas.DataFrame.plot(…) method can be used to plot ‘Cont value’ on a line plot:
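A minimal sketch of such a line plot follows. The data here is a random walk standing in for the 'Cont value' column, and the styling is an assumption:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Stand-in data: a random walk that mimics prices.
df = pd.DataFrame({'Cont value': rng.standard_normal(1000).cumsum()})

# kind='line' is the default; it is spelled out here for clarity.
ax = df.plot(y='Cont value', kind='line', color='black', linestyle='-',
             title='Cont value line plot')
```

The method returns the Matplotlib Axes it drew on, so every customization shown earlier can still be applied to the result.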

This command produces the following chart:

Figure 12. A line plot generated with the pandas.DataFrame.plot(…) method

Time series are typically displayed using line charts.

DataFrame column bar plots

If the kind parameter is set to 'bar', pandas.DataFrame.plot(…) generates a bar chart.

A bar chart of Delta1 discrete value counts can be created by grouping the DataFrame by the 'Cat value' column:
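A sketch of the group-and-count approach is shown below. The stand-in data, its size, and the styling are assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
categories = ['Very high', 'High', 'Medium', 'Low', 'Very low']
# Stand-in data with one categorical and one discretized column.
df = pd.DataFrame({
    'Cat value': rng.permutation(categories * (n // 5)),
    'Delta1 discrete': rng.integers(-2, 3, size=n),
})

# Count how often each Delta1 discrete level occurs within each category.
counts = df.groupby('Cat value')['Delta1 discrete'].value_counts()

# One bar per (Cat value, Delta1 discrete) pair.
ax = counts.plot(kind='bar', color='darkgray',
                 title='Occurrence by (Cat, Delta1)')
```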

The following plot shows the frequency of each (Cat value, Delta1 discrete) pair:

Figure 13. A vertical bar plot of the frequency of (Cat value, Delta1 discrete) value pairs

A horizontal bar plot is built instead of a vertical one when the kind='barh' parameter is used:
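The horizontal variant can be sketched the same way, this time grouping by the discretized column. The stand-in data and the styling are assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1000
categories = ['Very high', 'High', 'Medium', 'Low', 'Very low']
df = pd.DataFrame({
    'Cat value': rng.permutation(categories * (n // 5)),
    'Delta2 discrete': rng.integers(-2, 3, size=n),
})

# Count how often each category occurs within each Delta2 discrete level.
counts = df.groupby('Delta2 discrete')['Cat value'].value_counts()

# kind='barh' draws the same bars horizontally.
ax = counts.plot(kind='barh', color='darkgray',
                 title='Occurrence by (Delta2, Cat)')
```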

As a result, we get:

Figure 14. A horizontal bar plot showing the frequency of (Delta2 discrete, Cat value) pairs

Comparisons of categorical values are best done with bar plots.

Plotting the density and histogram of a column in a DataFrame

Setting the kind parameter to 'hist' in the pandas.DataFrame.plot(…) method builds a histogram.

In order to visualize the Delta1 discrete values, let’s create a histogram:
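A minimal sketch, using stand-in data for the 'Delta1 discrete' column. The data and the styling are assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
delta1 = pd.Series(rng.standard_normal(n))
df = pd.DataFrame({
    # Bucket the continuous values into five levels labeled -2..2.
    'Delta1 discrete': pd.cut(delta1, bins=5,
                              labels=[-2, -1, 0, 1, 2]).astype(np.int64),
})

# kind='hist' counts how many rows fall into each bin.
ax = df['Delta1 discrete'].plot(kind='hist', color='darkgray',
                                title='Delta1 discrete histogram')
```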

A histogram of the generated data is shown below:

Figure 15. A histogram of Delta1 discrete value frequencies

With the kind='kde' parameter, we can plot an estimate of the Probability Density Function (PDF) of the Delta2 discrete values, obtained by Kernel Density Estimation (KDE):
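A hedged sketch follows, with stand-in data and styling as assumptions. pandas delegates the density estimate to SciPy, so this example checks that SciPy is installed before plotting:

```python
import importlib.util

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
delta2 = pd.Series(rng.standard_normal(n))
df = pd.DataFrame({
    'Delta2 discrete': pd.cut(delta2, bins=5,
                              labels=[-2, -1, 0, 1, 2]).astype(np.int64),
})

ax = None
# kind='kde' needs SciPy; skip the plot if it is not available.
if importlib.util.find_spec('scipy') is not None:
    ax = df['Delta2 discrete'].plot(kind='kde', color='black',
                                    title='Delta2 discrete KDE')
```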

As a result, we get:

Figure 16. The PDF of Delta2 discrete values is displayed in a KDE plot

The probability distribution of some random variables can be assessed using histograms and PDFs/KDEs.

Creating scatter plots from two columns of a DataFrame

The kind='scatter' parameter of the pandas.DataFrame.plot(…) method generates scatter plots.

Delta1 and Delta2 values are plotted in the following code block:
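A minimal sketch, in which the column names 'Delta1 value' and 'Delta2 value', the stand-in data, and the styling are assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    'Delta1 value': rng.standard_normal(n),
    'Delta2 value': rng.standard_normal(n),
})

# A scatter plot needs explicit x= and y= column names.
ax = df.plot(kind='scatter', x='Delta1 value', y='Delta2 value',
             color='black', alpha=0.5)
```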

As a result, we get:

Figure 17. A scatter plot of Delta1 and Delta2 values

The pandas.plotting.scatter_matrix(…) method builds a matrix of plots for the Delta1 and Delta2 values, with scatter plots in the off-diagonal entries and histogram/KDE plots on the diagonal:
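A minimal sketch, with the column names, stand-in data, and styling as assumptions. It uses diagonal='hist'; passing diagonal='kde' instead draws density estimates on the diagonal, which requires SciPy:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    'Delta1 value': rng.standard_normal(n),
    'Delta2 value': rng.standard_normal(n),
})

# Returns a 2x2 array of Axes: histograms on the diagonal,
# pairwise scatter plots everywhere else.
axes = pd.plotting.scatter_matrix(
    df[['Delta1 value', 'Delta2 value']],
    diagonal='hist', alpha=0.25, figsize=(6, 6),
)
```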