Multiple pages PDF documents with Matplotlib

Matplotlib, what is that ? It is a software package to make plots, yet another one... but a really good one. Since Matplotlib is a Python module, plots are described in Python, rather than a (usually clumsy) custom language. So a script using Matplotlib can harness the full power of Python and its nice modules like Numpy. Say, reading compressed data file, doing some fancy statistics, etc. Moreover, Matplotlib is rather complete, providing a wide diversity of plots and render targets such as PNG, PDF, EPS. Finally, the quality of the plots is up to the standard of scientific publication. Originally a Gnuplot user, it's been a couple of years with Matplotlib and me, and I am very happy with it. Perfect ? Well, documentation for Matplotlib is a little bit scattered. I'm working on a tutorial, a Sisyphus-grade task... Rather than waiting for completing that looooong tutorial, a short recipe today !

The problem

The problem : you want to have several plots on a single figure, you want to fit them on a A4 format, and you want to have this as a PDF document. The latter constraint, PDF document, is welcome, as the recipe is for PDF output only. It works for Matplotlib 1.1.x, and it mostly works for Matplotlib 1.0.x. It's a two step trick.

First step: multiple pages & PDF

Only one Matplotlib back-end seems to support multiple pages, the PDF back-end. We have to explicitly instantiate that back-end and tells it that we want to go multiple pages. Here's how it's done

from matplotlib.backends.backend_pdf import PdfPages

pdf_pages = PdfPages('my-fancy-document.pdf')

Now, for each new page we want to create, we have to create a new Figure instance. It's simple, one page, one figure. We want a A4 format, so we can eventually print our nice document, for instance. That's not too hard, we have just to specify the proper dimensions and resolutions when creating a Figure. Once we are done with a page, we have to explicitly tell it to the back-end. In this example, the script creates 3 pages.

from matplotlib import pyplot as plot
from matplotlib.backends.backend_pdf import PdfPages

# The PDF document
pdf_pages = PdfPages('my-fancy-document.pdf')

for i in xrange(3):
  # Create a figure instance (ie. a new page)
  fig = plot.figure(figsize=(8.27, 11.69), dpi=100)

  # Plot whatever you wish to plot

  # Done with the page
  pdf_pages.savefig(fig)

# Write the PDF document to the disk
pdf_pages.close()

Second step: multiple plots on a figure

One A4 page is a lot of space, we can squeeze a lot of plots in one single figure. Recent Matplotlib versions (1.x.x and later) have now a nice feature to easily layout plots on one figure : grid layout.

plot.subplot2grid((5, 2), (2, 0), rowspan=1, colspan=1)

The first parameter of subplot2grid is the size of the grid. The second parameter is the coordinate of one element of the grid. That element will be filled with a plot. Finally, we can specify the span of the plot, so that its spans over several elements of the grid. This gives a great deal of control over the layout, without too much headaches. There is a small gotcha however: the coordinates are in row, col order. Here's an example of subplot2grid in action.

import numpy
from matplotlib import pyplot as plot

# Prepare the data
t = numpy.linspace(-numpy.pi, numpy.pi, 1024)
s = numpy.random.randn(2, 256)

#
# Do the plot
#
grid_size = (5, 2)

# Plot 1
plot.subplot2grid(grid_size, (0, 0), rowspan=2, colspan=2)
plot.plot(t, numpy.sinc(t), c= '#000000')

# Plot 2
plot.subplot2grid(grid_size, (2, 0), rowspan=3, colspan=1)
plot.scatter(s[0], s[1], c= '#000000')

# Plot 2
plot.subplot2grid(grid_size, (2, 1), rowspan=3, colspan=1)
plot.plot(numpy.sin(2 * t), numpy.cos(0.5 * t), c= '#000000')

# Automagically fits things together
plot.tight_layout()

# Done !
plot.show()

Grid layout in action

Note the tight_layout call, which pack all the plots within a figure automatically, and does a nice job at it. But it is in Matplotlib 1.1.x only... With Matplotlib 1.0.x I don't know yet how to do it. Why bothering ? Because say, at work, you might have to deal with Linux distributions are providing rather outdated versions of software in their official packages repositories, and even the unofficial ones are lagging. Yes, CentOS, I'm looking at you !

Putting it together

Now, we can combine subplot2grid with the multiple page trick, and get multiple pages PDF documents which look good.

import numpy

from matplotlib import pyplot as plot
from matplotlib.backends.backend_pdf import PdfPages

# Generate the data
data = numpy.random.randn(7, 1024)

# The PDF document
pdf_pages = PdfPages('histograms.pdf')

# Generate the pages
nb_plots = data.shape[0]
nb_plots_per_page = 5
nb_pages = int(numpy.ceil(nb_plots / float(nb_plots_per_page)))
grid_size = (nb_plots_per_page, 1)

for i, samples in enumerate(data):
  # Create a figure instance (ie. a new page) if needed
  if i % nb_plots_per_page == 0:
  fig = plot.figure(figsize=(8.27, 11.69), dpi=100)

  # Plot stuffs !
  plot.subplot2grid(grid_size, (i % nb_plots_per_page, 0))
  plot.hist(samples, 32, normed=1, facecolor='#808080', alpha=0.75)

  # Close the page if needed
  if (i + 1) % nb_plots_per_page == 0 or (i + 1) == nb_plots:
    plot.tight_layout()
    pdf_pages.savefig(fig)

# Write the PDF document to the disk
pdf_pages.close()

A preview of the PDF document you should get from the example. Do not mind the ugly antialiasing, it's me who fumbled with the PDF to PNG conversion.

As you can see, a little bit of arithmetic is used to have the proper number of pages and create them at the right moment. It's a minimal example, but by now, you got the recipe.