Matplotlib, what is that ? It is a software package to make plots, yet another one... but a really good one. Since Matplotlib is a Python module, plots are described in Python, rather than a (usually clumsy) custom language. So a script using Matplotlib can harness the full power of Python and its nice modules like Numpy. Say, reading compressed data file, doing some fancy statistics, etc. Moreover, Matplotlib is rather complete, providing a wide diversity of plots and render targets such as PNG, PDF, EPS. Finally, the quality of the plots is up to the standard of scientific publication. Originally a Gnuplot user, it's been a couple of years with Matplotlib and me, and I am very happy with it. Perfect ? Well, documentation for Matplotlib is a little bit scattered. I'm working on a tutorial, a Sisyphus-grade task... Rather than waiting for completing that looooong tutorial, a short recipe today !
The problem
The problem : you want to have several plots on a single figure, you want to fit them on a A4 format, and you want to have this as a PDF document. The latter constraint, PDF document, is welcome, as the recipe is for PDF output only. It works for Matplotlib 1.1.x, and it mostly works for Matplotlib 1.0.x. It's a two step trick.
First step: multiple pages & PDF
Only one Matplotlib back-end seems to support multiple pages, the PDF back-end. We have to explicitly instantiate that back-end and tells it that we want to go multiple pages. Here's how it's done
from matplotlib.backends.backend_pdf import PdfPages pdf_pages = PdfPages('my-fancy-document.pdf')
Now, for each new page we want to create, we have to create a new Figure instance. It's simple, one page, one figure. We want a A4 format, so we can eventually print our nice document, for instance. That's not too hard, we have just to specify the proper dimensions and resolutions when creating a Figure. Once we are done with a page, we have to explicitly tell it to the back-end. In this example, the script creates 3 pages.
from matplotlib import pyplot as plot from matplotlib.backends.backend_pdf import PdfPages # The PDF document pdf_pages = PdfPages('my-fancy-document.pdf') for i in xrange(3): # Create a figure instance (ie. a new page) fig = plot.figure(figsize=(8.27, 11.69), dpi=100) # Plot whatever you wish to plot # Done with the page pdf_pages.savefig(fig) # Write the PDF document to the disk pdf_pages.close()
Second step: multiple plots on a figure
One A4 page is a lot of space, we can squeeze a lot of plots in one single figure. Recent Matplotlib versions (1.x.x and later) have now a nice feature to easily layout plots on one figure : grid layout.
plot.subplot2grid((5, 2), (2, 0), rowspan=1, colspan=1)
The first parameter of subplot2grid is the size of the grid. The second parameter is the coordinate of one element of the grid. That element will be filled with a plot. Finally, we can specify the span of the plot, so that its spans over several elements of the grid. This gives a great deal of control over the layout, without too much headaches. There is a small gotcha however: the coordinates are in row, col order. Here's an example of subplot2grid in action.
import numpy from matplotlib import pyplot as plot # Prepare the data t = numpy.linspace(-numpy.pi, numpy.pi, 1024) s = numpy.random.randn(2, 256) # # Do the plot # grid_size = (5, 2) # Plot 1 plot.subplot2grid(grid_size, (0, 0), rowspan=2, colspan=2) plot.plot(t, numpy.sinc(t), c= '#000000') # Plot 2 plot.subplot2grid(grid_size, (2, 0), rowspan=3, colspan=1) plot.scatter(s[0], s[1], c= '#000000') # Plot 2 plot.subplot2grid(grid_size, (2, 1), rowspan=3, colspan=1) plot.plot(numpy.sin(2 * t), numpy.cos(0.5 * t), c= '#000000') # Automagically fits things together plot.tight_layout() # Done ! plot.show()
Note the tight_layout call, which pack all the plots within a figure automatically, and does a nice job at it. But it is in Matplotlib 1.1.x only... With Matplotlib 1.0.x I don't know yet how to do it. Why bothering ? Because say, at work, you might have to deal with Linux distributions are providing rather outdated versions of software in their official packages repositories, and even the unofficial ones are lagging. Yes, CentOS, I'm looking at you !
Putting it together
Now, we can combine subplot2grid with the multiple page trick, and get multiple pages PDF documents which look good.
import numpy from matplotlib import pyplot as plot from matplotlib.backends.backend_pdf import PdfPages # Generate the data data = numpy.random.randn(7, 1024) # The PDF document pdf_pages = PdfPages('histograms.pdf') # Generate the pages nb_plots = data.shape[0] nb_plots_per_page = 5 nb_pages = int(numpy.ceil(nb_plots / float(nb_plots_per_page))) grid_size = (nb_plots_per_page, 1) for i, samples in enumerate(data): # Create a figure instance (ie. a new page) if needed if i % nb_plots_per_page == 0: fig = plot.figure(figsize=(8.27, 11.69), dpi=100) # Plot stuffs ! plot.subplot2grid(grid_size, (i % nb_plots_per_page, 0)) plot.hist(samples, 32, normed=1, facecolor='#808080', alpha=0.75) # Close the page if needed if (i + 1) % nb_plots_per_page == 0 or (i + 1) == nb_plots: plot.tight_layout() pdf_pages.savefig(fig) # Write the PDF document to the disk pdf_pages.close()

A preview of the PDF document you should get from the example. Do not mind the ugly antialiasing, it's me who fumbled with the PDF to PNG conversion.
As you can see, a little bit of arithmetic is used to have the proper number of pages and create them at the right moment. It's a minimal example, but by now, you got the recipe.