Extracting PDF pages with Python

I wanted a quick way to give potential customers sample pages from my book Command Line: A Modern Introduction. It turns out that Python makes this easy to do in an automated and repeatable way.

The task, then, is the following:

Given a PDF with multiple pages
Extract a given page or a collection of pages as a separate PDF
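PDF viewers number pages from 1, while Python PDF libraries index them from 0. A small helper that turns a human-friendly selection into zero-based indices avoids off-by-one mistakes. The following is a minimal sketch in plain Python. The function name parse_page_spec and the "12-17,20" syntax are hypothetical conventions chosen for illustration and are not part of any library.

```python
from __future__ import annotations


def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Turn a 1-based selection such as "12-17,20" into sorted zero-based indices.

    Hypothetical helper: the comma/hyphen syntax is an assumption of this sketch.
    """
    indices: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page range {part!r} is outside 1-{page_count}")
        # Shift from 1-based viewer numbers to 0-based library indices
        indices.update(range(start - 1, end))
    return sorted(indices)
```

For instance, parse_page_spec("12-17", 30) returns [11, 12, 13, 14, 15, 16], the same indices that range(11, 17) produces in the example further down.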

PyPDF2

I used a Python PDF library called PyPDF2. With PyPDF2, we just need to:

Install PyPDF2 with pip install pypdf2 or a dependency manager of our choice
Open the original PDF file in binary mode with Python's built-in open() function
Use a PdfFileReader object to read the page or pages to extract
Use a PdfFileWriter object to add those pages to a new in-memory PDF
Write the new PDF to a file

Example

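To see this in action, here is my own example of extracting a range of pages into a new PDF file. It uses the PdfFileReader and PdfFileWriter names from PyPDF2 1.x and 2.x; PyPDF2 3.0 no longer supports them and uses PdfReader and PdfWriter instead.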

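from PyPDF2 import PdfFileReader, PdfFileWriter

writer = PdfFileWriter()
with open("full.pdf", "rb") as infile:
    # Create the reader once; the input file must stay open until the output is written
    reader = PdfFileReader(infile)
    # Page indices are zero-based: range(11, 17) copies pages 12 to 17 as numbered in a viewer
    for page in range(11, 17):
        writer.addPage(reader.getPage(page))
    # Write the output once, after all pages have been added
    with open("example.pdf", "wb") as outfile:
        writer.write(outfile)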

It shouldn't be difficult to adapt the example to your own needs.

Last updated on 18.12.2022.