Extracting PDF pages with Python
I wanted a quick way to provide sample pages from my book, Command Line: A Modern Introduction, to potential customers. With Python, this turns out to be very easy to do in an automated and repeatable way.
The task at hand is then:
- Given a PDF with multiple pages
- Extract a given page or a collection of pages as a separate PDF
PyPDF2
Personally, I have used a Python PDF library called PyPDF2. With PyPDF2, we just need to:
- Install PyPDF2 via pip install pypdf2, or use a dependency manager of our choice
- Read the original PDF file with the open() Python function
- Use a PdfFileReader object to read a page or multiple pages to extract
- Use a PdfFileWriter object to add those pages to a new virtual PDF file
- Save the new pages as a new file
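The steps above can be sketched as one small function. Treat this as a sketch under a few assumptions rather than the definitive way to do it: it uses the newer PdfReader and PdfWriter names that PyPDF2 2.x and later offer in place of PdfFileReader and PdfFileWriter, and the function name extract_pages and its parameters are hypothetical, chosen only for illustration.

```python
def extract_pages(source_path, target_path, page_indexes):
    """Copy the pages at the given zero-based indexes into a new PDF."""
    # Imported inside the function so the rest of a script still loads
    # on machines where PyPDF2 is not installed (an illustrative choice).
    from PyPDF2 import PdfReader, PdfWriter

    writer = PdfWriter()
    with open(source_path, "rb") as infile:
        reader = PdfReader(infile)
        for index in page_indexes:
            # reader.pages behaves like a list of page objects
            writer.add_page(reader.pages[index])
        # Save while the source file is still open
        with open(target_path, "wb") as outfile:
            writer.write(outfile)
```

A call such as extract_pages("full.pdf", "example.pdf", range(11, 17)) would then copy the 12th through 17th pages, assuming full.pdf has at least 17 pages.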
Example
To see this in action, look at my own example of extracting multiple pages using a range to produce a new PDF file:
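from PyPDF2 import PdfFileReader, PdfFileWriter

# These class and method names belong to PyPDF2 releases before 3.0,
# which removed them in favor of PdfReader, PdfWriter, add_page and pages.
writer = PdfFileWriter()

with open("full.pdf", "rb") as infile:
    # Create the reader once, not on every loop iteration
    reader = PdfFileReader(infile)
    # Page indexes are zero-based: 11..16 are the 12th to 17th pages
    for page in range(11, 17):
        writer.addPage(reader.getPage(page))
    # Write while the input file is still open, as page data may be read lazily
    with open("example.pdf", "wb") as outfile:
        writer.write(outfile)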
It shouldn't be difficult to adapt the example to your own needs.
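One adaptation that may come up: PDF viewers count pages from 1, while the page indexes in the example count from 0. Here is a minimal sketch of a converter. The helper name page_indexes and the "12-17,20" spec format are hypothetical, not part of PyPDF2.

```python
def page_indexes(spec):
    """Turn a 1-based page spec such as "12-17,20" into 0-based indexes."""
    indexes = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, _, last = part.partition("-")
        start = int(first)
        end = int(last) if last else start
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Viewer page n is index n - 1, and range() excludes its end
        indexes.extend(range(start - 1, end))
    return indexes


print(page_indexes("12-17"))  # [11, 12, 13, 14, 15, 16]
```

The result can stand in for range(11, 17) in the loop above, since page_indexes("12-17") yields the same indexes.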
Last updated on 18.12.2022.