Complete Guide to PDF.js

In this post, we’ll provide you with a complete overview of PDF.js, including a quick technology explainer and key features. You’ll also get a step-by-step integration guide that covers opening a PDF with PDF.js, manipulating pages, handling annotations, customizing your viewer, and more.

What Is PDF.js?

PDF.js is an open source JavaScript library that allows you to render PDF files in a web browser without the need for any plugins or external software. It was released by Mozilla on 2 July 2011 and is now maintained by a team of developers from across the world. PDF.js is built on top of the Canvas API and other web technologies, making it easy to integrate into web applications.

How Does PDF.js Work?

PDF.js is designed to be modular, with each module focusing on a specific task. This modular architecture allows you to include only the modules you need, reducing the size of your code and improving performance. PDF.js has several layers, each with its own purpose:

  1. Core layer — This is the lowest-level layer of PDF.js, responsible for parsing the binary format of a PDF file and converting it into an internal representation that can be used by higher-level layers. The core layer is typically used directly only by advanced users who need fine-grained control of the parsing process.

  2. Display layer — The display layer builds upon the core layer and provides a more user-friendly API for rendering PDF files. With the display layer, you can easily render a PDF page into a <canvas> element using just a few lines of JavaScript code. This layer is suitable for most day-to-day use cases.

  3. Viewer layer — The viewer layer is a ready-to-use user interface (UI) that comes with PDF.js. It includes features like search, rotation, a thumbnail sidebar, and more. The viewer layer is built on top of the display layer and provides a complete PDF viewing experience out of the box.

PDF.js Key Features

PDF.js provides a set of features for viewing, annotating, and manipulating PDF documents:

  • Render PDF documents in the browser using the HTML5 <canvas> element

  • Search for text within a document

  • View page thumbnails

  • Zoom in and out of pages

  • Rotate pages

  • Add text and highlight annotations to a document

  • Fill out PDF form fields

  • View and navigate through bookmarks and document outlines

Some PDF.js Limitations

PDF.js is a powerful library for rendering PDF documents in the browser, but like any software, it has its limitations and drawbacks:

  • PDF.js rendering can be slow and resource-intensive, particularly for large or complex PDF documents.

  • Text selection and searching can be slow or inaccurate, particularly for documents with complex formatting or embedded images.

  • The accuracy of PDF.js rendering can vary depending on the browser and platform being used.

  • Certain PDF features, such as interactive forms and multimedia elements, may not be fully supported or may not work as expected in PDF.js.

  • PDF.js doesn’t support all of the features available in the latest PDF specification, so some documents may not render correctly or may not be compatible with PDF.js at all.

  • PDF.js may not be the best choice for applications that require advanced PDF functionality or performance, such as document editing or printing. In these cases, a more specialized PDF library may be a better option.

Getting Started with PDF.js

To get started with PDF.js, download the library as a ZIP file or clone the repository using Git.

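Extract the ZIP file and copy the pdf.js and pdf.worker.js files from the build/ folder to your project directory. Note that PDF.js releases from version 4 onward ship ES module builds (pdf.mjs and pdf.worker.mjs) instead, so this guide assumes a release that still includes the classic pdf.js build.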

In your HTML file, add the following script tag to load the PDF.js library:

<script src="./pdf.js"></script>

After including the script tag, you can start using PDF.js to render PDF files on your webpage.

Rendering a PDF File with PDF.js

  1. To render a PDF file with PDF.js, create a canvas element in your HTML file. You’ll use this canvas element to display the PDF file. Here’s an example of how to create a canvas element:

<canvas id="pdf-canvas"></canvas>

  2. Next, write some JavaScript code to load and render the PDF file. Create a file named index.js and add the following code to it:

// Get the canvas element.
const canvas = document.getElementById('pdf-canvas');

// Get the PDF file URL.
const pdfUrl = 'pspdfkit-web-demo.pdf';

pdfjsLib.GlobalWorkerOptions.workerSrc = './pdf.worker.js';

// Load the PDF file using PDF.js.
pdfjsLib.getDocument(pdfUrl).promise.then(function (pdfDoc) {
	// Get the first page of the PDF file.
	pdfDoc
		.getPage(1)
		.then(function (page) {
			const viewport = page.getViewport({ scale: 1 });

			// Set the canvas dimensions to match the PDF page size.
			canvas.width = viewport.width;
			canvas.height = viewport.height;

			// Set the canvas rendering context.
			const ctx = canvas.getContext('2d');

			const renderContext = {
				canvasContext: ctx,
				viewport: viewport,
			};

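			// Render the PDF page to the canvas. Returning the render task's
			// promise makes the next then() wait until rendering has finished.
			return page.render(renderContext).promise;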
		})
		.then(function () {
			console.log('Rendering complete');
		});
});

Let’s break down this code:

  • Get the canvas element using its ID.

  • Specify the URL of the PDF file you want to render.

  • Use the getDocument() method to load the PDF file into memory. This method returns a loading task whose promise property resolves to a PDFDocumentProxy object, which represents the PDF document.

  • Use the getPage() method to get the first page of the PDF file.

  • Set the width and height of the canvas element to match the size of the page.

  • Get the 2D rendering context of the canvas element and create an object that holds the render parameters (the canvas context and the viewport).

  • Call the render() method on the PDFPageProxy object to render the page on the canvas element.
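Sizing the canvas to exactly match the viewport, as above, can look blurry on high-density displays, because the canvas then has fewer backing pixels than the screen does. The following is a minimal sketch of one way to account for this. The helper name hiDpiCanvasSize is hypothetical (it isn’t part of PDF.js), and the sketch assumes the transform render parameter is available in your PDF.js release.

```javascript
// Hypothetical helper (not part of PDF.js): computes canvas sizes for a
// viewport of the given size and an output scale such as
// window.devicePixelRatio.
function hiDpiCanvasSize(viewportWidth, viewportHeight, outputScale) {
	return {
		// Backing-store size in device pixels.
		width: Math.floor(viewportWidth * outputScale),
		height: Math.floor(viewportHeight * outputScale),
		// On-screen size in CSS pixels.
		styleWidth: Math.floor(viewportWidth) + 'px',
		styleHeight: Math.floor(viewportHeight) + 'px',
		// Extra transform to pass to page.render(); null when not needed.
		transform:
			outputScale !== 1 ? [outputScale, 0, 0, outputScale, 0, 0] : null,
	};
}

// Browser usage, assuming `canvas`, `page`, and `viewport` from the
// rendering example are in scope:
//
//   const size = hiDpiCanvasSize(
//     viewport.width, viewport.height, window.devicePixelRatio || 1);
//   canvas.width = size.width;
//   canvas.height = size.height;
//   canvas.style.width = size.styleWidth;
//   canvas.style.height = size.styleHeight;
//   page.render({
//     canvasContext: canvas.getContext('2d'),
//     viewport: viewport,
//     transform: size.transform,
//   });
```

The helper only does arithmetic, so it can be tried out without a browser. The commented usage shows where its results would plug into the rendering code.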

  3. When working with PDF.js, it’s important to handle errors appropriately. For example, if the PDF file doesn’t exist or is corrupted, you should handle that error gracefully:

pdfjsLib.getDocument(pdfUrl)
	.promise.then(function (pdf) {
		// Do something with the PDF document.
	})
	.catch(function (error) {
		console.log('Error loading PDF file:', error);
	});

In this example, you use the catch() method to catch any errors that occur when loading the PDF file. You log the error to the console, but you could handle it in other ways, such as displaying an error message to the user.
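To show an error message to the user rather than only logging it, you can map the error to readable text. The sketch below is one possible approach, not a definitive one. The helper name describeLoadError is hypothetical, and the exception names it checks are an assumption: they match error names used by PDF.js releases we’re aware of, but they have changed between versions, so verify them against the build you use.

```javascript
// Hypothetical helper (not part of PDF.js): turns a loading error into a
// user-facing message based on the error's name.
// The names below are assumptions and may differ between PDF.js releases.
function describeLoadError(error) {
	switch (error && error.name) {
		case 'MissingPDFException':
			return 'The PDF file could not be found.';
		case 'InvalidPDFException':
			return 'The file is not a valid PDF or is corrupted.';
		case 'PasswordException':
			return 'The PDF is password protected.';
		case 'UnexpectedResponseException':
			return 'The server returned an unexpected response.';
		default:
			// Fall back to a generic message for anything unrecognized.
			return 'The PDF could not be loaded.';
	}
}

// Usage inside the catch() handler from the example above:
//
//   .catch(function (error) {
//     document.getElementById('status').textContent =
//       describeLoadError(error); // 'status' is a hypothetical element ID.
//   });
```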

  4. Include the index.js file in your HTML file:

<script src="./index.js"></script>

Make sure to add your PDF file to the same directory as the HTML file. You can use the demo PDF file as an example.

This is a simple example of how to render a PDF file using PDF.js. PDF.js provides many more options and features that you’ll explore in the next sections.

Running the Project

To run the project, follow the steps in this section.

  1. Install the serve package:

npm install --global serve

  2. Serve the contents of the current directory:

serve -l 8080 .

  3. Navigate to http://localhost:8080 to view the project.

Controlling PDF Rendering

PDF.js provides many options for controlling how PDF files are rendered. These options can be passed as parameters to the page.render() method.

Here are some of the most common options:

  • canvasContext — Specifies the rendering context to use for rendering the PDF page. This is typically a 2D canvas context.

  • viewport — Specifies the viewport to use for rendering the PDF page. The viewport defines the part of the PDF page that should be displayed on the canvas. It can be customized with options such as scale, rotation, and offset.

  • background — Specifies the color or pattern to use for the background of the canvas. This can be set to a CSS color value or a canvas pattern object.

Here’s an example of how to use some of these options:

page.render({
	canvasContext: ctx,
	viewport: page.getViewport({ scale: 1.5 }),
	background: 'rgb(255, 0, 0)',
});

This will render the PDF page with a 1.5x zoom and with a red background.
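A fixed scale such as 1.5 is rarely what you want in a responsive layout. A common alternative is to derive the scale from the width you have available. The following is a minimal sketch of that calculation. The helper name viewportForWidth is hypothetical, and it only relies on getViewport() accepting a scale option, as used throughout this post.

```javascript
// Hypothetical helper (not part of PDF.js): returns a viewport whose width
// matches `targetWidth`, given a page object that exposes getViewport().
function viewportForWidth(page, targetWidth) {
	// Measure the page at scale 1 to learn its natural width.
	const base = page.getViewport({ scale: 1 });
	if (!(base.width > 0) || !(targetWidth > 0)) {
		throw new RangeError('Page width and target width must be positive.');
	}
	// Scale the page so its width equals the target width.
	return page.getViewport({ scale: targetWidth / base.width });
}

// Browser usage, assuming `page`, `canvas`, and `ctx` are in scope:
//
//   const viewport = viewportForWidth(page, canvas.parentNode.clientWidth);
//   canvas.width = viewport.width;
//   canvas.height = viewport.height;
//   page.render({ canvasContext: ctx, viewport: viewport });
```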

The default viewer that ships with PDF.js provides several ways to navigate PDF documents, including scrolling, zooming, and searching. Here’s an overview of the most commonly used ones:

  • Scrolling — PDF.js allows you to scroll through a document using the mouse or touchpad. You can also use the scrollbar to navigate through the document.

  • Zooming — You can zoom in and out of a PDF document using the mouse or touchpad. You can also use the zoom buttons on the toolbar to zoom in and out.

  • Searching — PDF.js provides a search bar that allows you to search for specific words or phrases in a PDF document. You can also use the “find” command to search for text within the document.
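These controls belong to the viewer layer. If you render pages yourself with the display layer, you need to keep track of the current page and zoom level on your own. The following is a minimal sketch of that bookkeeping. The function createNavigationState and its option names are hypothetical (they aren’t part of PDF.js), and the zoom limits are arbitrary defaults.

```javascript
// Hypothetical helper (not part of PDF.js): tracks the current page and
// zoom level, keeping both within valid bounds.
function createNavigationState(
	numPages,
	{ minScale = 0.25, maxScale = 4, step = 1.25 } = {},
) {
	let page = 1;
	let scale = 1;
	const clamp = (value, min, max) => Math.min(Math.max(value, min), max);

	return {
		get page() {
			return page;
		},
		get scale() {
			return scale;
		},
		// Jump to a page number, clamped to 1..numPages.
		goTo(n) {
			if (Number.isFinite(n)) {
				page = clamp(Math.trunc(n), 1, numPages);
			}
			return page;
		},
		next() {
			return this.goTo(page + 1);
		},
		previous() {
			return this.goTo(page - 1);
		},
		// Multiply or divide the scale by `step`, within the limits.
		zoomIn() {
			scale = clamp(scale * step, minScale, maxScale);
			return scale;
		},
		zoomOut() {
			scale = clamp(scale / step, minScale, maxScale);
			return scale;
		},
	};
}

// Browser usage, assuming `pdfDoc` is a loaded document:
//
//   const nav = createNavigationState(pdfDoc.numPages);
//   nav.next();
//   pdfDoc.getPage(nav.page).then(function (page) {
//     const viewport = page.getViewport({ scale: nav.scale });
//     // Render the page as shown earlier.
//   });
```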

Manipulating a PDF Document with PDF.js

PDF.js provides several methods for manipulating PDF documents, with one of the most common being filling out forms. With the form-filling feature in PDF.js, users can complete and submit PDF forms and save them as new PDF documents. However, it’s important to note that PDF.js is primarily designed as a PDF viewer, and its manipulation capabilities may be limited compared to other PDF editing software. Depending on the specific implementation of PDF.js being used, additional tools or libraries may be required for more advanced PDF document manipulation.
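As an illustration of working with form data, form fields are exposed as annotations with the Widget subtype, so the values stored in a document can be read through getAnnotations(). The sketch below collects them into a plain object. The helper name collectFormValues is hypothetical, and the fieldName and fieldValue properties are an assumption about the annotation data in your PDF.js release. It reads the values saved in the file, not edits a user has made in the viewer.

```javascript
// Hypothetical helper (not part of PDF.js): builds a { fieldName: value }
// map from a page's annotations.
// Assumes form fields are 'Widget' annotations carrying `fieldName` and
// `fieldValue` properties.
function collectFormValues(annotations) {
	const values = {};
	for (const annotation of annotations) {
		if (annotation.subtype === 'Widget' && annotation.fieldName) {
			values[annotation.fieldName] = annotation.fieldValue;
		}
	}
	return values;
}

// Browser usage, assuming `page` is a loaded page:
//
//   page.getAnnotations().then(function (annotations) {
//     console.log(collectFormValues(annotations));
//   });
```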

Handling PDF Annotations

PDF.js is primarily designed as a viewer for rendering PDF files, and it doesn’t provide a built-in mechanism for editing or modifying PDF documents. However, it does provide a way to access PDF annotations such as links, highlights, and comments. PDF annotations can be accessed using the getAnnotations() method on a PDF page.

Here’s an example of how to render PDF annotations:

// This snippet assumes `page`, `viewport`, and `canvas` from the rendering
// example are in scope, and that the canvas's parent element is positioned
// (for example, with `position: relative`).

// Create an absolutely positioned overlay for an annotation rectangle.
function addOverlay(rect, color, contents) {
	// Annotation rectangles are in PDF user space (origin at the bottom
	// left), so convert them to viewport coordinates first.
	const [x1, y1, x2, y2] = viewport.convertToViewportRectangle(rect);
	const overlay = document.createElement('div');
	overlay.style.position = 'absolute';
	overlay.style.left = Math.min(x1, x2) + 'px';
	overlay.style.top = Math.min(y1, y2) + 'px';
	overlay.style.width = Math.abs(x2 - x1) + 'px';
	overlay.style.height = Math.abs(y2 - y1) + 'px';
	overlay.style.backgroundColor = color;
	overlay.style.opacity = '0.5';
	if (contents) {
		overlay.innerText = contents;
	}
	canvas.parentNode.appendChild(overlay);
}

// Load annotation data.
page
	.getAnnotations()
	.then(function (annotations) {
		annotations.forEach(function (annotation) {
			if (annotation.subtype === 'Text') {
				// Render a text annotation.
				addOverlay(annotation.rect, 'green', annotation.contents);
			} else if (annotation.subtype === 'Highlight') {
				// Render a highlight annotation.
				addOverlay(annotation.rect, 'yellow');
			}
		});
	})
	.catch(function (error) {
		console.log('Error loading annotations:', error);
	});

Make sure the PDF file you’re using actually contains annotations.

This code snippet retrieves all the annotations for the current page using the page.getAnnotations() method. It then loops through each annotation and checks its subtype to determine what type of annotation it is.

For text annotations, it creates a div element, sets its position and dimensions from the annotation’s rectangle coordinates, and adds it to the canvas’s parent element with a green background color and an opacity of 0.5. The annotation’s text contents are displayed inside the rectangle. Highlight annotations are handled the same way, but with a yellow background and no text.
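One detail worth knowing when positioning overlays: annotation rectangles are expressed in PDF user space, where the origin is at the bottom left of the page and the y-axis points up, whereas CSS positions elements from the top left. The sketch below shows the arithmetic for the simple case of an unrotated page whose view box starts at the origin. The helper name pdfRectToCssBox is hypothetical. In real code, the viewport’s convertToViewportRectangle() method handles the general case.

```javascript
// Hypothetical helper (not part of PDF.js): converts a PDF rectangle
// [x1, y1, x2, y2] to a CSS box.
// Assumes an unrotated page whose view box starts at (0, 0);
// `pageHeight` is the page height in PDF units.
function pdfRectToCssBox(rect, pageHeight, scale) {
	const [x1, y1, x2, y2] = rect;
	const left = Math.min(x1, x2);
	const right = Math.max(x1, x2);
	const bottom = Math.min(y1, y2);
	const top = Math.max(y1, y2);
	return {
		left: left * scale,
		// Flip the y-axis: measure from the top edge of the page instead.
		top: (pageHeight - top) * scale,
		width: (right - left) * scale,
		height: (top - bottom) * scale,
	};
}

// Example: a 200x50 rectangle near the top of a 792-unit-tall page.
const box = pdfRectToCssBox([100, 700, 300, 750], 792, 1);
console.log(box.left, box.top, box.width, box.height);
```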

Handling PDF Text Selection

PDF.js provides support for selecting and copying text from PDF files. PDF text can be accessed using the getTextContent() method on a PDF page.

Here’s an example of how to extract text from a PDF page:

page.getTextContent().then(function (textContent) {
	let text = '';
	for (let i = 0; i < textContent.items.length; i++) {
		const item = textContent.items[i];
		text += item.str;
	}
	console.log(text);
});

This code will extract all the text from the PDF page and log it to the console.
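Because the loop appends every text item back to back, line breaks in the original page are lost. The following is a minimal sketch that keeps them. The helper name textItemsToString is hypothetical, and the hasEOL flag on text items is an assumption: it’s present in recent PDF.js releases as far as we’re aware, but check the version you use.

```javascript
// Hypothetical helper (not part of PDF.js): joins text items into a string,
// inserting a newline after items flagged as ending a line.
// Assumes each text item may carry a boolean `hasEOL` property.
function textItemsToString(items) {
	let text = '';
	for (const item of items) {
		// Skip entries without text (for example, marked-content markers).
		if (typeof item.str !== 'string') {
			continue;
		}
		text += item.str;
		if (item.hasEOL) {
			text += '\n';
		}
	}
	return text;
}

// Browser usage, assuming `page` is a loaded page:
//
//   page.getTextContent().then(function (textContent) {
//     console.log(textItemsToString(textContent.items));
//   });
```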

Check out the source code for this example on GitHub.

Customizing the PDF Viewer

PDF.js includes several built-in features that allow you to customize the viewer’s appearance and behavior, such as setting the default zoom level, hiding certain elements, and changing the viewer’s language. You can also use CSS to further customize the viewer’s appearance.
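One lightweight way to adjust the default viewer’s initial state is through its URL: the file to open is passed in the file query parameter, and options such as page, zoom, and pagemode can be set in the URL hash. The sketch below builds such a URL. The helper name buildViewerUrl is hypothetical, and the viewer path and file URL in the example are placeholders that depend on where you host the viewer and your documents.

```javascript
// Hypothetical helper (not part of PDF.js): builds a URL for the default
// viewer with optional initial page, zoom, and sidebar mode.
function buildViewerUrl(viewerPath, fileUrl, { page, zoom, pagemode } = {}) {
	const hash = [];
	if (page !== undefined) {
		hash.push('page=' + encodeURIComponent(page));
	}
	if (zoom !== undefined) {
		// A percentage such as 150, or a keyword such as 'page-width'.
		hash.push('zoom=' + encodeURIComponent(zoom));
	}
	if (pagemode !== undefined) {
		// For example, 'thumbs' or 'bookmarks'.
		hash.push('pagemode=' + encodeURIComponent(pagemode));
	}
	let url = viewerPath + '?file=' + encodeURIComponent(fileUrl);
	if (hash.length > 0) {
		url += '#' + hash.join('&');
	}
	return url;
}

// Placeholder paths; adjust them to your own setup.
const viewerUrl = buildViewerUrl('web/viewer.html', '/docs/sample.pdf', {
	page: 2,
	zoom: 150,
});
console.log(viewerUrl);
```

You could then use the resulting URL as the src of an iframe that embeds the viewer in your page.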

Alternative to PDF.js: PSPDFKit

PSPDFKit is a commercial PDF rendering and processing library that offers several advantages over PDF.js:

  • Performance — PSPDFKit is optimized for performance and can handle large PDF files more efficiently than PDF.js.
  • Functionality — PSPDFKit offers advanced features that aren’t available in PDF.js, such as digital signatures, annotations, and form filling. It also has a more comprehensive API, making it easier to integrate into existing workflows.
  • Support — PSPDFKit is a commercial product that comes with dedicated technical support, ensuring that any issues or problems can be quickly resolved.
  • Compatibility — PSPDFKit works on all major platforms, including web, desktop, and mobile. It also supports a wide range of file formats, including PDF, Office, and image files.
  • Customization — PSPDFKit offers a high degree of customization, allowing developers to tailor the user interface and functionality to meet their specific needs.

Overall, PSPDFKit is a better solution for businesses and organizations that require advanced PDF functionality and performance, as well as dedicated technical support. While PDF.js is a free and open source library that’s suitable for basic PDF rendering, it may not be suitable for more complex use cases.

Check out our PSPDFKit for Web product page and demo to learn more about the features and benefits of PSPDFKit.

Conclusion

PDF.js is a powerful and flexible library that allows developers to render PDF documents directly in the browser without the need for any plugins or third-party software. By following this complete guide, you learned everything you need to know to get started with PDF.js and build your own PDF viewer or PDF-related application.
