How to Merge PDF Files in Python

Learn how to merge two or more PDF files into a single PDF file using the PyPDF4 library in Python
5 min read · Updated Jun 2023 · PDF File Handling


Merging PDF files helps with file management, archiving, bulk printing, and combining datasheets, e-books, and reports. For these tasks, you need an efficient tool that merges several small PDF files into a single PDF.

This tutorial shows you how to merge a list of PDF files into a single PDF using the Python programming language. The combined PDF may include bookmarks to improve navigation, where every bookmark links to the content of one of the input PDF files.

We'll be using the PyPDF4 library for this purpose. PyPDF4 is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs.


Let's install it:

$ pip install PyPDF4==1.27.0

Importing the libraries:

# Import libraries
import argparse
import os

from PyPDF4 import PdfFileMerger

Let's define our core function:

def merge_pdfs(input_files: list, page_range: tuple, output_file: str, bookmark: bool = True):
    """
    Merge a list of PDF files and save the combined result into the `output_file`.
    `page_range` to select a range of pages (behaving like Python's range() function) from the input files
        e.g (0,2) -> First 2 pages 
        e.g (0,6,2) -> pages 1,3,5
    bookmark -> add bookmarks to the output file to navigate directly to the input file section within the output file.
    """
    # strict = False -> To ignore PdfReadError - Illegal Character error
    merger = PdfFileMerger(strict=False)
    for input_file in input_files:
        bookmark_name = os.path.splitext(os.path.basename(input_file))[0] if bookmark else None
        # `pages` controls which pages are appended from this file (None -> all pages).
        # Passing the path lets the merger open the file itself and close it in merger.close().
        merger.append(fileobj=input_file, pages=page_range, import_bookmarks=False, bookmark=bookmark_name)
    # Write the merged document to the output file
    with open(output_file, 'wb') as output:
        merger.write(fileobj=output)
    merger.close()

We first create a PdfFileMerger object and then iterate over input_files. For each input PDF file, we define a bookmark name (the file name without its extension) if the bookmark variable is set, and add the file to the merger object, taking the chosen page_range into account.

Next, we use the append() method from the merger to add our PDF file.

Finally, we write the output PDF file and close the object.
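To make the page_range convention concrete, here is a minimal sketch of how a range-style tuple maps to zero-based page indices. The helper name selected_pages and the total_pages parameter are hypothetical and not part of PyPDF4; the sketch only assumes, as the docstring above states, that the tuple behaves like the arguments of Python's range().

```python
def selected_pages(page_range, total_pages):
    """Return the zero-based page indices a range-style tuple would select.

    Hypothetical helper for illustration only; it is not a PyPDF4 function.
    """
    if page_range is None:
        # No range given: every page of the file is selected.
        return list(range(total_pages))
    # The tuple is expanded exactly like range(start, stop[, step]).
    return list(range(*page_range))


print(selected_pages((0, 2), 10))     # [0, 1]    -> first 2 pages
print(selected_pages((0, 6, 2), 10))  # [0, 2, 4] -> pages 1, 3, 5
print(selected_pages(None, 3))        # [0, 1, 2] -> all pages
```

Note that the indices are zero-based, so index 0 is the first page of each input file.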

Let's now add a function to parse command-line arguments:

def parse_args():
    """Get user command line parameters"""
    parser = argparse.ArgumentParser(description="Available Options")
    parser.add_argument('-i', '--input_files', dest='input_files', nargs='*',
                        type=str, required=True, help="Enter the path of the files to process")
    parser.add_argument('-p', '--page_range', dest='page_range', nargs='*',
                        help="Enter the pages to consider e.g.: (0,2) -> First 2 pages")
    parser.add_argument('-o', '--output_file', dest='output_file',
                        required=True, type=str, help="Enter a valid output file")
    parser.add_argument('-b', '--bookmark', dest='bookmark', default=True, type=lambda x: (
        str(x).lower() in ['true', '1', 'yes']), help="Bookmark resulting file")
    # Parse the command-line arguments
    args = vars(parser.parse_args())
    # To Display The Command Line Arguments
    print("## Command Arguments #################################################")
    print("\n".join("{}:{}".format(i, j) for i, j in args.items()))
    print("######################################################################")
    return args
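The following sketch shows how the two less obvious options behave, using a reduced parser and an explicit argument list instead of the real command line. The function name str_to_bool is an illustrative stand-in for the lambda passed to --bookmark, and the file names are placeholders.

```python
import argparse


def str_to_bool(value):
    """Same test as the lambda used for --bookmark: only these strings mean True."""
    return str(value).lower() in ("true", "1", "yes")


# A reduced parser with just the two options being illustrated.
parser = argparse.ArgumentParser(description="Available Options")
parser.add_argument("-i", "--input_files", dest="input_files", nargs="*", required=True)
parser.add_argument("-b", "--bookmark", dest="bookmark", default=True, type=str_to_bool)

# nargs='*' collects every space-separated value after -i into a list.
args = vars(parser.parse_args(["-i", "a.pdf", "b.pdf", "-b", "no"]))
print(args["input_files"])  # ['a.pdf', 'b.pdf']
print(args["bookmark"])     # False
```

Any value other than true, 1, or yes (case-insensitive) turns bookmarks off, while omitting -b keeps the default of True.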


Now let's use the previously defined functions in our main code:

if __name__ == "__main__":
    # Parsing command line arguments entered by user
    args = parse_args()
    page_range = None
    if args['page_range']:
        page_range = tuple(int(x) for x in args['page_range'][0].split(','))
    # call the main function
    merge_pdfs(
        input_files=args['input_files'], page_range=page_range, 
        output_file=args['output_file'], bookmark=args['bookmark']
    )
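The conversion of the -p value into a tuple can be isolated and checked on its own. The sketch below uses a hypothetical helper, parse_page_range, that follows the same split-on-comma logic as the main block. Two things are assumptions added here and not part of the original script: it strips surrounding parentheses, so that a value written like the help text's (0,2) is also accepted, and it rejects tuples that do not have two or three numbers.

```python
def parse_page_range(raw):
    """Turn the list collected by -p (e.g. ['0,6,2']) into a tuple of ints.

    Hypothetical helper; parenthesis stripping and the length check
    are additions for illustration.
    """
    if not raw:
        # -p was not given (None) or given without a value (empty list).
        return None
    parts = tuple(int(x) for x in raw[0].strip("()").split(","))
    if not 2 <= len(parts) <= 3:
        raise ValueError("page range must be start,stop or start,stop,step")
    return parts


print(parse_page_range(["0,2"]))      # (0, 2)
print(parse_page_range(["(0,6,2)"]))  # (0, 6, 2)
print(parse_page_range(None))         # None
```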

Now that the code is complete, let's test it:

$ python pdf_merger.py --help

Output:

usage: pdf_merger.py [-h] -i [INPUT_FILES [INPUT_FILES ...]] [-p [PAGE_RANGE [PAGE_RANGE ...]]] -o OUTPUT_FILE [-b BOOKMARK]

Available Options

optional arguments:
  -h, --help            show this help message and exit
  -i [INPUT_FILES [INPUT_FILES ...]], --input_files [INPUT_FILES [INPUT_FILES ...]]
                        Enter the path of the files to process
  -p [PAGE_RANGE [PAGE_RANGE ...]], --page_range [PAGE_RANGE [PAGE_RANGE ...]]
                        Enter the pages to consider e.g.: (0,2) -> First 2 pages
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Enter a valid output file
  -b BOOKMARK, --bookmark BOOKMARK
                        Bookmark resulting file

Here is an example of merging two PDF files into one:

$ python pdf_merger.py -i bert-paper.pdf letter.pdf -o combined.pdf

Separate the input PDF files with spaces in the -i argument, as in the command above; the parser collects every value that follows -i into a list.

A new combined.pdf appears in the current directory and contains both input PDF files. The script prints the following output:

## Command Arguments #################################################
input_files:['bert-paper.pdf', 'letter.pdf']
page_range:None
output_file:combined.pdf
bookmark:True
######################################################################

Make sure you pass the input files to the -i argument in the order you want them to appear in the merged PDF.
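When the input files are numbered, plain alphabetical sorting puts part10.pdf before part2.pdf. As one possible way to build the list in the intended order before merging, here is a small sketch of a natural sort; the helper natural_key and the file names are hypothetical.

```python
import re


def natural_key(path):
    """Sort key that compares digit runs as numbers and other text case-insensitively."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", path)]


files = ["part10.pdf", "part2.pdf", "part1.pdf"]
print(sorted(files))                   # ['part1.pdf', 'part10.pdf', 'part2.pdf']
print(sorted(files, key=natural_key))  # ['part1.pdf', 'part2.pdf', 'part10.pdf']
```

The sorted list can then be passed as the input_files argument of merge_pdfs().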

Conclusion

I hope this code helps you merge PDF files easily, without third-party or online tools; using Python for such tasks is more convenient.

If you want to split PDF documents instead, this tutorial will certainly help you.

Check the full code here.


Happy coding ♥
