How to Merge PDF Files in Python

Learn how to merge two or more PDF files into a single PDF file using the PyPDF4 library in Python.
5 min read · Updated Aug 2021 · PDF File Handling


Merging PDF files is useful for file management, archiving, bulk printing, and combining datasheets, e-books, and reports. An efficient tool for merging several small PDF files into a single PDF makes these tasks much easier.

This tutorial shows you how to merge a list of PDF files into a single PDF using the Python programming language. The combined PDF can include bookmarks to improve navigation, where each bookmark links to the content of one of the input PDF files.

We'll be using the PyPDF4 library for this purpose. PyPDF4 is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs.

Let's install it:

$ pip install PyPDF4==1.27.0

Importing the libraries:

# Import libraries
import argparse
import os

from PyPDF4 import PdfFileMerger

Let's define our core function:

def merge_pdfs(input_files: list, page_range: tuple, output_file: str, bookmark: bool = True):
    """
    Merge a list of PDF files and save the combined result into the `output_file`.
    `page_range` selects a range of pages (behaving like Python's range() function) from each input file
        e.g. (0, 2) -> first 2 pages
        e.g. (0, 6, 2) -> pages 1, 3, 5
    `bookmark` -> add bookmarks to the output file to navigate directly to each input file's section.
    """
    # strict=False -> ignore PdfReadError (illegal character) errors in malformed files
    merger = PdfFileMerger(strict=False)
    # keep the input handles so they can be closed once the output is written
    handles = []
    try:
        for input_file in input_files:
            bookmark_name = os.path.splitext(os.path.basename(input_file))[0] if bookmark else None
            handle = open(input_file, 'rb')
            handles.append(handle)
            # `pages` controls which pages are appended from this particular file
            merger.append(fileobj=handle, pages=page_range, bookmark=bookmark_name)
        # write the merged document to the output file
        with open(output_file, 'wb') as output:
            merger.write(fileobj=output)
    finally:
        merger.close()
        for handle in handles:
            handle.close()

We first create a PdfFileMerger object and then iterate over input_files. For each input PDF file, we define a bookmark name (the file name without its extension) if the bookmark parameter is set, and then call the merger's append() method to add the file, taking the chosen page_range into account.

Finally, we write the output PDF file and close the object.
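
To make the page_range semantics concrete, here is a minimal sketch that mirrors the range()-like behaviour described in the docstring. The helper preview_pages is hypothetical (it is not part of PyPDF4); it only illustrates which 1-based page numbers a given tuple would select, assuming zero-based indices as in the docstring examples.

```python
def preview_pages(page_range, total_pages):
    """Return the 1-based page numbers a (start, stop[, step]) tuple selects.

    Hypothetical helper for illustration only; it does not touch any PDF.
    """
    if page_range is None:
        # No range given: every page of the file is selected
        indices = range(total_pages)
    else:
        # The tuple is unpacked exactly like the arguments of range()
        indices = range(*page_range)
    # Convert zero-based indices to the page numbers a reader would see
    return [i + 1 for i in indices]


print(preview_pages((0, 2), 10))     # [1, 2]
print(preview_pages((0, 6, 2), 10))  # [1, 3, 5]
print(preview_pages(None, 3))        # [1, 2, 3]
```

Note that the same page_range is applied to every input file, so it should fit within the shortest file you are merging.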

Let's now add a function to parse command-line arguments:

def parse_args():
    """Get user command line parameters"""
    parser = argparse.ArgumentParser(description="Available Options")
    parser.add_argument('-i', '--input_files', dest='input_files', nargs='*',
                        type=str, required=True, help="Enter the path of the files to process")
    parser.add_argument('-p', '--page_range', dest='page_range', nargs='*',
                        help="Enter the pages to consider e.g.: (0,2) -> First 2 pages")
    parser.add_argument('-o', '--output_file', dest='output_file',
                        required=True, type=str, help="Enter a valid output file")
    parser.add_argument('-b', '--bookmark', dest='bookmark', default=True, type=lambda x: (
        str(x).lower() in ['true', '1', 'yes']), help="Bookmark resulting file")
    # Parse the command-line arguments
    args = vars(parser.parse_args())
    # Display the command-line arguments
    print("## Command Arguments #################################################")
    print("\n".join("{}:{}".format(i, j) for i, j in args.items()))
    print("######################################################################")
    return args
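
The -b option above uses a small lambda instead of type=bool. A likely reason is that bool() treats any non-empty string as True, so "no" or "false" would still enable bookmarks. The sketch below shows the difference in isolation; str_to_bool is a name chosen here for illustration and does the same job as the lambda.

```python
import argparse


def str_to_bool(value):
    """Interpret 'true', '1' or 'yes' (case-insensitive) as True, anything else as False."""
    return str(value).lower() in ('true', '1', 'yes')


parser = argparse.ArgumentParser()
parser.add_argument('-b', '--bookmark', default=True, type=str_to_bool)

print(parser.parse_args(['-b', 'no']).bookmark)   # False
print(parser.parse_args(['-b', 'Yes']).bookmark)  # True
print(parser.parse_args([]).bookmark)             # True (the default)
print(bool('no'))                                 # True, which is why type=bool would not work
```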

Now let's use the previously defined functions in our main code:

if __name__ == "__main__":
    # Parsing command line arguments entered by user
    args = parse_args()
    # convert a single str to a list
    input_files = [str(x) for x in args['input_files'][0].split(',')]
    page_range = None
    if args['page_range']:
        page_range = tuple(int(x) for x in args['page_range'][0].split(','))
    # call the main function
    merge_pdfs(
        input_files=input_files, page_range=page_range, 
        output_file=args['output_file'], bookmark=args['bookmark']
    )

Alright, we're done with coding. Let's test it out:

$ python pdf_merger.py --help

Output:

usage: pdf_merger.py [-h] -i [INPUT_FILES [INPUT_FILES ...]] [-p [PAGE_RANGE [PAGE_RANGE ...]]] -o OUTPUT_FILE [-b BOOKMARK]

Available Options

optional arguments:
  -h, --help            show this help message and exit
  -i [INPUT_FILES [INPUT_FILES ...]], --input_files [INPUT_FILES [INPUT_FILES ...]]
                        Enter the path of the files to process
  -p [PAGE_RANGE [PAGE_RANGE ...]], --page_range [PAGE_RANGE [PAGE_RANGE ...]]
                        Enter the pages to consider e.g.: (0,2) -> First 2 pages
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Enter a valid output file
  -b BOOKMARK, --bookmark BOOKMARK
                        Bookmark resulting file

Here is an example merging two PDF files into one:

$ python pdf_merger.py -i bert-paper.pdf,letter.pdf -o combined.pdf

Separate the input PDF files with a comma (,) in the -i argument, and do not add any spaces between them.
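
If you would rather tolerate spaces after the commas, one possible approach is to flatten every token that argparse collected for -i instead of reading only the first one. This is a sketch under that assumption; the helper name parse_file_list is hypothetical and is not part of the script above.

```python
def parse_file_list(values):
    """Flatten tokens such as ['a.pdf,', 'b.pdf'] into ['a.pdf', 'b.pdf'].

    `values` is the list argparse builds for an option declared with nargs='*'.
    """
    files = []
    for token in values:
        for name in token.split(','):
            name = name.strip()
            # Skip the empty strings left behind by trailing commas
            if name:
                files.append(name)
    return files


print(parse_file_list(['bert-paper.pdf,letter.pdf']))     # ['bert-paper.pdf', 'letter.pdf']
print(parse_file_list(['bert-paper.pdf,', 'letter.pdf']))  # ['bert-paper.pdf', 'letter.pdf']
```

In the main code, this would replace the expression that splits only args['input_files'][0].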

A new combined.pdf file appears in the current directory, containing both input PDF files. The script prints the following output:

## Command Arguments #################################################
input_files:['bert-paper.pdf,letter.pdf']
page_range:None
output_file:combined.pdf
bookmark:True
######################################################################

Make sure you pass the input files in the right order in the -i argument, since they are merged in that order.
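
When the files are numbered (for example, chapters of a book), a plain alphabetical sort would put chapter10.pdf before chapter2.pdf. As a rough sketch, the snippet below builds the -i value using a natural sort. The file names and the natural_key helper are made up for illustration.

```python
import re


def natural_key(name):
    """Split a name into text and number parts so that 2 sorts before 10."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r'(\d+)', name)]


# Hypothetical file names, deliberately out of order
files = ['chapter10.pdf', 'chapter2.pdf', 'chapter1.pdf']
ordered = sorted(files, key=natural_key)

# Join with commas and no spaces, as the -i argument expects
print(','.join(ordered))  # chapter1.pdf,chapter2.pdf,chapter10.pdf
```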

Conclusion

I hope this code helps you merge PDF files easily without third-party or online tools. Using Python for such tasks is often more convenient.

Check the full code here.

Happy coding ♥