How to Encrypt and Decrypt PDF Files in Python

Learn how to add passwords to and remove them from PDF files using the PyPDF4 library, and how to use pyAesCrypt to encrypt and decrypt PDF files in Python
9 min read · Updated Aug 2021 · PDF File Handling


There are many reasons to encrypt a PDF file. One is to stop someone who copies your PDF to their computer from using it without a decryption key. With an encrypted PDF file, you can prevent unwanted parties from viewing personal information or credentials stored in it.

In this tutorial, you will learn how to encrypt PDF files by applying two protection levels:

  • Level 1: Limiting access to the PDF file by adding a Document Open Password. A Document Open Password (also known as a user password) requires the user to type a password in order to open the PDF.
  • Level 2: Encrypting the whole file with the pyAesCrypt library, which uses the AES256-CBC encryption algorithm.

The purpose of this tutorial is to build a lightweight command-line utility that secures PDF files using only Python modules, without relying on external utilities outside the Python ecosystem (e.g. qpdf).

Before getting started, let's install the required libraries:

$ pip install PyPDF4==1.27.0 pyAesCrypt==6.0.0

Let's import the necessary libraries in our Python file:

# Import Libraries
from PyPDF4 import PdfFileReader, PdfFileWriter, utils
import os
import argparse
import getpass
from io import BytesIO
import pyAesCrypt

First, let's define a function that checks whether the PDF file is encrypted:

# Chunk size (in bytes) that pyAesCrypt uses when reading and writing streams
BUFFER_SIZE = 64*1024

def is_encrypted(input_file: str) -> bool:
    """Checks whether the input file is encrypted, using the PyPDF4 library"""
    with open(input_file, 'rb') as pdf_file:
        pdf_reader = PdfFileReader(pdf_file, strict=False)
        return pdf_reader.isEncrypted
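For the curious, this check relies on a simple fact of the PDF format: an encrypted document has an /Encrypt entry in its trailer dictionary, and PyPDF4's isEncrypted property essentially reports whether that entry is present. The stdlib-only sketch below approximates this by scanning the raw bytes for the /Encrypt key. The looks_encrypted() name is hypothetical, and because it doesn't actually parse the file, treat it as a rough heuristic (an uncompressed content stream containing the literal text /Encrypt would fool it), not a replacement for is_encrypted().

```python
import re

# Matches the /Encrypt name only when it is followed by whitespace, a PDF
# delimiter, or the end of the data, so names like /EncryptedFoo don't match.
_ENCRYPT_KEY = re.compile(rb"/Encrypt(?![^\s()<>\[\]{}/%])")

def looks_encrypted(data: bytes) -> bool:
    """Hypothetical heuristic: True if the raw PDF bytes contain an /Encrypt key.

    It does not parse the PDF, so it can be fooled; prefer
    PdfFileReader.isEncrypted for real checks.
    """
    return _ENCRYPT_KEY.search(data) is not None
```

To try it on a real file, pass it the bytes returned by Path("bert-paper.pdf").read_bytes() (from pathlib).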

Next, let's write the core function, which encrypts the PDF file:

def encrypt_pdf(input_file: str, password: str):
    """
    Encrypts a file using PyPDF4 library.
    Precondition: File is not encrypted.
    On success, the caller must close pdf_reader.stream after writing the output.
    """
    pdf_writer = PdfFileWriter()
    pdf_reader = PdfFileReader(open(input_file, 'rb'), strict=False)
    if pdf_reader.isEncrypted:
        print(f"PDF File {input_file} already encrypted")
        pdf_reader.stream.close()
        return False, None, None
    try:
        # To encrypt all the pages of the input file, you need to loop over all of them
        # and add them to the writer.
        for page_number in range(pdf_reader.numPages):
            pdf_writer.addPage(pdf_reader.getPage(page_number))
    except utils.PdfReadError as e:
        print(f"Error reading PDF File {input_file} = {e}")
        pdf_reader.stream.close()
        return False, None, None
    # use_128bit=True (the default) selects 128-bit encryption; False selects 40-bit.
    # With owner_pwd=None, the user password is also used as the owner password.
    pdf_writer.encrypt(user_pwd=password, owner_pwd=None, use_128bit=True)
    return True, pdf_reader, pdf_writer

The encrypt_pdf() function performs the following:

  • It checks, using the PyPDF4 library, that the input PDF file is not already encrypted.
  • It iterates over all of its pages and adds them to a pdf_writer object.
  • It encrypts the pdf_writer object with the given password.

Now that we have the function responsible for encryption, let's write its counterpart for decryption:

def decrypt_pdf(input_file: str, password: str):
    """
    Decrypts a file using PyPDF4 library.
    Precondition: A file is already encrypted
    """
    pdf_reader = PdfFileReader(open(input_file, 'rb'), strict=False)
    if not pdf_reader.isEncrypted:
        print(f"PDF File {input_file} not encrypted")
        pdf_reader.stream.close()
        return False, None, None
    # decrypt() returns 0 for a wrong password, and 1 or 2 when it matches
    if not pdf_reader.decrypt(password=password):
        print(f"Wrong password for PDF File {input_file}")
        pdf_reader.stream.close()
        return False, None, None
    pdf_writer = PdfFileWriter()
    try:
        for page_number in range(pdf_reader.numPages):
            pdf_writer.addPage(pdf_reader.getPage(page_number))
    except utils.PdfReadError as e:
        print(f"Error reading PDF File {input_file} = {e}")
        pdf_reader.stream.close()
        return False, None, None
    return True, pdf_reader, pdf_writer

This function performs the following:

  • It checks, using the PyPDF4 library, that the input PDF file is encrypted.
  • It decrypts the pdf_reader object with the given password, which must be the correct one.
  • It iterates over all of its pages and adds them to a pdf_writer object.

Now let's move on to level 2, encrypting the file itself:

def cipher_stream(inp_buffer: BytesIO, password: str):
    """Ciphers an input memory buffer and returns a ciphered output memory buffer"""
    # Initialize output ciphered binary stream
    out_buffer = BytesIO()
    inp_buffer.seek(0)
    # Encrypt Stream
    pyAesCrypt.encryptStream(inp_buffer, out_buffer, password, BUFFER_SIZE)
    out_buffer.seek(0)
    return out_buffer

Using the pyAesCrypt library, the function above encrypts an input memory buffer and returns the encrypted result as a new memory buffer.
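The two seek(0) calls matter because a BytesIO keeps a single read/write position: after pdf_writer.write() fills the buffer, the position sits at the end, so reading from it returns nothing until you rewind. The stdlib-only sketch below shows this, then reads the rewound buffer in BUFFER_SIZE chunks, which is roughly how a streaming cipher such as pyAesCrypt consumes its input. The read_in_chunks() helper is hypothetical, not part of pyAesCrypt.

```python
from io import BytesIO

BUFFER_SIZE = 64 * 1024  # same chunk size as in the tutorial

def read_in_chunks(stream, chunk_size=BUFFER_SIZE):
    """Yield successive chunks of at most chunk_size bytes from a binary stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk

buffer = BytesIO()
buffer.write(b"x" * 150_000)  # stands in for pdf_writer.write(buffer)

# The position is now at the end of the buffer, so there is nothing left to read.
print(len(buffer.read()))  # 0

buffer.seek(0)  # rewind before handing the buffer to a reader
sizes = [len(chunk) for chunk in read_in_chunks(buffer)]
print(sizes)  # [65536, 65536, 18928]
```

Forgetting the rewind would silently encrypt an empty stream, which is why cipher_stream() also rewinds its output buffer before returning it.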

Let's make the file decryption function now:

def decipher_file(input_file: str, output_file: str, password: str):
    """
    Deciphers an input file and writes the deciphered result to output_file.
    Returns True on success, False otherwise.
    """
    input_size = os.stat(input_file).st_size
    out_buffer = BytesIO()
    with open(input_file, mode='rb') as inp_buffer:
        try:
            # Decrypt the stream into memory first, so a wrong password
            # never leaves a partially written output file behind
            pyAesCrypt.decryptStream(
                inp_buffer, out_buffer, password, BUFFER_SIZE, input_size)
        except Exception as e:
            print("Exception", str(e))
            return False
    with open(output_file, mode='wb') as f:
        f.write(out_buffer.getbuffer())
    return True

In decipher_file(), we use the decryptStream() function from the pyAesCrypt module, which takes the input and output streams, the password, the buffer size, and the input length (here, the file size) as parameters, and writes the decrypted data to the output stream.

For a more convenient way to encrypt and decrypt files, I suggest you read this tutorial, which uses the cryptography module and is friendlier for Python developers.

Now let's combine our functions into a single one:

def encrypt_decrypt_file(**kwargs):
    """Encrypts or decrypts a file"""
    input_file = kwargs.get('input_file')
    password = kwargs.get('password')
    output_file = kwargs.get('output_file')
    action = kwargs.get('action')
    # Protection Level
    # Level 1 --> Encryption / Decryption using PyPDF4
    # Level 2 --> Encryption and Ciphering / Deciphering and Decryption
    level = kwargs.get('level')
    if not output_file:
        output_file = input_file
    if action == "encrypt":
        result, pdf_reader, pdf_writer = encrypt_pdf(
            input_file=input_file, password=password)
        # Encryption completed successfully
        if result:
            output_buffer = BytesIO()
            pdf_writer.write(output_buffer)
            pdf_reader.stream.close()
            if level == 2:
                output_buffer = cipher_stream(output_buffer, password=password)
            with open(output_file, mode='wb') as f:
                f.write(output_buffer.getbuffer())
    elif action == "decrypt":
        pdf_to_decrypt = input_file
        if level == 2:
            # Remove the pyAesCrypt layer first; the deciphered (still
            # password-protected) PDF is written to output_file
            if not decipher_file(input_file=input_file,
                                 output_file=output_file, password=password):
                return
            # Then remove the Document Open Password from the deciphered file
            pdf_to_decrypt = output_file
        result, pdf_reader, pdf_writer = decrypt_pdf(
            input_file=pdf_to_decrypt, password=password)
        # Decryption completed successfully
        if result:
            output_buffer = BytesIO()
            pdf_writer.write(output_buffer)
            pdf_reader.stream.close()
            with open(output_file, mode='wb') as f:
                f.write(output_buffer.getbuffer())

The function above accepts five keyword arguments:

  • input_file: The input PDF file.
  • output_file: The output PDF file. If omitted, input_file is overwritten.
  • password: The password to encrypt or decrypt with.
  • action: Either "encrypt" or "decrypt".
  • level: The protection level to apply. 1 only adds a Document Open Password to the PDF; 2 also encrypts the whole file with pyAesCrypt as an additional layer of security.
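Because a level-2 file is no longer a PDF at all, you can often tell the two layers apart from the first few bytes: PDF files normally begin with %PDF-, while pyAesCrypt writes files in the AES Crypt format, whose header starts with the bytes AES. The guess_protection_level() and guess_file_protection_level() helpers below are a hypothetical sketch built on that assumption; they only look at the header and don't validate the rest of the file.

```python
AES_CRYPT_MAGIC = b"AES"  # start of the AES Crypt header written by pyAesCrypt
PDF_MAGIC = b"%PDF-"      # usual start of a PDF file

def guess_protection_level(header: bytes) -> str:
    """Hypothetical helper: classify a file from its first bytes.

    Returns "level 2" for AES Crypt data, "pdf" for something that looks like
    a PDF (with or without a Document Open Password), and "unknown" otherwise.
    """
    if header.startswith(AES_CRYPT_MAGIC):
        return "level 2"
    if header.startswith(PDF_MAGIC):
        return "pdf"
    return "unknown"

def guess_file_protection_level(path: str) -> str:
    """Read only the first few bytes of the file and classify them."""
    with open(path, "rb") as f:
        return guess_protection_level(f.read(8))
```

For files classified as "pdf", the is_encrypted() function defined earlier tells you whether level 1 protection is present.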

Now, let's create a new class that inherits from argparse.Action to enter a password securely:

class Password(argparse.Action):
    """
    Hides the password entry
    """
    def __call__(self, parser, namespace, values, option_string):
        if values is None:
            values = getpass.getpass()
        setattr(namespace, self.dest, values)

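It overrides the __call__() method: when -p is passed without a value, argparse calls the action with values set to None, so the action prompts for the password using the getpass module (which doesn't echo what you type). It then stores the password in the namespace attribute named by dest.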
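To see how nargs='?' and the custom action interact, the self-contained sketch below reuses the same Password class with a minimal, hypothetical build_parser(): passing -p with a value stores it directly (note that it then ends up in your shell history), while a bare -p triggers the getpass prompt. In the demo, getpass.getpass is replaced by a stub so it runs without a terminal; that stub is purely for illustration.

```python
import argparse
import getpass

class Password(argparse.Action):
    """Prompts for the password with getpass when -p is given without a value."""
    def __call__(self, parser, namespace, values, option_string=None):
        if values is None:
            values = getpass.getpass()
        setattr(namespace, self.dest, values)

def build_parser():
    """Hypothetical minimal parser with only the password option."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-p', '--password', dest='password', action=Password,
                        nargs='?', type=str, required=True)
    return parser

if __name__ == '__main__':
    # Stub out the interactive prompt so the demo runs unattended.
    getpass.getpass = lambda prompt='Password: ': 'typed-secret'
    parser = build_parser()
    print(parser.parse_args(['-p', 'inline-secret']).password)  # inline-secret
    print(parser.parse_args(['-p']).password)                   # typed-secret
```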

Next, let's define functions for parsing command-line arguments:

def is_valid_path(path):
    """Validates the input path and checks whether it is a file path or a folder path"""
    if not path:
        raise ValueError("Invalid Path")
    if os.path.isfile(path) or os.path.isdir(path):
        return path
    raise ValueError(f"Invalid Path {path}")

def parse_args():
    """Get user command line parameters"""
    parser = argparse.ArgumentParser(description="These options are available")
    parser.add_argument("file", help="Input PDF file you want to encrypt", type=is_valid_path)
    parser.add_argument('-a', '--action', dest='action', choices=[
                        'encrypt', 'decrypt'], type=str, default='encrypt', help="Choose whether to encrypt or to decrypt")
    parser.add_argument('-l', '--level', dest='level', choices=[
                        1, 2], type=int, default=1, help="Choose which protection level to apply")
    parser.add_argument('-p', '--password', dest='password', action=Password,
                        nargs='?', type=str, required=True, help="Enter a valid password")
    parser.add_argument('-o', '--output_file', dest='output_file',
                        type=str, help="Enter a valid output file")
    args = vars(parser.parse_args())
    # To Display Command Arguments Except Password
    print("## Command Arguments #################################################")
    print("\n".join("{}:{}".format(i, j)
          for i, j in args.items() if i != 'password'))
    print("######################################################################")
    return args
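A note on how argparse uses the type= callable: it is called on the raw string, and a ValueError turns into a generic "invalid is_valid_path value" usage error, while raising argparse.ArgumentTypeError lets you choose the message. Since encrypt_decrypt_file() only handles single files, the sketch below uses a hypothetical existing_file() validator that rejects folders too (an assumption, not part of the original tool). It also shows that type=int runs before the choices check, so -l 2 is accepted as the integer 2 and -l 3 is rejected.

```python
import argparse
import os

def existing_file(path: str) -> str:
    """Hypothetical argparse type= callable: accept only existing regular files."""
    if not os.path.isfile(path):
        raise argparse.ArgumentTypeError(f"{path!r} is not an existing file")
    return path

def build_file_parser():
    """Hypothetical subset of parse_args() with the file and level options."""
    parser = argparse.ArgumentParser(description="These options are available")
    parser.add_argument("file", type=existing_file,
                        help="Input PDF file you want to encrypt")
    parser.add_argument('-l', '--level', dest='level', choices=[1, 2],
                        type=int, default=1,
                        help="Choose which protection level to apply")
    return parser
```

On invalid input, parse_args() prints the usage message and exits with status 2, just like the original parser.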

Finally, let's write the main code:

if __name__ == '__main__':
    # Parsing command line arguments entered by user
    args = parse_args()
    # Encrypting or Decrypting File
    encrypt_decrypt_file(
        input_file=args['file'], password=args['password'], 
        action=args['action'], level=args['level'], output_file=args['output_file']
    )

Alright, let's test our program. First, let's pass --help to see the arguments:

$ python encrypt_pdf.py --help

Output:

usage: encrypt_pdf.py [-h] [-a {encrypt,decrypt}] [-l {1,2}] -p [PASSWORD] [-o OUTPUT_FILE] file

These options are available

positional arguments:
  file                  Input PDF file you want to encrypt

optional arguments:
  -h, --help            show this help message and exit
  -a {encrypt,decrypt}, --action {encrypt,decrypt}
                        Choose whether to encrypt or to decrypt
  -l {1,2}, --level {1,2}
                        Choose which protection level to apply
  -p [PASSWORD], --password [PASSWORD]
                        Enter a valid password
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Enter a valid output file

Awesome, let's encrypt an example PDF file (get it here):

$ python encrypt_pdf.py bert-paper.pdf -a encrypt -l 1 -p -o bert-paper-encrypted1.pdf

This will prompt for a password twice:

Password: 
Password:
## Command Arguments #################################################
file:bert-paper.pdf
action:encrypt
level:1
output_file:bert-paper-encrypted1.pdf
######################################################################

A new, password-protected PDF file will appear in the current working directory. If you try to open it with any PDF reader, you'll be prompted for a password, as shown in the image below:

Example encrypted PDF file with Password using Python

Obviously, if you enter a wrong password, you won't be able to access the PDF file.

Next, let's decrypt it:

$ python encrypt_pdf.py bert-paper-encrypted1.pdf -a decrypt -p -l 1 -o bert-paper-decrypted1.pdf

Output:

Password: 
## Command Arguments #################################################
file:bert-paper-encrypted1.pdf
action:decrypt
level:1
output_file:bert-paper-decrypted1.pdf
######################################################################

Awesome, you'll notice bert-paper-decrypted1.pdf appear in your directory; it is equivalent to the original, unencrypted file.

Conclusion

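Notice that if you choose level 2, the entire file is also encrypted with pyAesCrypt on top of the Document Open Password, so decryption takes two steps: first deciphering the pyAesCrypt layer (level 2), then removing the Document Open Password (level 1).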

Be aware that locking a PDF file with a Document Open Password can be bypassed in a variety of ways, one of which is cracking the PDF password; check this tutorial to learn how.

You can check the full code of this tutorial here.

Happy coding ♥
