How to Crack PDF Files in Python

Learn how you can use pikepdf, pdf2john and other tools to crack password-protected PDF files in Python.
5 min read · Updated Dec 2020 · Ethical Hacking · PDF File Handling · Sponsored

Suppose you have a password-protected PDF file that you urgently need to access, but you have forgotten the password. At this point, you want the fastest way to recover it. In this tutorial, you will learn how to crack PDF passwords using three methods: pikepdf, John the Ripper, and iSeePassword Dr.PDF.

To get started, install the required dependencies:

pip3 install pikepdf tqdm

Cracking PDF Password using pikepdf

pikepdf is a Python library that allows us to create, manipulate and repair PDF files. It provides a Pythonic wrapper around the C++ QPDF library.

We won't be using pikepdf for that, though. We just need to open the password-protected PDF file: if that succeeds, the password is correct; otherwise, pikepdf raises a PasswordError exception:

import pikepdf
from tqdm import tqdm

# load password list
with open("wordlist.txt") as f:
    passwords = [line.strip() for line in f]

# iterate over passwords
for password in tqdm(passwords, "Decrypting PDF"):
    try:
        # open PDF file
        with pikepdf.open("foo-protected.pdf", password=password) as pdf:
            # Password decrypted successfully, break out of the loop
            print("[+] Password found:", password)
            break
    except pikepdf.PasswordError:
        # wrong password, just continue in the loop
        continue

First, we load a password list from the wordlist.txt file in the current directory. You can use the rockyou list or any other large wordlist as well. You can also use the Crunch tool to generate your own custom wordlist.
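
If Crunch is not at hand, a small custom wordlist can also be produced directly in Python. The following is only a minimal sketch: the function names generate_wordlist and write_wordlist are made up for this illustration, and it simply enumerates every string over a character set within a length range.

```python
from itertools import product
from typing import Iterator


def generate_wordlist(charset: str, min_len: int, max_len: int) -> Iterator[str]:
    """Yield every string over `charset` whose length is in min_len..max_len."""
    for length in range(min_len, max_len + 1):
        # product() enumerates all combinations of the given length
        for combo in product(charset, repeat=length):
            yield "".join(combo)


def write_wordlist(path: str, charset: str, min_len: int, max_len: int) -> int:
    """Write the candidates to `path`, one per line, and return how many were written."""
    count = 0
    with open(path, "w") as f:
        for word in generate_wordlist(charset, min_len, max_len):
            f.write(word + "\n")
            count += 1
    return count
```

For example, write_wordlist("wordlist.txt", "0123456789", 1, 4) would write all 11110 numeric candidates of up to four digits. Keep in mind that the number of candidates grows exponentially with the length.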

Next, we iterate over the list and try to open the file with each password by passing the password argument to the pikepdf.open() method, which raises a pikepdf PasswordError exception if the password is incorrect.

We use tqdm here just to show the progress and how many words remain. Here is my result:
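
For very large wordlists such as rockyou, it may be preferable not to load the whole file into memory. Here is a hedged sketch of the same idea split into small reusable pieces. The helper names (iter_passwords, pdf_opens_with, find_password) are hypothetical and not part of pikepdf, and the latin-1 encoding is an assumption chosen so that any byte sequence in the wordlist can be read.

```python
from typing import Callable, Iterable, Iterator, Optional


def iter_passwords(path: str, encoding: str = "latin-1") -> Iterator[str]:
    """Lazily yield non-empty lines from a wordlist file."""
    with open(path, encoding=encoding) as f:
        for line in f:
            word = line.rstrip("\r\n")
            if word:
                yield word


def pdf_opens_with(pdf_path: str, password: str) -> bool:
    """Return True if pikepdf can open the PDF with this password."""
    import pikepdf  # imported here so the other helpers work without it

    try:
        with pikepdf.open(pdf_path, password=password):
            return True
    except pikepdf.PasswordError:
        return False


def find_password(candidates: Iterable[str],
                  is_correct: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate accepted by `is_correct`, or None."""
    for candidate in candidates:
        if is_correct(candidate):
            return candidate
    return None
```

A call such as find_password(iter_passwords("wordlist.txt"), lambda p: pdf_opens_with("foo-protected.pdf", p)) would then behave like the loop above, returning None when no word in the list matches.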

Decrypting PDF:  43%|████████████████████████████████████████▏                                                   | 2137/5000 [00:06<00:08, 320.70it/s]
[+] Password found: abc123

The password was found after 2137 trials, which took about 6 seconds. As you can see, it runs at about 320 words per second. We'll see how to boost this rate.
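
To put that rate in perspective, here is a rough back-of-the-envelope estimate of the worst case, where the password is the last word tried. It assumes the rate stays constant, which is only an approximation, and the 14-million-word list size is a hypothetical figure for a large wordlist.

```python
def seconds_to_exhaust(wordlist_size: int, rate_per_second: float) -> float:
    """Worst-case time, in seconds, to try every word at a constant rate."""
    return wordlist_size / rate_per_second


# The 5000-word list used above, at the rate tqdm reported
small = seconds_to_exhaust(5000, 320.70)
print(f"{small:.1f} seconds")  # 15.6 seconds

# A hypothetical wordlist of 14 million entries at the same rate
large_hours = seconds_to_exhaust(14_000_000, 320.70) / 3600
print(f"{large_hours:.1f} hours")  # 12.1 hours
```

At this speed a small list is exhausted in seconds, while a large one takes hours, which is why a faster tool matters.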

Cracking PDF Password using John The Ripper

John the Ripper is a free and fast password cracking tool that is available on many platforms. However, we'll be using the Kali Linux operating system here, as John comes pre-installed on it.

First, we need a way to extract the password hash from the PDF file in a form that is suitable for cracking with the john utility. Luckily for us, there is a Python script that does that. Let's download it:

[Screenshot: Downloading pdf2john.py]

Put your password-protected PDF in the current directory (mine is called foo-protected.pdf) and run the following command:

python3 pdf2john.py foo-protected.pdf | sed "s/::.*$//" | sed "s/^.*://" | sed -r 's/^.{2}//' | sed 's/.\{1\}$//' > hash

This extracts the PDF password hash into a new file named hash. Here is my result:

[Screenshot: Extracting PDF password hash using pdf2john]

After I saved the password hash into the hash file, I used the cat command to print it to the screen.
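
The chain of sed commands can look cryptic, so here is a sketch of the same four trimming steps written as plain Python string operations. The function name extract_hash is made up for this illustration, and the sample line in the usage note is a shortened, hypothetical stand-in for the script's output rather than a real hash.

```python
def extract_hash(line: str) -> str:
    """Apply the same four trimming steps as the sed pipeline to one line."""
    line = line.rstrip("\n")
    # sed "s/::.*$//": drop everything from the first "::" to the end
    line = line.split("::", 1)[0]
    # sed "s/^.*://": drop everything up to and including the last ":"
    line = line.rsplit(":", 1)[-1]
    # sed -r 's/^.{2}//': drop the first two characters (if there are two)
    if len(line) >= 2:
        line = line[2:]
    # sed 's/.\{1\}$//': drop the last character
    return line[:-1]
```

Given a hypothetical line such as foo.pdf:b'$pdf$4*4*128':::::foo.pdf, the function would return $pdf$4*4*128, that is, only the hash field with the surrounding file name and quoting removed.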

Finally, we use this hash file to crack the password:

[Screenshot: Password cracked successfully using John the Ripper]

We simply use the command "john [hashfile]". As you can see, the password is 012345 and was found at a speed of 4503p/s.
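
If you prefer to drive john from a Python script, something along these lines could work. This is a sketch under assumptions: it expects the john binary to be on your PATH, the function names are hypothetical, and it relies on john's --wordlist and --show options.

```python
import shutil
import subprocess
from typing import List, Optional


def build_john_command(hash_file: str, wordlist: Optional[str] = None) -> List[str]:
    """Build the argument list for a john run, optionally with a wordlist."""
    cmd = ["john"]
    if wordlist is not None:
        cmd.append(f"--wordlist={wordlist}")
    cmd.append(hash_file)
    return cmd


def crack_with_john(hash_file: str, wordlist: Optional[str] = None) -> str:
    """Run john on the hash file, then return the output of `john --show`."""
    if shutil.which("john") is None:
        raise RuntimeError("john was not found on PATH")
    # Run the cracking session; john prints its own progress to the terminal
    subprocess.run(build_john_command(hash_file, wordlist))
    # Ask john to print whatever passwords it has cracked for this file
    shown = subprocess.run(["john", "--show", hash_file],
                           capture_output=True, text=True)
    return shown.stdout
```

Calling crack_with_john("hash") corresponds to the plain "john hash" command used above, followed by a query for the cracked passwords.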


Cracking PDF Password using iSeePassword Dr.PDF

Not all users are comfortable with coding in Python or using commands in Linux. So, if you're looking for an effective PDF password cracking program on Windows, iSeePassword Dr.PDF is one of the best choices.

[Screenshot: Importing PDF file]

This PDF password cracker has an easy-to-understand UI, so even novices can use it. Besides, it offers three powerful password cracking algorithms: Dictionary, Brute-force, and Brute-force with Mask. You're free to set several types of parameters to boost the performance.
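
To give an idea of what a mask attack means in general, here is a small Python sketch that expands a mask into candidate passwords. It does not reflect how Dr.PDF is implemented: the placeholder syntax (?d for a digit, ?l for a lowercase letter, ?u for an uppercase letter) is an assumption borrowed from the style of other cracking tools, and expand_mask is a hypothetical name.

```python
import string
from itertools import product
from typing import Iterator, List

# Assumed placeholder syntax for this sketch only
PLACEHOLDERS = {
    "?d": string.digits,
    "?l": string.ascii_lowercase,
    "?u": string.ascii_uppercase,
}


def expand_mask(mask: str) -> Iterator[str]:
    """Yield every candidate that matches the mask, in order."""
    positions: List[str] = []
    i = 0
    while i < len(mask):
        token = mask[i:i + 2]
        if token in PLACEHOLDERS:
            # a placeholder stands for a whole character class
            positions.append(PLACEHOLDERS[token])
            i += 2
        else:
            # any other character is taken literally
            positions.append(mask[i])
            i += 1
    for combo in product(*positions):
        yield "".join(combo)
```

If you remember that a password starts with "abc" followed by three digits, the mask "abc?d?d?d" yields only 1000 candidates, far fewer than a full brute-force search over all six-character strings.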

[Screenshot: Password found]

Currently, the password cracking speed is up to 100K passwords per second, making it one of the fastest programs for cracking PDF passwords.


So that's it: we have successfully cracked the PDF password using three methods: pikepdf, John the Ripper, and iSeePassword Dr.PDF. The first method takes much longer to find the password but is quite intuitive for Python programmers, whereas the other two recover the password of a PDF file in a much shorter time. Make sure you use this ethically and only on your own files.


Happy Cracking ♥
