How to Compress and Decompress Files in Python

Learn how to compress and decompress files, folders and symbolic links in Python using gzip compression with the built-in tarfile module.
  · 5 min read · Updated May 2022 · Python Standard Library

Disclosure: This post may contain affiliate links, meaning when you click the links and make a purchase, we receive a commission.

A compressed file is an archive that contains one or more files that have been reduced in size. Modern operating systems make compressing files pretty simple. In this tutorial, however, you will learn how to compress and decompress files programmatically using Python.

You may ask: why learn to compress files in Python when tools for it already exist? Because compressing and decompressing files programmatically, without any manual clicks, is extremely useful. For example, when working with machine learning datasets, you may want a piece of code that downloads, extracts, and loads them into memory automatically.

You may also want to add a compression/decompression feature to your application, or you may have thousands of compressed files that you want to decompress in one go. In either case, this tutorial can help.

Related: How to Encrypt and Decrypt Files in Python.

Let's get started. We will be using the built-in tarfile module, so there is nothing to install for the compression itself. The code below also uses tqdm to print progress bars, which you can install with:

pip3 install tqdm

Open up a new Python file and import the modules:

import tarfile
from tqdm import tqdm # pip3 install tqdm
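tqdm is used only for progress bars, so one option is to fall back to a tiny stand-in when it isn't installed. This is a sketch and not part of the original code. The `_NoProgress` class is a hypothetical helper. It mimics only the two things the functions in this tutorial use: iteration and `set_description()`.

```python
try:
    from tqdm import tqdm  # pip3 install tqdm
except ImportError:
    class _NoProgress:
        """Hypothetical stand-in for tqdm: plain iteration, descriptions are printed."""

        def __init__(self, iterable):
            self._iterable = iterable

        def __iter__(self):
            return iter(self._iterable)

        def set_description(self, desc):
            # print the description instead of drawing a progress bar
            print(desc)

    tqdm = _NoProgress
```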

Compression

Let's start with compression. The following function compresses one or more files/folders, passed as a list, into a single archive:

def compress(tar_file, members):
    """
    Adds files (`members`) to a tar_file and compresses it with gzip
    """
    # open the file for gzip-compressed writing; the with block closes it
    with tarfile.open(tar_file, mode="w:gz") as tar:
        # wrap the members in a tqdm progress bar
        progress = tqdm(members)
        for member in progress:
            # set the progress description of the progress bar
            progress.set_description(f"Compressing {member}")
            # add file/folder/link to the tar file (folders are added recursively)
            tar.add(member)

I call these files/folders members, since that's what the tarfile documentation calls them.
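Inside an archive, tarfile represents each member as a `TarInfo` object that carries the member's name, size and type. The sketch below builds a tiny archive in a temporary directory and inspects its members with `getmembers()`. The names `folder`, `note.txt` and `demo.tar.gz` are made up for illustration.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # create a sample folder with one file in it (hypothetical names)
    os.makedirs(os.path.join(tmp, "folder"))
    with open(os.path.join(tmp, "folder", "note.txt"), "w") as f:
        f.write("hello")

    archive = os.path.join(tmp, "demo.tar.gz")
    with tarfile.open(archive, mode="w:gz") as tar:
        # arcname stores the folder under a relative name inside the archive
        tar.add(os.path.join(tmp, "folder"), arcname="folder")

    with tarfile.open(archive, mode="r:gz") as tar:
        members = tar.getmembers()  # a list of TarInfo objects

# TarInfo attributes stay readable after the archive is closed
summary = [(m.name, m.isdir(), m.size) for m in members]
for name, is_dir, size in summary:
    print(f"{name:20} {'dir' if is_dir else 'file':4} {size} bytes")
```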

First, we open (and create) a new tar file for gzip-compressed writing, which is what mode="w:gz" stands for. Then we add each member to the archive. Folders are added recursively by default. Finally, the tar file is closed.
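The part after the colon in the mode string selects the compression. tarfile also accepts "w:bz2" and "w:xz" for bzip2 and LZMA, and plain "w" for an uncompressed tar. On the reading side, "r:*" detects the compression automatically. Below is a minimal sketch. The `compress_as()` name and its `compression` parameter are hypothetical, and the sample text is made up.

```python
import os
import tarfile
import tempfile


def compress_as(tar_file, members, compression="gz"):
    """Hypothetical variant of compress(): compression is "gz", "bz2", "xz" or "" (none)."""
    mode = f"w:{compression}" if compression else "w"
    with tarfile.open(tar_file, mode=mode) as tar:
        for member in members:
            tar.add(member)


sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "test.txt")
    with open(sample, "w") as f:
        f.write("compress me " * 1000)  # highly repetitive, so it compresses well
    for comp in ("", "gz", "bz2", "xz"):
        out = os.path.join(tmp, f"demo.tar.{comp or 'plain'}")
        compress_as(out, [sample], compression=comp)
        # "r:*" transparently reads any of the supported compressions
        with tarfile.open(out, mode="r:*") as tar:
            assert len(tar.getmembers()) == 1
        sizes[comp or "none"] = os.path.getsize(out)

print(sizes)
```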

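I've optionally wrapped members with tqdm to print a progress bar, which is useful when compressing many files in one go. Note that the bar advances once per item in members, so a folder counts as a single step no matter how many files it contains.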
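You can get one progress-bar step per file by expanding folders yourself with `os.walk()`. Each path is then added with `tar.add(..., recursive=False)` so that nothing is added twice. This is a sketch: `expand_members()` and `compress_per_file()` are hypothetical names. The sketch also falls back to plain iteration if tqdm is missing.

```python
import os
import tarfile

try:
    from tqdm import tqdm  # pip3 install tqdm
except ImportError:  # no progress bar, just plain iteration
    def tqdm(iterable, **kwargs):
        return iterable


def expand_members(members):
    """Hypothetical helper: yield each member and, for folders, every path under it."""
    for member in members:
        yield member
        if os.path.isdir(member) and not os.path.islink(member):
            for root, dirs, files in os.walk(member):
                for name in dirs + files:
                    yield os.path.join(root, name)


def compress_per_file(tar_file, members):
    """Like compress(), but the progress bar advances once per file or folder."""
    paths = list(expand_members(members))
    with tarfile.open(tar_file, mode="w:gz") as tar:
        for path in tqdm(paths, desc="Compressing"):
            # recursive=False: a folder is stored as an entry on its own;
            # its contents come from the expanded path list instead
            tar.add(path, recursive=False)
```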

That's it for compression. Now let's dive into decompression.

Learn also: How to Compress PDF Files in Python.

Decompression

The following function decompresses a given archive file:

def decompress(tar_file, path, members=None):
    """
    Extracts `members` of `tar_file` into the `path` directory.
    If members is None, all members of `tar_file` will be extracted.
    """
    # open the file for gzip-compressed reading; the with block closes it
    with tarfile.open(tar_file, mode="r:gz") as tar:
        if members is None:
            members = tar.getmembers()
        else:
            # accept member names (strings) as well as TarInfo objects
            members = [tar.getmember(m) if isinstance(m, str) else m for m in members]
        # wrap the members in a tqdm progress bar
        progress = tqdm(members)
        for member in progress:
            # set the progress description of the progress bar
            progress.set_description(f"Extracting {member.name}")
            # extract the member into the `path` directory
            tar.extract(member, path=path)
        # or use this instead of the loop:
        # tar.extractall(members=members, path=path)

First, we open the archive file for reading with gzip decompression (mode="r:gz"). The optional members parameter lets us extract only specific files instead of the whole archive. If members isn't specified, we get all the files in the archive using the getmembers() method. This method returns all the members of the archive as a Python list of TarInfo objects.
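Because getmembers() returns an ordinary list, you can filter it before extracting. The sketch below assumes you only want the .txt files. `extract_matching()` is a hypothetical helper that does the filtering and extraction within a single open archive.

```python
import tarfile


def extract_matching(tar_file, path, suffix=".txt"):
    """Hypothetical helper: extract only regular files whose name ends with `suffix`."""
    with tarfile.open(tar_file, mode="r:gz") as tar:
        # getmembers() returns a list of TarInfo objects we can filter freely
        selected = [m for m in tar.getmembers() if m.isfile() and m.name.endswith(suffix)]
        tar.extractall(path=path, members=selected)
        return [m.name for m in selected]
```

Missing parent folders (such as src/ for src/a.txt) are created during extraction even when the folder's own member wasn't selected.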

Then, for each member, we call the extract() method, which extracts that member from the archive into the path directory we specified.

Note that we could use the extractall() method instead. The official documentation recommends it over extract() in most cases.
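The tarfile documentation also warns never to extract archives from untrusted sources without inspecting them first. A member named like ../evil.txt, or one with an absolute path, can write outside the target directory. Python 3.12, and security releases of 3.8 to 3.11, added a filter argument to extract() and extractall(), and the "data" filter rejects such members. The sketch below uses that filter when it is available. Otherwise it falls back to a minimal path check of our own. Both `safe_extractall()` and the fallback check are assumptions, not library APIs, and the fallback does not cover every attack the filter does.

```python
import os
import tarfile


def safe_extractall(tar_file, path):
    """Hypothetical helper: extract `tar_file` into `path`, refusing unsafe members."""
    with tarfile.open(tar_file, mode="r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # the "data" filter blocks absolute paths, ".." escapes, device files, etc.
            tar.extractall(path=path, filter="data")
            return
        # fallback for older Pythons: a minimal (not exhaustive) path check
        base = os.path.realpath(path)
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(base, member.name))
            if os.path.commonpath([base, target]) != base:
                raise ValueError(f"Refusing to extract {member.name!r} outside {path!r}")
        tar.extractall(path=path)
```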

Let's test this:

compress("compressed.tar.gz", ["test.txt", "folder"])

This will compress the test.txt file and the folder directory, both in the current working directory, into a new gzip-compressed tar archive called compressed.tar.gz, as shown in the following figure:

Figure: Compressed file using tarfile module in Python

If you want to decompress:

decompress("compressed.tar.gz", "extracted")

This will extract the archive we just created into a new folder called extracted:

Figure: Decompressed tar archive file using tarfile module in Python

Okay, we are done! You can be creative with this.
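To double-check that nothing was lost in the round trip, you can compare the extracted copies with the originals using the standard filecmp module. The sketch is self-contained. It recreates a small test.txt and folder in a temporary directory; their contents, including folder/data.txt, are made up. It calls tarfile directly instead of the compress() and decompress() functions, without progress bars.

```python
import filecmp
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # sample inputs mirroring the example above (contents are hypothetical)
    with open(os.path.join(tmp, "test.txt"), "w") as f:
        f.write("hello tarfile\n")
    os.makedirs(os.path.join(tmp, "folder"))
    with open(os.path.join(tmp, "folder", "data.txt"), "w") as f:
        f.write("nested file\n")

    archive = os.path.join(tmp, "compressed.tar.gz")
    extracted = os.path.join(tmp, "extracted")
    with tarfile.open(archive, mode="w:gz") as tar:
        for member in ("test.txt", "folder"):
            # arcname keeps the names relative, as if we had run from `tmp`
            tar.add(os.path.join(tmp, member), arcname=member)
    with tarfile.open(archive, mode="r:gz") as tar:
        tar.extractall(path=extracted)

    # shallow=False compares file contents, not just os.stat() signatures
    match, mismatch, errors = filecmp.cmpfiles(
        tmp, extracted, ["test.txt", os.path.join("folder", "data.txt")], shallow=False
    )
    print("identical:", match)
```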

In this tutorial, we explored compression and decompression using the tarfile module. You can also use the zipfile module to work with ZIP archives, the bz2 module for bzip2 compression, and the gzip or zlib modules for gzip files.
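For comparison, here is roughly how a similar compress/decompress pair might look with the zipfile module. This sketch assumes DEFLATE compression is wanted; ZIP_DEFLATED needs zlib, which standard CPython builds include. `zip_compress()` and `zip_decompress()` are hypothetical names. Unlike TarFile.add(), ZipFile.write() does not recurse into folders, so the sketch walks them with os.walk().

```python
import os
import zipfile


def zip_compress(zip_file, members):
    """Add files/folders (`members`) to a DEFLATE-compressed ZIP archive."""
    with zipfile.ZipFile(zip_file, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for member in members:
            if os.path.isdir(member):
                # ZipFile.write() does not recurse, so walk folders ourselves;
                # only files are written, so empty folders are skipped
                for root, _dirs, files in os.walk(member):
                    for name in files:
                        zf.write(os.path.join(root, name))
            else:
                zf.write(member)


def zip_decompress(zip_file, path, members=None):
    """Extract `members` (all if None) from `zip_file` into `path`."""
    with zipfile.ZipFile(zip_file) as zf:
        zf.extractall(path=path, members=members)
```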

Resources & Courses

Finally, if you're a beginner and want to learn Python, I suggest you take the Python For Everybody Coursera course, in which you'll learn a lot about Python. Good luck!

Learn Also: How to Generate and Read QR Code in Python.

Happy Coding ♥
