How to Compress and Decompress Files in Python

Abdou Rockikz · 01 oct 2019

Abdou Rockikz · 4 min read · Updated nov 2019 · General Python Topics

A compressed file is a sort of archive that contains one or more files that have been reduced in size. Compressing files in modern operating systems is usually pretty simple. However, in this tutorial, you will learn how to compress and decompress files using Python!

You may ask: why learn to compress files in Python when there are already tools for that? Well, decompressing files programmatically, without any manual clicks, is extremely useful, especially when downloading machine learning datasets, where you want a piece of code to download, extract and load them into memory automatically. You may also want to add a compression feature to your application, who knows!

Let's get started. We will be using the built-in tarfile module, so we don't have to install anything. You can optionally install tqdm for progress bars:

pip3 install tqdm

Open up a new Python file and import the modules:

import tarfile
from tqdm import tqdm # pip3 install tqdm
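Since tqdm is optional, one way to keep a script working when it is not installed is a fallback import. The following is only a minimal sketch: the `progress_bar` helper is a hypothetical name, not part of this tutorial's code or of tqdm itself.

```python
try:
    from tqdm import tqdm
except ImportError:
    tqdm = None  # tqdm is not installed; fall back to plain iteration


def progress_bar(iterable):
    """Wrap `iterable` in a tqdm progress bar if tqdm is available.

    Hypothetical helper: returns the iterable unchanged when tqdm is missing.
    """
    if tqdm is None:
        return iterable
    return tqdm(iterable)


for name in progress_bar(["test.txt", "folder"]):
    pass  # the real work (e.g. adding `name` to an archive) would go here
```

Note that the plain fallback has no set_description() method, so code that calls it would need a guard for that case.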

Let's start with compression. The function below compresses a list of files and folders (pass a one-element list for a single file):

def compress(tar_file, members):
    """
    Adds files (`members`) to `tar_file` and compresses it with gzip
    """
    # open the file for gzip-compressed writing; the `with` block
    # closes the archive even if adding a member fails
    with tarfile.open(tar_file, mode="w:gz") as tar:
        # wrap members with tqdm to show a progress bar
        progress = tqdm(members)
        for member in progress:
            # show which member is currently being compressed
            progress.set_description(f"Compressing {member}")
            # add file/folder/link to the tar file (compress)
            tar.add(member)

I call these files/folders members because that's what the documentation calls them.

First, we create and open a new tar file for gzip-compressed writing (that's what mode="w:gz" stands for). Then we add each member to the archive, and finally we close the tar file.
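The mode string is what selects the compression. Besides "w:gz", tarfile also accepts "w:bz2", "w:xz" and plain "w" (no compression). As a rough sketch, the mode could be picked from the archive's extension; the `write_mode_for` helper and the demo file names below are assumptions made for illustration, not part of tarfile.

```python
import os
import tarfile
import tempfile

# Hypothetical mapping from file extension to tarfile write mode
WRITE_MODES = {
    ".tar": "w",      # plain tar, no compression
    ".gz": "w:gz",    # gzip
    ".tgz": "w:gz",   # gzip, short extension
    ".bz2": "w:bz2",  # bzip2
    ".xz": "w:xz",    # lzma
}


def write_mode_for(tar_file):
    """Return the tarfile write mode matching the extension of `tar_file`."""
    extension = os.path.splitext(tar_file)[1].lower()
    try:
        return WRITE_MODES[extension]
    except KeyError:
        raise ValueError(f"Unsupported archive extension: {extension!r}")


# Small demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "test.txt")
    with open(source, "w") as f:
        f.write("hello " * 1000)
    for name in ("demo.tar", "demo.tar.gz", "demo.tar.bz2", "demo.tar.xz"):
        archive = os.path.join(tmp, name)
        with tarfile.open(archive, mode=write_mode_for(archive)) as tar:
            tar.add(source, arcname="test.txt")
        print(name, os.path.getsize(archive), "bytes")
```

The printed sizes give a quick feel for how the formats compare on a given input; the exact numbers depend on the data.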

I've optionally wrapped members with tqdm to print a progress bar, which is useful when compressing a lot of files in one go.

That's it for compression. Now let's dive into decompression.

The function below decompresses a given archive file:

def decompress(tar_file, path, members=None):
    """
    Extracts `tar_file` and puts the `members` (TarInfo objects) in `path`.
    If members is None, all members of `tar_file` will be extracted.
    """
    # open the archive for gzip-compressed reading; the `with` block
    # closes it even if extracting a member fails
    with tarfile.open(tar_file, mode="r:gz") as tar:
        if members is None:
            members = tar.getmembers()
        # wrap members with tqdm to show a progress bar
        progress = tqdm(members)
        for member in progress:
            # show which member is currently being extracted
            progress.set_description(f"Extracting {member.name}")
            tar.extract(member, path=path)
        # or use this
        # tar.extractall(members=members, path=path)

First, we open the archive file for reading with gzip compression. The optional 'members' parameter lets us extract specific files instead of the whole archive. If 'members' isn't specified, we get all files in the archive using the getmembers() method, which returns all the members of the archive as a Python list.
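To make the role of getmembers() more concrete, here is a small sketch that builds a throwaway archive, lists its members, and keeps only the .txt files. The file names are made up for the demonstration.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a small archive to inspect (file names are made up for the demo)
    archive = os.path.join(tmp, "compressed.tar.gz")
    with tarfile.open(archive, mode="w:gz") as tar:
        for name in ("test.txt", "notes.md"):
            path = os.path.join(tmp, name)
            with open(path, "w") as f:
                f.write("sample content\n")
            tar.add(path, arcname=name)

    with tarfile.open(archive, mode="r:gz") as tar:
        all_members = tar.getmembers()  # list of TarInfo objects
        # Keep only the regular files whose name ends with .txt
        txt_members = [m for m in all_members
                       if m.isfile() and m.name.endswith(".txt")]
        all_names = [m.name for m in all_members]
        txt_names = [m.name for m in txt_members]

print(all_names)  # ['test.txt', 'notes.md']
print(txt_names)  # ['test.txt']
```

A filtered list like `txt_members` is the kind of value that can be passed as the 'members' argument while the archive is still open.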

Then we extract each member using the extract() method, which extracts a member from the archive to the 'path' directory we specified.

Note that we can alternatively use extractall() for that (which is preferred in the official documentation).
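One thing to keep in mind with either method: an archive from an untrusted source may contain members that try to write outside the target directory. Recent Python versions (3.12, plus security updates of some older releases) add a `filter` argument to extractall() for this. The sketch below assumes a hypothetical `safe_extract` helper and checks for `tarfile.data_filter` to see whether the running Python supports filters.

```python
import os
import tarfile
import tempfile


def safe_extract(tar_file, path):
    """Extract all members of `tar_file` into `path` (hypothetical helper).

    Uses the 'data' extraction filter when this Python version supports it.
    """
    with tarfile.open(tar_file, mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Refuses members that would be written outside `path`,
            # special files such as devices, and links pointing outside
            tar.extractall(path=path, filter="data")
        else:
            # Older Python without filter support: trusted archives only
            tar.extractall(path=path)


# Small demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "test.txt")
    with open(source, "w") as f:
        f.write("hello\n")
    archive = os.path.join(tmp, "compressed.tar.gz")
    with tarfile.open(archive, mode="w:gz") as tar:
        tar.add(source, arcname="test.txt")

    target = os.path.join(tmp, "extracted")
    safe_extract(archive, target)
    extracted_ok = os.path.isfile(os.path.join(target, "test.txt"))

print(extracted_ok)
```

This trades the per-member progress bar for a single call, so it fits best when safety matters more than progress reporting.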

Let's test this:

compress("compressed.tar.gz", ["test.txt", "folder"])

This will compress the test.txt file and the directory named folder in the current directory into a new tar archive file called compressed.tar.gz, as shown in the following figure:

[Figure: Compressed file using tarfile module in Python]

If you want to decompress:

decompress("compressed.tar.gz", "extracted")

This will decompress the archive we just created into a new folder called extracted:

[Figure: Decompressed tar archive file using tarfile module in Python]

Okay, we are done! You can be creative with this.

Happy Coding ♥
