How to Download Files from URL in Python

Learn how to use the requests and tqdm libraries to build a file downloader with a progress bar in Python.
4 min read · Updated Sep 2021 · General Python Tutorials


Downloading files is one of the most common tasks on the Web, and a lot of successful software lets its users fetch files from the Internet.

In this tutorial, you will learn how you can download files over HTTP in Python using the requests library.

Related: How to Use Hash Algorithms in Python using hashlib.

Let's get started by installing the required dependencies:

pip3 install requests tqdm

We'll use the tqdm module here only to print a good-looking progress bar during the download.

Open up a new Python file and import:

from tqdm import tqdm
import requests
import cgi
import sys

We'll be getting the file URL from the command line arguments:

# the url of file you want to download, passed from command line arguments
url = sys.argv[1]

The function we'll use to download content from the web is requests.get(). By default it downloads the entire response body immediately, which we don't want: a large file would be held in memory all at once. Luckily, requests.get() accepts a stream parameter that we can set to True:

# read 1024 bytes every time 
buffer_size = 1024
# download the body of response by chunk, not immediately
response = requests.get(url, stream=True)

Now only the response headers are downloaded and the connection remains open, which lets us control the download with the iter_content() method. Before we see it in action, we first need to retrieve the total file size and the file name:

# get the total file size
file_size = int(response.headers.get("Content-Length", 0))
# get the default filename
default_filename = url.split("/")[-1]
# get the content disposition header
content_disposition = response.headers.get("Content-Disposition")
if content_disposition:
    # parse the header using cgi
    value, params = cgi.parse_header(content_disposition)
    # extract filename from content disposition
    filename = params.get("filename", default_filename)
else:
    # if content disposition is not available, just use default from URL
    filename = default_filename

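We get the file size in bytes from the Content-Length response header, falling back to 0 when the server does not send it. The file name comes from the Content-Disposition header, which we parse with the cgi.parse_header() function; if that header or its filename parameter is missing, we use the last segment of the URL instead. Note that the cgi module was removed in Python 3.13, so this code needs Python 3.12 or earlier.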
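On interpreters where cgi is not available, the same header can be parsed with the standard library's email.message.Message class. The following is a minimal sketch of that approach, not part of the original script; the function name filename_from_content_disposition and the sample header values are made up for illustration. It also strips any directory part from the name, on the assumption that a server-supplied file name should not be trusted to choose the target folder.

```python
import os
from email.message import Message


def filename_from_content_disposition(header_value, default=None):
    """Return the filename parameter of a Content-Disposition value, or default.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    if not header_value:
        return default
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() reads the "filename" parameter of Content-Disposition
    name = msg.get_filename()
    if not name:
        return default
    # drop any directory part the server may have included
    return os.path.basename(name) or default


print(filename_from_content_disposition('attachment; filename="report.pdf"'))
print(filename_from_content_disposition("inline", default="fallback.bin"))
```

In the script above, this could stand in for the cgi.parse_header() branch by passing response.headers.get("Content-Disposition") and default_filename.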

Let's download the file now:

# progress bar, changing the unit to bytes instead of iteration (default by tqdm)
progress = tqdm(response.iter_content(buffer_size), f"Downloading {filename}", total=file_size, unit="B", unit_scale=True, unit_divisor=1024)
with open(filename, "wb") as f:
    for data in progress.iterable:
        # write data read to the file
        f.write(data)
        # update the progress bar manually
        progress.update(len(data))

The iter_content() method iterates over the response data in chunks, which avoids reading the whole content into memory at once for large responses. We passed buffer_size as the number of bytes to read into memory on each iteration.
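To make the idea of chunked reading concrete without touching the network, here is a small sketch that reads an in-memory stream in fixed-size pieces. It only illustrates the concept and is not how requests implements iter_content(); the iter_chunks name and the 2500-byte sample body are assumptions for the example.

```python
import io


def iter_chunks(stream, chunk_size=1024):
    """Yield successive chunks of at most chunk_size bytes from a binary stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # an empty read means the stream is exhausted
            break
        yield chunk


# an in-memory body standing in for a response; only one chunk is held at a time
body = io.BytesIO(b"x" * 2500)
sizes = [len(chunk) for chunk in iter_chunks(body, 1024)]
print(sizes)  # [1024, 1024, 452]
```

The last chunk is shorter than the rest because the body length is not a multiple of the chunk size, which is also what you would see at the end of a real download.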

We then wrapped the iteration with a tqdm object, which will print a fancy progress bar. We also changed the tqdm default unit from iteration to bytes.

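Then, in each iteration, we read a chunk of data, write it to the open file, and update the progress bar. Note that the loop iterates over progress.iterable rather than over progress itself, so the bar advances only through the manual update() calls, by the number of bytes written instead of by one per chunk.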
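The write loop can also be pulled into a small function that checks the result against the expected size. This is a sketch under stated assumptions: save_chunks is a hypothetical helper, it accepts any iterable of byte chunks (such as response.iter_content(buffer_size)), and the size check assumes the server sent the body uncompressed, since Content-Length counts the bytes on the wire while iter_content() yields decoded data.

```python
import os
import tempfile


def save_chunks(chunks, path, expected_size=0):
    """Write an iterable of byte chunks to path and return the bytes written.

    expected_size is the Content-Length value; 0 means it was not provided,
    in which case no completeness check is made.
    """
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    if expected_size and written != expected_size:
        raise OSError(f"incomplete download: got {written} of {expected_size} bytes")
    return written


# usage with in-memory chunks standing in for a streamed response
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "demo.bin")
    written = save_chunks([b"ab", b"", b"cde"], target, expected_size=5)
    print(written)  # 5
```

Raising on a size mismatch is one possible design choice; a script could equally log a warning or delete the partial file.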

Here is my result after downloading a file. You can choose any file you want; just make sure the URL ends with the file name and extension (.exe, .pdf, etc.):

C:\file-downloader>python download.py https://download.virtualbox.org/virtualbox/6.1.18/VirtualBox-6.1.18-142142-Win.exe
Downloading VirtualBox-6.1.18-142142-Win.exe:   8%|██▍                             | 7.84M/103M [00:06<01:14, 1.35MB/s]

It is working!
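The requirement that the URL end with the file name comes from the url.split("/")[-1] fallback, which would also pick up any query string or fragment. A slightly more forgiving way to derive the default name is sketched below using urllib.parse from the standard library; filename_from_url, the "download" fallback, and the example.com URLs are assumptions for illustration, not part of the original script.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download"):
    """Derive a file name from the path component of a URL.

    Hypothetical helper: ignores the query string and fragment, decodes
    percent-escapes, and falls back to default when the path has no name.
    """
    path = urlsplit(url).path
    name = posixpath.basename(unquote(path))
    return name or default


print(filename_from_url("https://example.com/files/report%20v2.pdf?token=abc#top"))
# report v2.pdf
print(filename_from_url("https://example.com/"))
# download
```

For comparison, splitting the first URL on "/" and taking the last piece would give "report%20v2.pdf?token=abc#top".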

Alright, we are done. As you can see, downloading files in Python is pretty easy with powerful libraries like requests, and you can now use this in your own Python applications. Good luck!

By the way, if you wish to download torrent files, check this tutorial.

Happy Coding ♥
