How to Extract YouTube Data in Python

Abdou Rockikz · 15 sep 2019

Abdou Rockikz · 6 min read · Updated oct 2019 · Web Scraping

Web scraping is extracting data from websites. It is a form of copying, in which specific data is gathered and copied from the web into a central local database or spreadsheet for later analysis or retrieval.

Since YouTube is the biggest video-sharing website on the internet, extracting data from it can be very helpful: you can find the most popular channels, keep track of their popularity, record likes, dislikes, and views on videos, and much more. In this tutorial, you will learn how to extract data from YouTube videos using requests and BeautifulSoup in Python.

Installing required dependencies:

pip3 install requests bs4

Before we dive into the script, we need to experiment with extracting such data from web pages using BeautifulSoup. Open up a Python interactive shell and write these lines of code:

import requests
from bs4 import BeautifulSoup as bs # importing BeautifulSoup

# sample youtube video url
video_url = "https://www.youtube.com/watch?v=jNQXAC9IVRw"
# get the html content
content = requests.get(video_url)
# create bs object to parse HTML
soup = bs(content.content, "html.parser")
# write all HTML code into a file (the with block closes the file afterwards)
with open("video.html", "w", encoding="utf8") as f:
    f.write(content.text)

This will create a new HTML file in the current directory. Open it in a browser to see the YouTube video page as BeautifulSoup will see it.

When you scroll down a little in the web page, you will see the number of views of the video. Right-click on it and click Inspect (at least in Chrome), as shown in the following figure:

[Figure: Inspect Element to get YouTube views]

You will see the HTML tag element which contains that information:

[Figure: YouTube views HTML tag element]

As you can see, the number of video views is wrapped in a div with a class of "watch-view-count". This is trivial to extract in BeautifulSoup:

In [17]: soup.find("div", attrs={'class': 'watch-view-count'}).text
Out[17]: '9,072 views'
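
Note that find() returns None when no tag matches, so chaining .text directly raises an AttributeError if the element is absent. The following is a minimal sketch of a guarded lookup, not part of the original script. The find_text() helper and the SAMPLE_HTML string are hypothetical: the markup is only modeled on the class name shown above so that the example runs without a network connection.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, modeled on the class name used in this tutorial;
# the real page contains far more than this.
SAMPLE_HTML = '<div id="info"><div class="watch-view-count">9,072 views</div></div>'


def find_text(soup, name, attrs, default=None):
    """Return the stripped text of the first matching tag, or `default` if none matches."""
    tag = soup.find(name, attrs=attrs)
    return tag.get_text(strip=True) if tag is not None else default


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# The element exists, so its text is returned
views_text = find_text(soup, "div", {"class": "watch-view-count"})
# The element does not exist, so the default is returned instead of an exception
missing_text = find_text(soup, "div", {"class": "no-such-class"}, default="N/A")
print(views_text)
print(missing_text)
```

A helper like this keeps the script from crashing when a page lacks one of the expected elements.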

This way, you will be able to extract everything you want from a web page. Now let's make our script. Open up a new Python file and follow along:

Importing necessary modules:

import requests
from bs4 import BeautifulSoup as bs

Let's make a function that, given the URL of a YouTube video, returns all the data in a dictionary:

def get_video_info(url):
    # download HTML code
    content = requests.get(url)
    # create beautiful soup object to parse HTML
    soup = bs(content.content, "html.parser")
    # initialize the result
    result = {}

Retrieving the video title:

    # video title
    result['title'] = soup.find("span", attrs={"class": "watch-title"}).text.strip()

The video title is in a span tag with a class attribute of "watch-title"; the above line extracts it and strips the surrounding whitespace.

Number of views converted to an integer:

    # video views (converted to integer)
    result['views'] = int(soup.find("div", attrs={"class": "watch-view-count"}).text[:-6].replace(",", ""))

Get the video description:

    # video description
    result['description'] = soup.find("p", attrs={"id": "eow-description"}).text

The date when the video was published:

    # date published
    result['date_published'] = soup.find("strong", attrs={"class": "watch-time-text"}).text

The number of likes and dislikes as integers:

    # number of likes as integer
    result['likes'] = int(soup.find("button", attrs={"title": "I like this"}).text.replace(",", ""))
    # number of dislikes as integer
    result['dislikes'] = int(soup.find("button", attrs={"title": "I dislike this"}).text.replace(",", ""))

Since a YouTube video page also shows channel details, such as the name and the number of subscribers, let's grab those as well:

    # channel details
    channel_tag = soup.find("div", attrs={"class": "yt-user-info"}).find("a")
    # channel name
    channel_name = channel_tag.text
    # channel URL
    channel_url = f"https://www.youtube.com{channel_tag['href']}"
    # number of subscribers as str
    channel_subscribers = soup.find("span", attrs={"class": "yt-subscriber-count"}).text.strip()
    result['channel'] = {'name': channel_name, 'url': channel_url, 'subscribers': channel_subscribers}
    # return the result
    return result
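
The function above converts counts with slicing and replace(), and keeps the publication date as raw text. As a rough sketch of an alternative, the two hypothetical helpers below (parse_count() and parse_published() are names introduced here, not part of the original script) do the same conversions with the standard library only. The input formats are assumed from the sample values shown in this tutorial, such as '9,072 views' and 'Published on Apr 23, 2005'.

```python
import re
from datetime import datetime


def parse_count(text):
    """Convert text such as '9,072 views' or '2,337,823' to an int.

    Abbreviated counts such as '616K' are rejected rather than guessed at.
    """
    match = re.fullmatch(r"\s*(\d[\d,]*)(?:\s+\w+)?\s*", text)
    if match is None:
        raise ValueError(f"unrecognized count: {text!r}")
    return int(match.group(1).replace(",", ""))


def parse_published(text, prefix="Published on "):
    """Convert text such as 'Published on Apr 23, 2005' to a datetime.date.

    Assumes English month abbreviations (%b depends on the current locale).
    """
    if text.startswith(prefix):
        text = text[len(prefix):]
    return datetime.strptime(text.strip(), "%b %d, %Y").date()


print(parse_count("9,072 views"))
print(parse_count("2,337,823"))
print(parse_published("Published on Apr 23, 2005"))
```

Raising ValueError on unexpected input, instead of silently returning a wrong number, makes it easier to notice when the page format differs from what the script expects.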

Since soup.find() returns a Tag object, you can still search for HTML tags within other tags. As a result, it is common practice to call find() more than once.
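
To illustrate the nested lookup in isolation, here is a small offline sketch. The SAMPLE_HTML string is an assumption, modeled on the "yt-user-info" class and the channel URL that appear in this tutorial. It also uses urljoin() from the standard library as one possible way to build the absolute channel URL, in place of the f-string used in the function.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup, modeled on the class name used in this tutorial
SAMPLE_HTML = """
<div class="yt-user-info">
  <a href="/channel/UC4QobU6STFB0P71PMvOGN5A">jawed</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# First find() returns the outer div as a Tag
user_info = soup.find("div", attrs={"class": "yt-user-info"})
# Second find() searches only inside that Tag
link = user_info.find("a")
channel_name = link.text
# Resolve the relative href against the site root
channel_url = urljoin("https://www.youtube.com", link["href"])
print(channel_name)
print(channel_url)
```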

Now, let's finish up our script:

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="YouTube Video Data Extractor")
    parser.add_argument("url", help="URL of the YouTube video")
    args = parser.parse_args()
    url = args.url
    # get the data
    data = get_video_info(url)
    # print in nice format
    print(f"Title: {data['title']}")
    print(f"Views: {data['views']}")
    print(f"\nDescription: {data['description']}\n")
    print(data['date_published'])
    print(f"Likes: {data['likes']}")
    print(f"Dislikes: {data['dislikes']}")
    print(f"\nChannel Name: {data['channel']['name']}")
    print(f"Channel URL: {data['channel']['url']}")
    print(f"Channel Subscribers: {data['channel']['subscribers']}")

Nothing special here: we need a way to retrieve the video URL from the command line, and argparse does just that. The script then prints the extracted data in a readable format. Here is my output when running the script:

C:\youtube-extractor>python extract_video_info.py https://www.youtube.com/watch?v=jNQXAC9IVRw
Title: Me at the zoo
Views: 75909913

Description: The first video on YouTube. Maybe it's time to go back to the zoo?sub2sub kthxbai -- fast and loyal if not i get a subs back i will unsubs your cahnnel(Credit: The name of the music playing in the background is Darude - Sandstorm)

Published on Apr 23, 2005
Likes: 2337823
Dislikes: 81210

Channel Name: jawed
Channel URL: https://www.youtube.com/channel/UC4QobU6STFB0P71PMvOGN5A
Channel Subscribers: 616K
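
Because get_video_info() returns a plain dictionary, the result can also be stored in a spreadsheet-friendly file for later analysis. The sketch below is one possible way to do that with the standard csv module. The helper names flatten_video_info() and write_csv() are hypothetical, and the sample dictionary simply reuses a few values from the output above, so no network access is needed to run it.

```python
import csv
import io


def flatten_video_info(data):
    """Copy the result dict, replacing the nested 'channel' dict with channel_* keys."""
    row = {key: value for key, value in data.items() if key != "channel"}
    for key, value in data.get("channel", {}).items():
        row[f"channel_{key}"] = value
    return row


def write_csv(rows, fileobj):
    """Write a list of flat dicts to an open text file as CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


# Sample values copied from the output shown above
sample = {
    "title": "Me at the zoo",
    "views": 75909913,
    "likes": 2337823,
    "dislikes": 81210,
    "channel": {"name": "jawed", "subscribers": "616K"},
}

# StringIO stands in for a real file opened with
# open("videos.csv", "w", newline="", encoding="utf8")
buffer = io.StringIO()
write_csv([flatten_video_info(sample)], buffer)
print(buffer.getvalue())
```

Flattening the nested channel dictionary first keeps every value in its own column, which is what most spreadsheet tools expect.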

That's it! Now you can not only extract YouTube video details, but also apply this skill to any website you want. If you want to extract Wikipedia pages, there is a tutorial for that! Or maybe you want to scrape weather data from Google? There is a tutorial for that as well.

Happy Scraping ♥
