How to Use Github API in Python

Using the Github Application Programming Interface v3 in Python to search for repositories and users, make commits, delete files, and more, with the requests and PyGithub libraries.
  · 7 min read · Updated jul 2020 · Application Programming Interfaces


Github is a Git repository hosting service that adds many features of its own, such as a web-based graphical interface for managing repositories, access control, wikis, organizations, gists and more.

As you may already know, there is a ton of data to be grabbed. In this tutorial, you will learn how to use Github API v3 in Python with both the requests and PyGithub libraries.

To get started, let's install the dependencies:

pip3 install PyGithub requests

Getting User Data

Since it's pretty straightforward to use Github API v3, you can make a simple GET request to a specific URL and retrieve the results:

import requests
from pprint import pprint

# github username
username = "x4nth055"
# url to request
url = f"https://api.github.com/users/{username}"
# make the request and return the json
user_data = requests.get(url).json()
# pretty print JSON data
pprint(user_data)

Here I used my own account. Below is part of the returned JSON (you can see it in the browser as well):

{'avatar_url': 'https://avatars3.githubusercontent.com/u/37851086?v=4',
 'bio': None,
 'blog': 'https://www.thepythoncode.com',
 'company': None,
 'created_at': '2018-03-27T21:49:04Z',
 'email': None,
 'events_url': 'https://api.github.com/users/x4nth055/events{/privacy}',
 'followers': 93,
 'followers_url': 'https://api.github.com/users/x4nth055/followers',
 'following': 41,
 'following_url': 'https://api.github.com/users/x4nth055/following{/other_user}',
 'gists_url': 'https://api.github.com/users/x4nth055/gists{/gist_id}',
 'gravatar_id': '',
 'hireable': True,
 'html_url': 'https://github.com/x4nth055',
 'id': 37851086,
 'login': 'x4nth055',
 'name': 'Rockikz',
<..SNIPPED..>

That is a lot of data, and extracting it by hand with the requests library alone quickly becomes tedious. This is where PyGithub comes to the rescue.
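
That said, when only a handful of fields matter, plain dictionary lookups on the decoded JSON are enough. The following is a minimal sketch; the helper name summarize_user and the choice of fields are illustrative, not part of the Github API or of requests, and the sample dictionary is a trimmed copy of the response shown above.

```python
def summarize_user(user_data):
    """Pick a few commonly used fields out of a /users/<name> response."""
    wanted = ("login", "name", "blog", "followers", "following", "created_at")
    # .get() returns None for absent keys instead of raising KeyError
    return {field: user_data.get(field) for field in wanted}


# a trimmed copy of the response shown above
sample = {
    "login": "x4nth055",
    "name": "Rockikz",
    "blog": "https://www.thepythoncode.com",
    "followers": 93,
    "following": 41,
    "created_at": "2018-03-27T21:49:04Z",
    "hireable": True,
}

for field, value in summarize_user(sample).items():
    print(f"{field}: {value}")
```

With a live request, you would pass requests.get(url).json() instead of sample.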

Getting Public Repositories of a User

Let's get all the public repositories of that user using the PyGithub library we just installed:

import base64
from github import Github
from pprint import pprint

# Github username
username = "x4nth055"
# pygithub object
g = Github()
# get that user by username
user = g.get_user(username)

for repo in user.get_repos():
    print(repo)

Here is my output:

Repository(full_name="x4nth055/aind2-rnn")
Repository(full_name="x4nth055/awesome-algeria")
Repository(full_name="x4nth055/emotion-recognition-using-speech")
Repository(full_name="x4nth055/emotion-recognition-using-text")
Repository(full_name="x4nth055/food-reviews-sentiment-analysis")
Repository(full_name="x4nth055/hrk")
Repository(full_name="x4nth055/lp_simplex")
Repository(full_name="x4nth055/price-prediction")
Repository(full_name="x4nth055/product_recommendation")
Repository(full_name="x4nth055/pythoncode-tutorials")
Repository(full_name="x4nth055/sentiment_analysis_naive_bayes")

Alright, so I made a simple function to extract some useful information from this Repository object:

def print_repo(repo):
    # repository full name
    print("Full name:", repo.full_name)
    # repository description
    print("Description:", repo.description)
    # the date of when the repo was created
    print("Date created:", repo.created_at)
    # the date of the last git push
    print("Date of last push:", repo.pushed_at)
    # home website (if available)
    print("Home Page:", repo.homepage)
    # programming language
    print("Language:", repo.language)
    # number of forks
    print("Number of forks:", repo.forks)
    # number of stars
    print("Number of stars:", repo.stargazers_count)
    print("-"*50)
    # repository content (files & directories)
    print("Contents:")
    for content in repo.get_contents(""):
        print(content)
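    try:
        # repo license (the API returns it base64-encoded)
        print("License:", base64.b64decode(repo.get_license().content.encode()).decode())
    except Exception:
        # the repository has no license, or it could not be retrieved
        pass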
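
The License line works because the API returns file contents base64-encoded. Here is a small standalone sketch of that decoding step, using only the standard library. The helper name decode_content is hypothetical, and the sample string is encoded locally rather than fetched from Github.

```python
import base64


def decode_content(encoded):
    """Decode base64 text, as found in a ContentFile's content field, into a str."""
    # b64decode skips newlines by default, so line-wrapped payloads are fine
    return base64.b64decode(encoded.encode()).decode("utf-8")


# simulate a line-wrapped payload
raw = base64.b64encode(b"MIT License\n\nPermission is hereby granted, free of charge").decode()
wrapped = raw[:20] + "\n" + raw[20:]

print(decode_content(wrapped))
```

PyGithub's ContentFile objects also expose a decoded_content attribute that returns the decoded bytes, which may save you the manual step.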

The Repository object has many other fields; I suggest you use dir(repo) to find the ones you want to print. Let's iterate over the repositories again and use the function we just wrote:

# iterate over all public repositories
for repo in user.get_repos():
    print_repo(repo)
    print("="*100)

This will print some information about each public repository of this user:

====================================================================================================
Full name: x4nth055/pythoncode-tutorials
Description: The Python Code Tutorials
Date created: 2019-07-29 12:35:40
Date of last push: 2020-04-02 15:12:38
Home Page: https://www.thepythoncode.com
Language: Python
Number of forks: 154
Number of stars: 150
--------------------------------------------------
Contents:
ContentFile(path="LICENSE")
ContentFile(path="README.md")
ContentFile(path="ethical-hacking")
ContentFile(path="general")
ContentFile(path="images")
ContentFile(path="machine-learning")
ContentFile(path="python-standard-library")
ContentFile(path="scapy")
ContentFile(path="web-scraping")
License: MIT License
<..SNIPPED..>

I've truncated the output, since it covers every repository. Notice that we used the repo.get_contents("") method to retrieve the files and folders at the root of each repository; PyGithub parses each entry into a ContentFile object. Use dir(content) to see its other useful fields.
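
Since get_contents("") only lists the top level, reaching nested files means descending into each directory. A minimal sketch of that walk follows. The function name list_files is an illustrative choice; it relies only on the type and path attributes of ContentFile and on get_contents() returning a list for directories.

```python
def list_files(repo, path=""):
    """Return the paths of all files under `path`, descending into directories."""
    files = []
    # get_contents() returns a list for a directory and a single object for a file
    contents = repo.get_contents(path)
    pending = list(contents) if isinstance(contents, list) else [contents]
    while pending:
        item = pending.pop(0)
        if item.type == "dir":
            # queue the directory's children for a later iteration
            pending.extend(repo.get_contents(item.path))
        else:
            files.append(item.path)
    return files
```

Keep in mind that every directory costs one API request, so walking a large repository this way can use up the rate limit quickly.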

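Also, if you have private repositories, you can access them by authenticating with PyGithub. Github no longer accepts account passwords for API requests, so pass a personal access token (created in your account settings) instead:

# personal access token (keep it out of source control)
token = "your_access_token"

# authenticate to github
# (newer PyGithub releases prefer Github(auth=Auth.Token(token)), with Auth imported from github)
g = Github(token)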
# get the authenticated user
user = g.get_user()
for repo in user.get_repos():
    print_repo(repo)

Github also recommends authenticating your requests: unauthenticated clients are limited to a small number of requests (currently 60 per hour), and PyGithub raises a RateLimitExceededException once that limit is exceeded.
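
If you are using plain requests, one way to see how close you are to the limit is to read the X-RateLimit-* headers that Github attaches to its API responses. The sketch below parses them from any mapping of headers. The function name rate_limit_info and the sample values are made up for illustration.

```python
from datetime import datetime, timezone


def rate_limit_info(headers):
    """Extract rate-limit details from the headers of a Github API response."""
    return {
        "limit": int(headers.get("X-RateLimit-Limit", 0)),
        "remaining": int(headers.get("X-RateLimit-Remaining", 0)),
        # the reset time is sent as seconds since the Unix epoch (UTC)
        "reset": datetime.fromtimestamp(
            int(headers.get("X-RateLimit-Reset", 0)), tz=timezone.utc
        ),
    }


# example header values; a real call would pass requests.get(url).headers
sample_headers = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "57",
    "X-RateLimit-Reset": "1593561600",
}
info = rate_limit_info(sample_headers)
print(f"{info['remaining']} of {info['limit']} requests left, resets at {info['reset']}")
```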

Searching for Repositories

The Github API is quite rich: you can search for repositories with a specific query, just like you do on the website:

# search repositories by name
for repo in g.search_repositories("pythoncode tutorials"):
    # print repository details
    print_repo(repo)

At the time of writing, this returned 9 repositories and their information.

You can also search by programming language or topic:

# search by programming language
for i, repo in enumerate(g.search_repositories("language:python")):
    print_repo(repo)
    print("="*100)
    if i == 9:
        break

To search for a particular topic, simply pass something like "topic:machine-learning" to the search_repositories() method.
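
Qualifiers such as language: and topic: can be combined with free-text keywords in a single query string. Below is a minimal sketch of two small helpers, one that assembles such a string and one that caps the number of results. Both build_search_query and first_n are hypothetical names, not PyGithub functions.

```python
from itertools import islice


def build_search_query(keywords="", **qualifiers):
    """Join free-text keywords and qualifier pairs into a search string."""
    parts = [keywords] if keywords else []
    parts.extend(f"{name}:{value}" for name, value in qualifiers.items())
    return " ".join(parts)


def first_n(results, n):
    """Return at most n items from any iterable."""
    return list(islice(results, n))


print(build_search_query("tutorials", language="python", topic="machine-learning"))
# → tutorials language:python topic:machine-learning
print(first_n(range(100), 3))
# → [0, 1, 2]
```

For example, first_n(g.search_repositories(build_search_query(language="python")), 10) would mirror the loop above without the manual counter.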

Manipulating Files in your Repository

If you're using the authenticated version, you can also create, update and delete files very easily using the API:

# searching for my repository
repo = g.search_repositories("pythoncode tutorials")[0]

# create a file and commit n push
repo.create_file("test.txt", "commit message", "content of the file")

# delete that created file
contents = repo.get_contents("test.txt")
repo.delete_file(contents.path, "remove test.txt", contents.sha)

The above code is a simple use case: I searched for a particular repository, added a new file called test.txt, put some content in it and made a commit. After that, I grabbed the content of that new file and deleted it (which counts as a git commit as well).
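
Updating follows the same pattern as deleting: fetch the file first to get its SHA, then call update_file(). Here is a minimal sketch, where replace_file_text is a hypothetical wrapper name and repo is assumed to be a PyGithub Repository obtained from an authenticated client.

```python
def replace_file_text(repo, path, new_text, message):
    """Overwrite an existing file and commit the change."""
    # the current file's SHA tells Github which version is being replaced
    current = repo.get_contents(path)
    return repo.update_file(current.path, message, new_text, current.sha)
```

Calling replace_file_text(repo, "test.txt", "updated content", "update test.txt") between the create and delete steps above would add a third commit to the history.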

And sure enough, after the execution of the above lines of code, the commits were created and pushed:

[Image: Github commits]

Conclusion

We have just scratched the surface of the Github API. There are a lot of other functions and methods you can use, and obviously we can't cover all of them. Here are some useful ones you can test on your own:

  • g.get_organization(login): Returns an Organization object that represents a Github organization
  • g.get_gist(id): Returns a Gist object that represents a gist on Github
  • g.search_code(query): Returns a paginated list of ContentFile objects that represent matching files across repositories
  • g.search_topics(query): Returns a paginated list of Topic objects that represent Github topics
  • g.search_commits(query): Returns a paginated list of Commit objects that represent commits on Github

There are a lot more; use dir(g) to list the other methods, and check the PyGithub documentation or the Github API documentation for detailed information.

Learn also: How to Use Google Custom Search Engine API in Python.

Happy Coding ♥
