Learn the internals of Git by hacking a website




Understanding what one of our favorite tools actually does

Disclaimer: This article is meant for educational purposes only. The author does not condone or encourage hacking, except for permitted white hat hacking.

Most developers that roam this Earth have, at some point or another, come across Git.

Chances are, you probably use it every day.

Now, while it’s easy to reduce Git down to a few memorized commands, like git add, git commit, and git push, there’s actually a lot going on in the background that we don’t often care to understand.

However, understanding a little bit of what happens under the hood could potentially be very useful, and I’ll try to give you some insight into this in a fun way: by teaching you how to exploit a Git-related security vulnerability so you can better secure your own websites.

Remember, you should never try this on anyone’s website without explicit permission. Doing so can constitute a serious crime. I do not take responsibility for anything you choose to do in possession of the knowledge you gain from this article.

A Teeny Tiny Web Server

If you want to skip file creation and setup, feel free to just clone the code from here (using Git :D) and skip ahead to the “Do you even hack?” section.

The vulnerability I will discuss here involves having your Git repository or its contents exposed to the web. It affects a whole lot of websites still today and is mostly associated with PHP servers (surprise!), so let’s get ourselves one of those.

No, don’t go away yet! We won’t actually be writing a lot of PHP, but you do need to make sure you have PHP installed on your machine. You can check whether it’s already installed by running php -v.

I had never really used PHP myself before writing this tutorial, so I assure you this is very simple.

So now, open a terminal window and let’s create a Git repository and add a file to it:

$ mkdir exposed-git

$ cd exposed-git

$ git init

$ echo "<h1>I will be hacked soon!</h1>" > index.php

To run a simple web server, just type:

$ php -S localhost:8000

Now try going over to localhost:8000 in your browser to make sure it works.

Awesome! We have our web server.


With our web server ready, let’s add some juicy things to it.

In the root of the directory, do:

$ touch config.php

Then, using your editor of choice, add the following to config.php:

<?php

What we did above was create a mock config file that holds the database credentials, and add a “security mechanism” to prevent users from accessing it. Trying to access localhost:8000/config.php will lead us to an error page, which we’ll create below:

$ echo "<h1>You really thought you could hack me?</h1>" > error.php

At this point, you may be thinking:

“Ah, but nobody stores their database credentials in plaintext anyway!”

To which my reply is: Here’s an official WordPress tutorial about the wp-config.php file. Oh yeah, it’s that bad.

By the way, if you store any credentials in plaintext, pretty please at least look into environment variables.

What’s Git Without A Commit

Now that our server files are ready, let’s commit them. Run:

$ git add .

$ git commit -m "first commit"

We’re done with our setup. Now let’s get to hacking.

Do you even hack?

Let’s run our server and try a few things. Again, the command was:

$ php -S localhost:8000

Make sure your server keeps running for the remainder of the tutorial, and do everything else on other terminal windows.

First, try accessing our config.php file at localhost:8000/config.php. You should get the error page we set up earlier.

Now, let’s see if we can access the hidden .git/ directory, since that’s what we’re talking about in this article:

localhost:8000/.git/

I’m running PHP 7.3.9 and that gives me the following page:

That’s nice: PHP’s built-in server doesn’t give us a listing of the directory’s contents. It’s not always like this, though. Some websites expose their entire .git directory, and it looks something like this:

Credits: https://pentester.land/tutorials/2018/10/25/source-code-disclosure-via-exposed-git-folder.html

But hey, if we can’t access the full directory, we also can’t access its contents, right?

Well, let’s try. Referencing the picture above, we see some things that every Git directory contains. HEAD looks especially interesting. Could we maybe access that?

Access localhost:8000/.git/HEAD and bingo! We got something.

Doesn’t look super interesting, though. But it’s a path, so can we try that too?

localhost:8000/.git/refs/heads/master

Oh, we get a hash! What could that be?

Well, let’s think about GitHub.

Above is a screenshot of the GitHub repository for the PHP Interpreter. There’s a banner announcing the latest commit, and that’s also followed by a piece of a hash. Could we have found the hash of a commit?

And since Git stores commits for us, could we find that commit with the hash?

Looking at the directory structure again, one might guess that commits are stored in .git/objects/. Sounds like a good guess. Can we get that?

localhost:8000/.git/objects/f452d4085347400afa8751aae3a5184d73113628

Author’s Note

Remember to replace the hash above with whatever you got from localhost:8000/.git/refs/heads/master.

Not found. So we could quit and call it a day, or try to learn a little more about Git and try again.

Open any directory you have that uses Git and look inside .git/objects/ .

If you picked a directory with a lot of activity, objects/ might look something like this:

A lot of subdirectories named with two-character hexadecimals.

Ah! So the commit we’re looking for isn’t just freely floating in objects/, it’s probably inside a subdirectory named with the first two characters of its hash. That’s worth a shot. Let’s go ahead and request it (with your hash instead of mine):
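If you want to double-check where an object should live, the mapping from hash to path is purely mechanical. Here’s a minimal Python sketch of it. It assumes a repository using Git’s default 40-character SHA-1 hashes, and loose_object_relpath is just an illustrative name, not something Git provides.

```python
def loose_object_relpath(sha: str) -> str:
    """Return a loose object's path relative to .git/: objects/<2 chars>/<38 chars>."""
    sha = sha.lower()
    # Assumes a SHA-1 repository (40 hex characters); SHA-256 repositories use 64.
    if len(sha) != 40 or any(c not in "0123456789abcdef" for c in sha):
        raise ValueError(f"not a full 40-character hex hash: {sha!r}")
    return f"objects/{sha[:2]}/{sha[2:]}"


# The same relative path works for a local .git/ and for the exposed URL.
sha = "f452d4085347400afa8751aae3a5184d73113628"
print("http://localhost:8000/.git/" + loose_object_relpath(sha))
# http://localhost:8000/.git/objects/f4/52d4085347400afa8751aae3a5184d73113628
```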

localhost:8000/.git/objects/f4/52d4085347400afa8751aae3a5184d73113628

For me, this actually prompts a download window. So we must be doing something right. But if I open or cat the file after downloading it, all I get is some gibberish:

xùéÀmB1�9ªäm »ˆ˙+!Dê„⁄fiM`2ÊêÓCBg£©kÔÀã~33ÿúıQíq¡’…≈*&4ùè≠ËZmQ7|ù Y,¢Ò>&#⁄’,Ï–ŸB1JnË5'r,M — c~Ø>È|^·Hß«X`˜ÛG€˛OáØNÀe[◊æ„3b
⁄x¯–®µz⁄Áʉ∑JñqüÍ®_∏⁄K,
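The gibberish has a simple explanation. A loose Git object is zlib-compressed, and once inflated it consists of a small header followed by the content itself. The header is the object type, a space, the content length in bytes, and a NUL byte. The Python sketch below uses only the standard library’s zlib module and shows roughly what has to happen before anything readable appears. read_loose_object is a hypothetical helper name, and the demo object is built on the spot rather than downloaded.

```python
import zlib


def read_loose_object(raw: bytes) -> tuple[str, bytes]:
    """Inflate a loose object file and split it into (object type, content)."""
    data = zlib.decompress(raw)
    header, sep, content = data.partition(b"\x00")
    if not sep:
        raise ValueError("no NUL byte after the object header")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("length in header does not match the content")
    return obj_type, content


# Demo: compress a tiny blob the way Git stores it, then read it back.
raw = zlib.compress(b"blob 6\x00hello\n")
print(read_loose_object(raw))  # ('blob', b'hello\n')

# For a file you downloaded, pass its bytes instead, e.g.:
# with open(".git/objects/f4/52d4085347400afa8751aae3a5184d73113628", "rb") as f:
#     print(read_loose_object(f.read()))
```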

But well, Git must know what to do with this. Let’s create a new empty directory to run some tests.

$ mkdir git-tests

$ cd git-tests

$ git init

Let’s then place our file where it belongs, in .git/objects/. From the root of our git-tests dir, do (with your own hash):

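$ mkdir .git/objects/f4

$ curl localhost:8000/.git/objects/f4/52d4085347400afa8751aae3a5184d73113628 > .git/objects/f4/52d4085347400afa8751aae3a5184d73113628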

Then we can use a built-in Git command to actually read the contents of the file. It looks like this:

$ git cat-file -p f452d4085347400afa8751aae3a5184d73113628

This gives us our first bit of sensitive information:

tree 2764257f81462ae8f9b26ab16d08de153db0cc2b

(Note: If you had a previous commit, you would also see a reference to a “parent”.)

We got an identifier of a “tree” and some info about the commit author and the committer, including their emails! (The two can differ, for example when a pull request is squashed or rebased on merge, or when someone applies another person’s patch.) Finally, we got the commit message as well.
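Once inflated, a commit is just text. It has header lines such as tree, parent, author and committer, then a blank line, then the message. Here’s a minimal Python sketch of splitting one apart. It assumes the content has already been decompressed and is UTF-8. The sample commit is made up for illustration: the tree hash is the one from this article, but the name, email and timestamp are placeholders.

```python
def parse_commit(content: bytes) -> tuple[dict[str, list[str]], str]:
    """Split a decompressed commit into header fields and the commit message."""
    text = content.decode("utf-8")
    header_block, _, message = text.partition("\n\n")
    fields: dict[str, list[str]] = {}
    last_key = None
    for line in header_block.split("\n"):
        if line.startswith(" ") and last_key is not None:
            # Continuation of a multi-line header, such as a gpgsig signature.
            fields[last_key][-1] += "\n" + line[1:]
            continue
        key, _, value = line.partition(" ")
        # One list per key, since merge commits have several "parent" lines.
        fields.setdefault(key, []).append(value)
        last_key = key
    return fields, message


sample = (
    b"tree 2764257f81462ae8f9b26ab16d08de153db0cc2b\n"
    b"author Jane Doe <jane@example.com> 1570000000 +0000\n"
    b"committer Jane Doe <jane@example.com> 1570000000 +0000\n"
    b"\n"
    b"first commit\n"
)
fields, message = parse_commit(sample)
print(fields["tree"][0])         # 2764257f81462ae8f9b26ab16d08de153db0cc2b
print(fields.get("parent", []))  # [] for a first commit
print(message.strip())           # first commit
```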

Things are starting to get serious…

For reference, you don’t actually need the entire hash to run git cat-file; any prefix long enough to be unambiguous will do. For instance, this gives me the same result:

$ git cat-file -p f452d40
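Expanding an abbreviated hash boils down to finding the one known object whose name starts with that prefix, and refusing if there are several. Below is a minimal Python sketch that works over a plain list of hashes. Real Git looks in the matching objects/<first two characters>/ directory and in its pack files instead. The four-character minimum mirrors the shortest abbreviation Git accepts. The second hash in the demo is invented purely to produce an ambiguous prefix.

```python
def resolve_prefix(prefix: str, known: list[str], min_len: int = 4) -> str:
    """Expand an abbreviated object hash to the single full hash it matches."""
    prefix = prefix.lower()
    if len(prefix) < min_len:
        raise ValueError(f"prefix must be at least {min_len} characters long")
    matches = [sha for sha in known if sha.startswith(prefix)]
    if not matches:
        raise KeyError(f"no object starts with {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"{prefix!r} is ambiguous: {len(matches)} objects match")
    return matches[0]


known = [
    "f452d4085347400afa8751aae3a5184d73113628",  # the commit from this article
    "f45a000000000000000000000000000000000000",  # invented, to force an ambiguity
    "2764257f81462ae8f9b26ab16d08de153db0cc2b",  # the tree from this article
]
print(resolve_prefix("f452d40", known))  # f452d4085347400afa8751aae3a5184d73113628
# resolve_prefix("f45", known)            -> ValueError (shorter than 4 characters)
# resolve_prefix("f45", known, min_len=3) -> ValueError (ambiguous)
```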

Something else you should know is that you can also run the command with the -t flag. This gives you the object type, which can be a tree, a blob, a commit, or an annotated tag. We’re primarily concerned with the first 3, especially blobs (which hold file contents). So, if we get a hash and want to know what type of object it represents, we can simply run:

$ git cat-file -t f452d40
commit

We’ve covered quite a bit of ground, so let’s stop and make note of a few things:

  • Everything Git-related is stored in the .git/ directory
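  • HEAD stores a reference to the branch that is currently checked out (in our case master, the only branch we have), and that branch’s file under .git/refs/heads/ in turn holds the hash of its latest commit
  • Git stores everything in the form of objects, which are named after a hash of their contents (SHA-1 by default). This is a great design decision because it means that identical objects are stored only once, without unnecessary redundancy.
  • Objects are stored in .git/objects/ inside a subdirectory named with the first two characters of their hash, and the file itself is named with the remaining 38 characters (e.g. .git/objects/f4/52d4085347400afa8751aae3a5184d73113628)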
  • We can use git cat-file to get the type of an object by its hash with flag -t, as well as its actual content in human-readable format with flag -p
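The hash behind all of this is easy to reproduce. Git hashes a short header (the object type, a space, the content length, and a NUL byte) followed by the raw content. As a result, two files with identical bytes always get the same object ID, and therefore the same path, which is what makes the deduplication automatic. Here’s a minimal Python sketch for SHA-1 repositories. git_object_id is an illustrative name; the command-line equivalent is git hash-object.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Object ID as Git computes it: SHA-1 of "<type> <size>", a NUL byte, then the content."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


print(git_object_id("blob", b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_id("blob", b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4

# Identical bytes, identical ID: a second copy of a file adds no new object.
assert git_object_id("blob", b"same\n") == git_object_id("blob", b"same\n")
```

You can compare the second value with the output of echo 'test content' | git hash-object --stdin.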

Awesome! But we’re not done yet. Let’s go a little deeper.

Trees and Binary Large Objects (BLOBs)

The commit object we inspected above contained a reference to a tree. Let’s check out what that is by doing the same thing we did above: downloading the object into our local .git/ in git-tests.

Effectively, we are building a mirror of the server’s Git directory so we can see its source code and potentially attack it.

Author’s Note

In a real attack, this would, of course, be done via an automated script. Writing one is quite simple but will not be covered here. We’re interested in how Git works, not Python scripts, so we’ll be doing things manually.

Moving on, the tree object for me was referenced in the commit as follows:

tree 2764257f81462ae8f9b26ab16d08de153db0cc2b

So what I want to do is run the same curl command with this new hash:

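$ mkdir .git/objects/27

$ curl localhost:8000/.git/objects/27/64257f81462ae8f9b26ab16d08de153db0cc2b > .git/objects/27/64257f81462ae8f9b26ab16d08de153db0cc2b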

Then, git cat-file (some repetition is good, but I won’t repeat these commands after this):

$ git cat-file -p 2764257

Would you look at that!

100644 blob b7ea4906d2ac64060e01b35530a922c65f063831 config.php

We got the directory structure for the server’s root dir!

From this, we can reach an important conclusion: Trees, in Git, are just data structures with references to other Git objects. Thus, directories in Git are represented as trees with references to blobs (files) and other trees (directories).

For instance, if we had a vendor/ subdirectory in our server, it would show up as a tree referenced in this tree we just inspected.


Also, the numbers on the left are just Git modes. Directories are 040000, because permissions are ignored for them. 100644 is for regular (non-executable) files, and 100755 is for executables. There are others, such as 120000 for symbolic links and 160000 for submodules, but these are the main ones.
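Unlike commits, tree objects are stored in a binary format. Each entry is the mode in ASCII (with directories written as 40000, without the leading zero), a space, the file name, a NUL byte, and then the 20 raw bytes of the referenced object’s SHA-1. git cat-file -p turns that into the readable lines above. Here’s a minimal Python sketch of the same parsing, assuming already-decompressed content. The vendor entry in the demo is hypothetical, echoing the example above, and uses the hash of the empty tree.

```python
def parse_tree(content: bytes) -> list[tuple[str, str, str, str]]:
    """Parse a decompressed tree into (mode, type, hash, name) rows like git cat-file -p."""
    entries = []
    i = 0
    while i < len(content):
        space = content.index(b" ", i)
        nul = content.index(b"\x00", space)
        mode = content[i:space].decode("ascii").rjust(6, "0")  # "40000" -> "040000"
        name = content[space + 1:nul].decode("utf-8", errors="replace")
        sha = content[nul + 1:nul + 21].hex()  # 20 raw bytes -> 40 hex characters
        # The mode alone tells us what kind of object the entry points to.
        obj_type = {"040000": "tree", "160000": "commit"}.get(mode, "blob")
        entries.append((mode, obj_type, sha, name))
        i = nul + 21
    return entries


body = (
    b"100644 config.php\x00" + bytes.fromhex("b7ea4906d2ac64060e01b35530a922c65f063831")
    + b"40000 vendor\x00" + bytes.fromhex("4b825dc642cb6eb9a060e54bf8d69288fbee4904")
)
for mode, obj_type, sha, name in parse_tree(body):
    print(mode, obj_type, sha, name)
# 100644 blob b7ea4906d2ac64060e01b35530a922c65f063831 config.php
# 040000 tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 vendor
```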

Again, let’s go over some important learning points here before we move onto our last step. Here’s a diagram to get things started:

Wait, but that diagram has the arrows going the wrong way!

Actually, while most Git diagrams online will show you arrows going from the first commit to the HEAD, we’ve learned here that commits actually reference their parents, not their children.

Once more, this is a good design decision. If commits referenced their children, the second commit in the diagram above would have to keep 2 references instead of one. In fact, it would have to handle having more and more children over time. More fundamentally, a commit’s hash is computed from its contents, so a commit could never be updated to point at children created after it without changing its own hash.

Having it this way, on the other hand, ensures each commit only references the commit(s) it was built on: one parent for an ordinary commit, none for the very first commit, and two or more for a merge commit. Any number of commits can in turn reference the same parent, creating branches.
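Because every commit only knows its parents, walking a history always means starting at a tip and following parent references backwards. Here’s a minimal Python sketch of that walk over a tiny, hypothetical history. The single letters stand in for real hashes, and d is a merge commit with two parents.

```python
from collections import deque

# Hypothetical history: commit -> list of its parents (no child links anywhere).
PARENTS = {
    "a": [],          # first (root) commit
    "b": ["a"],
    "c": ["a"],       # b and c both branch off a
    "d": ["b", "c"],  # merge commit
}


def ancestors(tip: str, parents: dict[str, list[str]]) -> list[str]:
    """All commits reachable from `tip` via parent references, breadth-first."""
    seen = {tip}
    order = []
    queue = deque([tip])
    while queue:
        commit = queue.popleft()
        order.append(commit)
        for parent in parents[commit]:
            if parent not in seen:  # a commit can be reached along several paths
                seen.add(parent)
                queue.append(parent)
    return order


print(ancestors("d", PARENTS))                                 # ['d', 'b', 'c', 'a']
print([c for c in ancestors("d", PARENTS) if not PARENTS[c]])  # ['a'], the root
```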

Finally, as we’ve done, to find the HEAD of a branch, we check for it under .git/refs/heads/<branch_name>.
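The two lookups we did by hand, reading .git/HEAD and then the branch file it names, are easy to script against a local .git/ directory. Here’s a minimal Python sketch. It also handles a detached HEAD and checks .git/packed-refs, a file where Git may store many refs together as <hash> <refname> lines. Our tiny repository won’t have that file, but older, busier ones often do. resolve_head is an illustrative name, and the demo builds a throwaway directory.

```python
import os
import tempfile


def resolve_head(git_dir: str) -> str:
    """Follow .git/HEAD to the commit hash it ultimately points at."""
    with open(os.path.join(git_dir, "HEAD"), encoding="utf-8") as f:
        head = f.read().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD: the file holds a commit hash directly
    ref = head[len("ref: "):]  # e.g. "refs/heads/master"
    ref_path = os.path.join(git_dir, *ref.split("/"))
    if os.path.exists(ref_path):
        with open(ref_path, encoding="utf-8") as f:
            return f.read().strip()
    # Otherwise the ref may have been packed into .git/packed-refs.
    packed = os.path.join(git_dir, "packed-refs")
    if os.path.exists(packed):
        with open(packed, encoding="utf-8") as f:
            for line in f:
                if line.startswith(("#", "^")):  # header and peeled-tag lines
                    continue
                sha, _, name = line.strip().partition(" ")
                if name == ref:
                    return sha
    raise KeyError(f"cannot resolve {ref}")


# Demo on a throwaway directory that mimics the two files we fetched.
with tempfile.TemporaryDirectory() as git_dir:
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/master\n")
    with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
        f.write("f452d4085347400afa8751aae3a5184d73113628\n")
    print(resolve_head(git_dir))  # f452d4085347400afa8751aae3a5184d73113628
```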

Now, onto some additional points:

  • When you git add files, blob objects for their contents are created in objects/, but no tree or commit object is created yet
  • When you git commit, Git writes tree objects for the staged directory structure and a new commit object that references the previous HEAD commit as its parent. It then updates .git/refs/heads/<branch_name> to the new commit’s hash. .git/HEAD itself still just names the checked-out branch, so following it now leads to the new commit
  • Branches, once again, are just references: a file under .git/refs/heads/ holding the hash of the branch’s latest commit. Two branches that split from each other share a common ancestor commit, which is why it is easy to compare branches when merging, for example. All Git needs to do is follow the parent references back from the HEAD of each branch until it finds the commit where the split happened (the merge base), and then compare that commit’s tree with the trees at the two HEADs. This works between any two branches that share history, which in a typical repository means all of them, since they all eventually branched off master.
  • The point above also means that if you follow the parent references back from the HEAD of any branch, you will eventually reach the first commit in your repository.
  • Commits contain a reference to their parent(s), but most importantly, a reference to a tree that is a snapshot of the project after the commit. This is why we can actually look into the past with Git, seeing the whole state of our project at any given commit.
  • The point above also exposes a danger with Git, especially when paired with platforms like GitHub or GitLab. If you do not explicitly delete objects and commits from the history, the entire past of the project will be visible to those with access to the repo. If the repo was previously private and then becomes public, its entire past will be available. Hence, if you ever add something you shouldn’t have to a project (like an API key, or worse, a password), a new commit deleting it is not enough. You must actually rewrite the history so the respective objects and commits are gone. Luckily, there is tooling to help you do this (such as git filter-branch, or the separately installed git filter-repo), so you don’t need to go around running rm on a bunch of weird hash-named files.
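To make the merging point above concrete: the commit where two branches split, which Git calls the merge base, can be found using nothing but parent references. Here’s a minimal Python sketch over a hypothetical history, with short names standing in for hashes. It returns the common ancestors that aren’t themselves ancestors of another common ancestor. That is the same idea as git merge-base --all, but this is a simplified illustration, not Git’s actual algorithm.

```python
from collections import deque

# Hypothetical history: master is root -> m1 -> m2, and feature branched off m1.
PARENTS = {
    "root": [],
    "m1": ["root"],
    "m2": ["m1"],  # tip of master
    "f1": ["m1"],
    "f2": ["f1"],  # tip of feature
}


def reachable(tip: str, parents: dict[str, list[str]]) -> set[str]:
    """`tip` plus every commit reachable from it through parent references."""
    seen = {tip}
    queue = deque([tip])
    while queue:
        for parent in parents[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def merge_bases(a: str, b: str, parents: dict[str, list[str]]) -> set[str]:
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = reachable(a, parents) & reachable(b, parents)
    best = set(common)
    for commit in common:
        best -= reachable(commit, parents) - {commit}  # drop its strict ancestors
    return best


print(merge_bases("m2", "f2", PARENTS))  # {'m1'}
```

From there, a merge compares the tree of m1 with the trees at m2 and f2 to work out what each side changed.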

Going in for the kill

If you’ve followed this entire tutorial closely, you should now know what comes next, as well as how to do it.

Doing the same steps as above, we can download the config.php file, whose hash we now know, and inspect its contents.

$ git cat-file -p b7ea4906d2

In our case, we “found” a username and password for a database. This may seem ridiculous to you, but a lot of sites out there have been hacked just like this.

And even if you do not get sensitive information in plaintext, you still have access to the entire source code, meaning there are various other attacks one can perform.

Conclusions

I hope this tutorial has taught you some useful lessons, both about how Git works, and some of the dangers associated with its misuse. While Git is much more than what we went through here, I believe this should be a good start.

I myself am far from a Git expert, but this simple knowledge has significantly upgraded my version control game.

Once more, you can find all the source code I went through here. Please let me know if there are any mistakes or if there’s anything I did not explain properly — I will happily try to fix it.

Finally, to learn more, I would highly recommend you play around in your .git/ directory, and you can look at Git’s source code if you’re really keen. I also recommend the videos below:

And that’s it from me. Thank you!

