On gitology

Our analysis of git hatred begins with a discussion of gitology, which I now define:

gitology (n. from git, a comptemptible fellow or DVCS): The pernicious study of the inner workings of git and how to apply them.

Thus gitology refers to the theoretical framework and mental model one must absorb as a prerequisite to becoming fully competent with git’s commonly-used features. Here a distinction must be made, because although there are many gitological concepts that only apply to git, many others are part of general DVCS theory. Gitology, however, sometimes interprets general DVCS concepts in its own way.

The following gitological concepts are not particular to git:

  • repositories (repos)
  • commits or changesets (csets)
  • directed acyclic graph (DAG)
  • branches
  • pushing and pulling changes
  • whole-repo tracking, not individual files
  • rebasing csets
  • pushing and pulling csets

The following are purely gitological and add unnecessary complexity, in addition to eventually being unavoidable:

  • Exposing the index/staging area
  • Exposing other implementation details: blobs, trees, commits, refs
  • refs and refspecs
  • Branches are refs
  • Detached HEADs (a.k.a “not on a branch”)
  • Distinguishing remote and local tracking branches
  • Choosing which branch to pull onto
  • Bare repos
  • Hard, soft, mixed resets
  • Porcelain vs plumbing

I will say more of each in turn further down.

Unavoidable gitology

If you are a user of git, I invite you to try to describe the purely gitological concepts above. If you have used git more than casually, I am sure you probably know what most of them are. If you don’t know most of them, it is likely because you haven’t used git for very long.

It is clear that gitology is unavoidable. A user of git must quickly become a student of gitology for git will make no attempt to hide its ugly guts to said user. Consider for example one of the first tasks that a user of git will encounter, making a commit. This is how git’s manpage describes this operation (and manpage it is, in the classic style of nerd-only Unix documentation):

Stores the current contents of the index in a new commit along with a log message from the user describing the changes.

Immediately the first gitological concept pops up, the index. Now, the index is not unique to git, nor even to a DVCS. Practically every VCS must implement an index in one way or another However, it is an implementation detail, which a VCS might decide to not implement for whatever reason. The great gitological stroke of genius was to gleefully expose and moreover force the user to manually handle the index. Any other common VCS only optionally makes the user handle it. Moreover, this is touted as a frequently-loved feature of git.

Let me pause here for a moment to describe why this is characteristic of git’s pathological design choices.

  1. The index is an implementation detail. Git refers to implementation details as “plumbing” and the user interface as “porcelain”, in order to make it clear that git’s designers think of git with the same reverence I think of the instrument that handles the organic waste that my body produces. Git’s makers (I hesitate to suggest that they actually consciously designed this, so I won’t call them “designers” again) refer to the index as “porcelain”, whereas it should be “plumbing”, as evidenced by how it’s handled by hg, darcs, bzr, and yes, even crufty ol’ svn handles the index automatically.
  2. They used manpages to document them. The manpage is a Unix developer format that tends to lead to terse or obscure documentation. It doesn’t have to do this (e.g. BSD tends to have great manpages), but as the name indicates, it used to be just a single page, really, just a cheat sheet with mnemonics to remind people what they already should have known about a program. The UNIX Haters’ Handbook explores this documentation problem in more detail. In this case, if you didn’t already know what an index was, the manpage is going to make it difficult for you to figure it out. For programs that are intended to be used by nerdy Unix developers only this wouldn’t be a problem; however git is primarily a tool for collaboration, very widespread collaboration, and expecting all collaborators to be nerdy Unix developers is unrealistic and hinders the very collaboration it’s supposed to encourage. They could be Windows users, they could be less technical contributors like translators or graphic artists (yes, it makes sense to put graphics under source control, even a DVCS, with some care). Manpages are inadequate for these people.
  3. Exposing the index is frequently touted. Git encourages micro-management, and git’s users end up loving this micro-management (and blog about it, and write books, and have conferences, and so on ad nauseam). This is characteristic of the perversion that git promotes, focussing on details instead of getting work done. Of course it’s hard work to understand gitology, so of course people feel accomplished once they complete this work, but it’s work that shouldn’t exist. It’s not that it’s wrong to expose an implementation detail if it allows more advanced use. It’s wrong to have no way to hide this detail completely. There are workarounds, like passing options to commit, or using a front-end to git, but remember that I specifically do not hate front-ends to git (poor things are just doing the best they can to fix a horrible UI). It should be the other way around, though. It should be abstracted away from the user, and should the user want more advanced use, there should be a developer API upon which we can build more advanced tools.

Why gitology?

So why does git do the things that it does? Why expose all the plumbing? The situation is basically the following:

While the makers of git are so happy about all the flexibility that git offers due to exposing its plumbing, its users that are not also of the same persuasion are aghast that it requires learning about the path that human refuse follows through git.

The ability of a tool to be so flexible as to allow myself to get shot in the foot never did much for me other than leave me with injured feet.

A brief excursion into gitology

The gitological concepts I enumerated above are all deserving of a more thorough study in forthcoming blog posts. For the moment, I will briefly describe the remaining ones and why it’s not necessary in order to have a working DVCS.

Blobs, trees, commits, refs

Git’s simple storage model is simple enough to be exposed to the user, so why not make everyone learn it? The more internal details we can expose, the better. It is a rite of passage for any serious student of gitology to read Git For Computer Scientists or equivalent.

refs and refspecs

Refs are part of git’s storage model. They’re basically pointers, and just like carelessly manipulating C pointers results in segfaults, carelessly manipulating git’s refs results in lost data. Most of the time it’s not a problem, so naturally they should also be exposed to the user. Refspecs are a general class of ways to specify a ref, and some of their syntax is based on what a certain hacker learned to type when he was inspecting the results when a certain kernel took a core dump on him.

Branches are refs

A branch should be a simple concept independent of any implementation detail: a line of development in the repo’s DAG. For the most part, this is what they are in git, except that they identify branches with refs, so amongst other complications this leads to

Detached HEADs

When you check out an earlier commit in git, you end up with the cryptic “not in a branch” message… then where the hell are you? Isn’t the DAG made up exclusively of branches? No, a branch is a ref, so if you’re not at a commit (recursively) pointed to by a ref, you’re not anywhere and you might as well not exist and git will eventually garbage collect you.

Distinguishing remote and local tracking branches

A DVCS really shouldn’t care where branches are, and for symmetry (because symmetry is beautiful and simple) a branch shouldn’t change its nature if it’s here or there, or at least it shouldn’t appear this way to the user. Git, of course, disagrees and makes you remember the distinction between your local copy of the remote branch and the remote branch. It’s not such a big deal in the end, except when you run into

Choosing which branch to pull onto

In git, you can’t just say, “get over here whatever differences are over there” (i.e. “pull”, a common DVCS operation), because you have to remember where to pull it to. “Here” isn’t always enough to git, because the sad developmentally stunted self-described stupid content tracker can’t know which “here” you mean. You have to specify a branch. Because your branch isn’t the same as the branch over there, so it can’t always automatically go on whatever branch here corresponds to the branch there. This is all symptomatic of the asymmetry of remote and local tracking branches, as are

Bare repos

Because in git you can’t treat remote and local symmetrically (i.e. push and pull are not symmetric operations), you also can’t push (or shouldn’t, rather) push to any repos willy-nilly. So for example, you can’t make local clones in git and push and pull amongst them, because git can’t figure out what to do with the working directory of each. The gitological solution to not knowing what to do with the current working directory is to remove it altogether in what is known as a bare repo. There are workarounds, of course (create a zillion branches, which is what git users love doing), but git’s design choices really hinder the naturality of cloning. There is also deeper design problem lurking here that seeps through the stupidity of the content tracking.

Hard, soft, mixed resets

This one is admittedly more esoteric. There are several ways to do a “reset”, and really they all mean something completely different, but I chose this gitological example to point out another boneheaded aspect of git’s, shall we say, evolution. Gitology teaches that it’s best when a single command does too much and does completely different things depending on which options you choose but actually sort of the same thing if you realise that the underlying implementation is the same. It’s like having to “move” files if you want to “rename” them because that’s how filesystems are implemented, oh, wait, I start to see where these gitologists are getting their ideas from…

Coming up…

So this is just a taste of my git hate, because the fans were clamouring for more. I have so much more to hate about git, but I already feel a little useless having devoted this much breath to git, so I will probably take a while to write the next installment. It will come, though. It doesn’t look like git is getting any less hateful any time soon…