rant – The Scribbler of the Rueful Countenance

Make this Bitbucket free

March 11, 2012 Jordi

This is something I wrote a while ago on a Bitbucket bug post. Since then, I’ve close my Bitbucket account, and I still haven’t found satisfactory free hg hosting. I have this, but it feels very makeshift and temporary. I don’t care too much about all the bells and whistles like bug tracker and wiki. I never used that. I just want to have some hosting, and I want someone else to take maintain it, for a fee, if necessary.

I’m reproducing this because Bitbucket might some day decide to take down my request which is currently public, because free software needs free tools and I still want a free commercial hg hosting. If you know of someone, please let me know.

(Reported by jordigh on 2011-09-27)

I hope you hear me out. I hope someone with some decision power hears me out.

Right now, Bitbucket is not free. When Bitbucket has functionality that I dislike, I am not free to implement suggestions for you. At best, I can beg and argue with you when you decide that my suggestion is not a good one. If Bitbucket were free, I could go and implement it and show you, “see, this is how it could work, what do you think?” I could have much more leverage with which to help Bitbucket be better if it were free.

I was told that Bitbucket is not free because it is commercial. I do not understand why this is an explanation. Red Hat is free and commercial. Some of the best money I’ve ever spent. Gitorious is free and commercial despite more directly competing with a behemoth like github. From what we know of their finances, both of these are successful commercial ventures. And they are free.

I am not asking you to give up your trademark, your savoir-faire, your brand image, your hosting, your servers. Ultimately, all of this is what you really sell, just like Red Hat sells support and Gitorious sells support and hosting. I only see you adding value to Bitbucket if you make it free.

Hell, I will pay for access to the source. As much as 50, no, 100 USD per month for a premium source access subscription. Make it so that only premium subscribers have read access to the master Bitbucket repository. I will pay for the privilege to collaborate with you instead of begging you to make changes and arguing with you when you decide WONTFIX, HOLD, LATER, NOTABUG, CANTREPRODUCE.

Make it AGPL. If I or anyone else decides to run off with the source and make a knockoff site (which, btw, I haven’t seen happen with gitorious yet, perhaps you have?), your source won’t give us a competitive advantage, because we won’t have the service, hosting, trademark, brand name, and support you actually sell. Give me the opportunity to buy another product from you.

As it is stands right now, I am not happy with Bitbucket. I think it has potential, but I can’t fix it when it doesn’t work. Every user of this site is a developer. Every user is a potential collaborator, moreso than any other kind of software. We hackers also like to hack with our and on our tools. We are already writing code. We all give you our source. You don’t give us yours. We gave you Mercurial, Python, Pygments, Django. Please give us Bitbucket.

Collaborate with us. Share source like we share it with you. Make it free.

Charles McLaughlin

written 2011-09-27

* Changed status from new to wontfix.

Hi Jordi,

Thanks for your suggestion and the offer to hack on our code. This has come up before, for instance in issue #1566. You make some great points, but we still don’t have any immediate plans to open source the entire code base. For starters, we link against Mercurial, so if we distribute the code it would become GPL. If we wanted to sell a “behind the firewall” version of Bitbucket with a different license we would need to rewrite a significant amount of code and possibly shell out to hg to avoid linking. Since Atlassian bought Bitbucket a year ago we’ve added hundreds of improvements, stabilized and improved performance, and scaled the infrastructure to handle huge increases in users and repositories. That’s were our focus is, but we are committed to the open source community.

While this doesn’t satisfy your request, as we develop new features we’re open sourcing as much as possible. Here’s an example:

Tracking Slow Requests with Dogslow

And there’s more to come.

Regards,

Charles

(written 2011-10-03 on by jordigh)

First of all, thanks for at least giving me a complete answer instead of simply dismissing me as a lunatic.

A few scraps of free software unrelated to the functionality I actually care to influence or modify don’t serve my needs, you are correct. I find scraps to not be a strong commitment to free software. I want the whole thing. Paying for it, like I said.

As to linking to the GPLed hg code, I suggested using the AGPL, which as you know, is much stronger copyleft compatible with the GPL. That way, you don’t have to rewrite anything (and it’s arguable if you can avoid copyleft by using a different interface), and if someone forks off your code, the source alone won’t give them an unfair advantage since they can’t add modifications to it and use it publicly unless they share them back to you.

I really think the real value is the hosting and the maintenance. And being able to submit patches. I see you as selling your services. If I could have those things, I would be happiest with bitbucket. I have my own hg hosting, but it’s a pain to maintain, and I would be happy if I could pay someone else to do it.

As for influencing your direction of development, I’m very disappointed to have received an email today about things you have implemented, none of which I want or need, and at least one that alarms me (git…?).

Furthermore, I’m hosting some GNU development on bb. I can no longer do that in good conscience. Since you don’t offer what I want, and I can’t influence what you are doing with the website except with ineffective pleading and arguing, I will be working this week on moving my work off bitbucket, closing my account, and maintaining my own hosting elsewhere. I’m sorry it has come to this.

Good luck with that git.

On gitology

December 18, 2011February 24, 2012 Jordi

Our analysis of git hatred begins with a discussion of gitology, which I now define:

gitology (n. from git, a comptemptible fellow or DVCS): The pernicious study of the inner workings of git and how to apply them.

Thus gitology refers to the theoretical framework and mental model one must absorb as a prerequisite to becoming fully competent with git’s commonly-used features. Here a distinction must be made, because although there are many gitological concepts that only apply to git, many others are part of general DVCS theory. Gitology, however, sometimes interprets general DVCS concepts in its own way.

The following gitological concepts are not particular to git:

repositories (repos)
commits or changesets (csets)
directed acyclic graph (DAG)
branches
pushing and pulling changes
whole-repo tracking, not individual files
rebasing csets
pushing and pulling csets

The following are purely gitological and add unnecessary complexity, in addition to eventually being unavoidable:

Exposing the index/staging area
Exposing other implementation details: blobs, trees, commits, refs
refs and refspecs
Branches are refs
Detached HEADs (a.k.a “not on a branch”)
Distinguishing remote and local tracking branches
Choosing which branch to pull onto
Bare repos
Hard, soft, mixed resets
Porcelain vs plumbing

I will say more of each in turn further down.

Unavoidable gitology

If you are a user of git, I invite you to try to describe the purely gitological concepts above. If you have used git more than casually, I am sure you probably know what most of them are. If you don’t know most of them, it is likely because you haven’t used git for very long.

It is clear that gitology is unavoidable. A user of git must quickly become a student of gitology for git will make no attempt to hide its ugly guts to said user. Consider for example one of the first tasks that a user of git will encounter, making a commit. This is how git’s manpage describes this operation (and manpage it is, in the classic style of nerd-only Unix documentation):

Stores the current contents of the index in a new commit along with a log message from the user describing the changes.

Immediately the first gitological concept pops up, the index. Now, the index is not unique to git, nor even to a DVCS. Practically every VCS must implement an index in one way or another However, it is an implementation detail, which a VCS might decide to not implement for whatever reason. The great gitological stroke of genius was to gleefully expose and moreover force the user to manually handle the index. Any other common VCS only optionally makes the user handle it. Moreover, this is touted as a frequently-loved feature of git.

Let me pause here for a moment to describe why this is characteristic of git’s pathological design choices.

The index is an implementation detail. Git refers to implementation details as “plumbing” and the user interface as “porcelain”, in order to make it clear that git’s designers think of git with the same reverence I think of the instrument that handles the organic waste that my body produces. Git’s makers (I hesitate to suggest that they actually consciously designed this, so I won’t call them “designers” again) refer to the index as “porcelain”, whereas it should be “plumbing”, as evidenced by how it’s handled by hg, darcs, bzr, and yes, even crufty ol’ svn handles the index automatically.
They used manpages to document them. The manpage is a Unix developer format that tends to lead to terse or obscure documentation. It doesn’t have to do this (e.g. BSD tends to have great manpages), but as the name indicates, it used to be just a single page, really, just a cheat sheet with mnemonics to remind people what they already should have known about a program. The UNIX Haters’ Handbook explores this documentation problem in more detail. In this case, if you didn’t already know what an index was, the manpage is going to make it difficult for you to figure it out. For programs that are intended to be used by nerdy Unix developers only this wouldn’t be a problem; however git is primarily a tool for collaboration, very widespread collaboration, and expecting all collaborators to be nerdy Unix developers is unrealistic and hinders the very collaboration it’s supposed to encourage. They could be Windows users, they could be less technical contributors like translators or graphic artists (yes, it makes sense to put graphics under source control, even a DVCS, with some care). Manpages are inadequate for these people.
Exposing the index is frequently touted. Git encourages micro-management, and git’s users end up loving this micro-management (and blog about it, and write books, and have conferences, and so on ad nauseam). This is characteristic of the perversion that git promotes, focussing on details instead of getting work done. Of course it’s hard work to understand gitology, so of course people feel accomplished once they complete this work, but it’s work that shouldn’t exist. It’s not that it’s wrong to expose an implementation detail if it allows more advanced use. It’s wrong to have no way to hide this detail completely. There are workarounds, like passing options to commit, or using a front-end to git, but remember that I specifically do not hate front-ends to git (poor things are just doing the best they can to fix a horrible UI). It should be the other way around, though. It should be abstracted away from the user, and should the user want more advanced use, there should be a developer API upon which we can build more advanced tools.

Why gitology?

So why does git do the things that it does? Why expose all the plumbing? The situation is basically the following:

While the makers of git are so happy about all the flexibility that git offers due to exposing its plumbing, its users that are not also of the same persuasion are aghast that it requires learning about the path that human refuse follows through git.

The ability of a tool to be so flexible as to allow myself to get shot in the foot never did much for me other than leave me with injured feet.

A brief excursion into gitology

The gitological concepts I enumerated above are all deserving of a more thorough study in forthcoming blog posts. For the moment, I will briefly describe the remaining ones and why it’s not necessary in order to have a working DVCS.

Blobs, trees, commits, refs

Git’s simple storage model is simple enough to be exposed to the user, so why not make everyone learn it? The more internal details we can expose, the better. It is a rite of passage for any serious student of gitology to read Git For Computer Scientists or equivalent.

refs and refspecs

Refs are part of git’s storage model. They’re basically pointers, and just like carelessly manipulating C pointers results in segfaults, carelessly manipulating git’s refs results in lost data. Most of the time it’s not a problem, so naturally they should also be exposed to the user. Refspecs are a general class of ways to specify a ref, and some of their syntax is based on what a certain hacker learned to type when he was inspecting the results when a certain kernel took a core dump on him.

Branches are refs

A branch should be a simple concept independent of any implementation detail: a line of development in the repo’s DAG. For the most part, this is what they are in git, except that they identify branches with refs, so amongst other complications this leads to

Detached HEADs

When you check out an earlier commit in git, you end up with the cryptic “not in a branch” message… then where the hell are you? Isn’t the DAG made up exclusively of branches? No, a branch is a ref, so if you’re not at a commit (recursively) pointed to by a ref, you’re not anywhere and you might as well not exist and git will eventually garbage collect you.

Distinguishing remote and local tracking branches

A DVCS really shouldn’t care where branches are, and for symmetry (because symmetry is beautiful and simple) a branch shouldn’t change its nature if it’s here or there, or at least it shouldn’t appear this way to the user. Git, of course, disagrees and makes you remember the distinction between your local copy of the remote branch and the remote branch. It’s not such a big deal in the end, except when you run into

Choosing which branch to pull onto

In git, you can’t just say, “get over here whatever differences are over there” (i.e. “pull”, a common DVCS operation), because you have to remember where to pull it to. “Here” isn’t always enough to git, because the sad developmentally stunted self-described stupid content tracker can’t know which “here” you mean. You have to specify a branch. Because your branch isn’t the same as the branch over there, so it can’t always automatically go on whatever branch here corresponds to the branch there. This is all symptomatic of the asymmetry of remote and local tracking branches, as are

Bare repos

Because in git you can’t treat remote and local symmetrically (i.e. push and pull are not symmetric operations), you also can’t push (or shouldn’t, rather) push to any repos willy-nilly. So for example, you can’t make local clones in git and push and pull amongst them, because git can’t figure out what to do with the working directory of each. The gitological solution to not knowing what to do with the current working directory is to remove it altogether in what is known as a bare repo. There are workarounds, of course (create a zillion branches, which is what git users love doing), but git’s design choices really hinder the naturality of cloning. There is also deeper design problem lurking here that seeps through the stupidity of the content tracking.

Hard, soft, mixed resets

This one is admittedly more esoteric. There are several ways to do a “reset”, and really they all mean something completely different, but I chose this gitological example to point out another boneheaded aspect of git’s, shall we say, evolution. Gitology teaches that it’s best when a single command does too much and does completely different things depending on which options you choose but actually sort of the same thing if you realise that the underlying implementation is the same. It’s like having to “move” files if you want to “rename” them because that’s how filesystems are implemented, oh, wait, I start to see where these gitologists are getting their ideas from…

Coming up…

So this is just a taste of my git hate, because the fans were clamouring for more. I have so much more to hate about git, but I already feel a little useless having devoted this much breath to git, so I will probably take a while to write the next installment. It will come, though. It doesn’t look like git is getting any less hateful any time soon…

Git… enough!

September 25, 2011May 15, 2012 Jordi

When in the course of human events it becomes necessary for a software user to disparage a thoroughly hostile DVCS, there is no recourse but to blog about it. Thus, software diplomacy has failed, and we must face the fact that I irredeemably hate git. To prove this, let these facts be submitted to a candid world.

git has destroyed our data.
git has destroyed our data remotely.
git has forced us to learn gitology.
git has coerced us to memorise --many --unrelated --options. For suposedly the same command.
git has made us use a thoroughly unintuitive user interface with badly named commands.
git has produced horrible documentation.
git has written and suggested we read Git for Computer Scientists. Several times.
git has created the largest unavoidable worldwide gang of users, and we must use git if we wish to collaborate with them.
git has obscured safer, easier, and just as powerful alternatives.

We, therefore, appealing to the wellbeing of all software users, do in the name and by the authority of all distributed version control system decency, solemnly publish and declare that git really fucking sucks, that the previous obscene intensifier was fully necessary, and that we should do all that is within our power to reduce and minimise the proliferation of git usage.

Why I must write this

Before I proceed with my study of git hate, I must first present an apologia for why I am writing this.

You may tell me, “fine, you hate git, don’t use it. It’s just your personal preference.” Normally, for any other software I use out of personal preference, I would agree. I am a student of the Church of Emacs, but this does not mean I think everyone should use Emacs. As long as you produce text, I don’t care how you do it. You can use your text editor of choice, I can use mine, and we will both be happy doing so.

Unfortunately, the same does not hold for git. Although git can be used in isolation without ever collaborating with anyone else (after all, it is a distributed version control system, so this makes it easy to use without other people or a remote server), this is not its primary use case. If everyone around me is using git, then I am too coerced to learn git if I want to collaborate with their software development. You may argue that I can use another DVCS because everything can convert back and forth to git, but interoperability can only go so far. Eventually someone uses a feature of git that another DVCS doesn’t implement, and I will have to use git anyway.

Moreover, it is clear that git is creating a community of people who are faffing around learning gitology and feeling good about themselves for understanding abstruse concepts which are completely orthogonal to actually getting work done. This is evidenced by all the blog posts written by people being frustrated with git. As the renowned 20th century mathematician G. H. Hardy once wrote, “it is a melancholy experience for a professional mathematician to find himself writing about mathematics. The function of a mathematician, is to do something, to prove new theorems, to add to mathematics, and not to talk about what he or other mathematicians have done.” In a similar vein, it’s a melancholy experience that we spend so much time blogging about how to use git, reading blogs about how to use git, and joke about using git, instead of getting work done. Without git.

Because everyone else is wasting their time praising and discussing git, the aptly-named stupid content tracker, I must now waste my own time to point out how blockheaded it is that we spend so much time learning the tool that should be tangential to our work and ultimately unrelated to our actual work.

What will not be hated on

I want to make it perfectly clear that I will direct my hatred at git and at git only. In particular, none of the following will receive any direct dosage of hatred:

github
magit
git Tower
gitg
TortoiseGit
Your Favourite non-commandline Interface to Git
Whatever other tool for tracking content that happens to use git as a database or whatever

I want to make this clear, because these tools above are the reason many people claim to love git. But just because good tools have been built on a horrible core doesn’t mean the core is good. As an example I will frequently come back to, C++ is a horrible language, but many good tools and frameworks have been built on top of it.

To all the hardcore git lovers out there, git is neither necessary nor sufficient for building all of the tools you probably love that have been built on top of git. The facilities git provides to build these tools on top of it could have been provided by any other DVCS. We can and should do better than git without sacrificing any of the things you love about git, other than whatever tool has been built on top of it.

Furthermore, I also want to make this very clear: the ultimate enemy is centralised version control, not git. CVCSes are what’s really slowing our collaboration down. git is only the major enemy within the DVCS camp, but if you’re forced to choose git over any CVCS, and git is really the only choice for a DVCS for you (this is rarely true, but I’m speaking hypothetically here), then, yes, choose git. It is by far the lesser of two evils in this hypothetical Catch-22.

What will be offered instead

I will frequently cite Mercurial, abbreviated hg, as an alternative to git. I contend that Mercurial is a safer, saner, better designed, and no less powerful alternative to git. While it may be true that Bitbucket, the major non-free provider of Mercurial hosting often compared to Github, may not fulfill the requirements you have come to expect from Github, this is not due to any limitation of hg itself. As I said, I want to only argue that git as a building block is rotten, but this is not directly related to how people have managed to polish this particular turd.

I do not, however, want to be taken as simply a an admirer of hg who is used to an hg workflow and thus hates git simply because git is not hg. I will make a thorough effort to present sound arguments for the myriad reasons why git sucks that will stand on their own without comparing them to hg. When I am finished expounding my revulsion for git, I will offer alternatives that hg or perhaps another DVCS offers that demonstrate that git’s ugliness is not necessary.

Necessary. This is also a word I will come back to often. A central theme of my attack is that is that all the complications that git has created are not necessary. In order to demonstrate this, I need to offer examples of where git’s equivalent functionality exists without the parts of git that are not necessary. This is why I will frequently cite hg.

So, enough with the meta. Let’s get on to the hating itself.

I hate git

January 3, 2011September 26, 2011 Jordi

… or at least I hate it for now.

I’m a Mercurial kinda guy (hereafter hg). Mercurial is the version control system (VCS) that Octave uses, so that’s mostly the reason why I started using it too. I started reading about it, and learning it, and liking it a lot. It makes a lot of sense to me. It’s simple when it needs to be simple and flexible when it needs to be complex.

The other big contender for a VCS is usually git. In fact, it’s quite a large contender. Just going by comparing github and bitbucket, the two large commercial hosts for git and hg respectively (don’t let that .org domain fool you, bitbucket is definitely a commercial venture), github is way larger. It is by far easier to find people praising git in the blogs, the discussion forums, and the mailing lists than it is to find people cheering for hg. I have dabbled with git in the past, and I always found it difficult to understand. I always chalked up this difficulty to just being more familiar with hg, and being nothing more than a personal preference. However, I have recently seen that I am not alone in thinking that git is complicated. Regardless, seeing how immensely popular, vastly more popular than hg it is, I decided to try git again today.

I decided to make a conscious effort again today to learn and use git. I had a practical reason too, to fix a Debian RC bug (perhaps a little late, I hope the release managers let the package back into testing after this). Also, I wanted to streamline the flow for hacking on Debian packages. The Debian packaging of Octave used to be under svn, and later turned into git packages. One thing that makes a lot of sense under svn, since it tracks individual directories, not whole projects, is to only put the debian/ directory under source control. This practice got inherited with the git repositories, and it’s really awkward. Setting yourself up to hack a Debian Octave package goes something like this:

apt-get source octave-foo
rm -r octave-foo-$version/debian
cd octave-foo
git clone git+ssh://git.debian.org/git/pkg-octave/octave-foo.git

# ...
# hack hack hack
# ...

dpkg-buildpackage

which is really awkward. My goal was to get to this:

git clone git+ssh://git.debian.org/git/pkg-octave/octave-foo.git

# ...
# hack hack hack
# ...

git-buildpackage

So I set out to do that. With somewhat unfortunate results.

Let me talk a little more about where I’m coming from: hg. In hg, there are some guiding philosophical principles that have become second nature to me when working with source. One of the core hg principles is that it’s really hard to destroy data, in particular history. There are certain destructive operations with hg, but they almost all create backups, and are disabled by default. The way to enable them is to turn on extensions. In particular, hg makes it virtually impossible to destroy any data remotely unless the person who controls that remote repository somehow enables it with hooks. That is, the person would have to write a script that when you manipulate their repository remotely (and the only commands to do this are pull and push), that script would delete some data.

This vibes really well with me. One of the things that git users praise the most is how easy it is to edit history, to undo mistakes, to rebase changes… hg doesn’t make these tasks impossible, merely difficult or disabled by default; and I tend to side with that point of view. It’s safer. Mercurial takes care of my data when I want it to, and when I need it to do dangerous things, I first have to remove its muzzle, and the muzzle snaps back into place when hg is done doing its dastardly deed. It’s a bit like using “sudo” to perform just one dangerous operation instead of “su”, and then staying in the root shell, while performing several operations, none of which really needs the extra permissions.

So on to what happened: during my work of trying to make it easier to work with Debian, I had created several git branches (which are nothing like hg branches, but whatever, that’s not a big deal). When it looked like my work was in good shape, I pushed it to the Debian git repo. Oh, oops, that only pushes one branch. That’s quite unlike hg which pushes all of the work here that doesn’t exist there. Well, not a big deal, that’s a bit like git’s staging area I thought. Just one more step to get what I want. But I had like three different branches here that weren’t there, so I figured there must be some command to get them all there at once. I asked around in IRC, and someone naïvely suggested using the --mirror command, and I naïvely trusted them without checking what that option would do. I thought it would just copy all of my branches from here to there, mirroring all of them.

And so it did. However, it also checked that there were some branches there that I didn’t have in my local clone, and it erased them. I blinked. Wait. Did git just remotely remove some branches? Oh, well, I’m sure it’s just some metadata that got moved around. Where’s the undo button? Rollback? Restore? I went to #git in IRC to ask.

“… you do have backups, don’t you?”

I blinked again.

Whiskey.

Tango.

Foxtrot.

You’re telling me… that a VCS… one of the most popular ones out there… allows me to delete data remotely? With a command that isn’t called even called “delete” or “wipe” or “force” but innocuously called “mirror”?

I was aghast.

My conversational partners in #git gave me the usual spiel about backups, about how it’s great to be able to shoot yourself in the foot, about how it was my fault, about how I should have read the manual… but I was unable to accept any of this. I just couldn’t conceive that a tool that is supposed to keep my history … to be a little “bit-hoarder” … to never lose data … not only lets me lose data locally, which is ok, but furthermore lets me delete data remotely.

Now, granted, this wasn’t terribly important data. Nothing of great value was lost. Since branches in git are more like tags (but not what git calls tags), it’s just metadata that was lost. The functional part is all there. At the same time, a user’s most valuable possession, data, was harmed by the very tool that’s supposed to protect it. I hate the idea of having to tiptoe around my VCS, which should be a tool that lets me experiment wildly with my source, to try out crazy ideas, and at the same time keep my source safe, multiply backed up, fully mirrored in every clone of the source whether local or remote. Mercurial, for example, doesn’t let me delete data remotely. The worst I can do is add a lot of useless data remotely, but that’s much better than being able to delete it.

Reeling, I did the only thing that could be done and emailed the Debian Octave Group mailing list, asking if someone had a clone of the repository with the missing branches. I hoped that I could recover the lost data by copying it from them. If not, it won’t be a great loss, just an awkward inconvenience. The whole experience, though, has given me a great distaste for git. I still find it much more complicated than hg, even despite my best attempts to understand it. And it’s shown me that I can’t treat it carelessly, that I have to read its gargantuan manpages and thoroughly understand each and every command and option before I use them, lest I provoke damage.

Next time, I’m using the hg-git extension, and I don’t think I’ll be touching git again for a while until I recover from this nasty experience.