
I hate git

January 3, 2011
By Jordi in personal

… or at least I hate it for now.

I’m a Mercurial (hereafter hg) kinda guy. Mercurial is the version control system (VCS) that Octave uses, so that’s mostly the reason why I started using it too. I started reading about it, and learning it, and liking it a lot. It makes a lot of sense to me. It’s simple when it needs to be simple and flexible when it needs to be complex.

The other big contender for a VCS is usually git. In fact, it’s quite a large contender. Just comparing GitHub and Bitbucket, the two large commercial hosts for git and hg respectively (don’t let that .org domain fool you, Bitbucket is definitely a commercial venture), GitHub is way larger. It is by far easier to find people praising git in the blogs, the discussion forums, and the mailing lists than it is to find people cheering for hg. I have dabbled with git in the past, and I always found it difficult to understand. I always chalked up this difficulty to being more familiar with hg, nothing more than a personal preference. However, I have recently seen that I am not alone in thinking that git is complicated. Regardless, seeing how immensely popular git is, vastly more popular than hg, I decided to try it again today.

So today I made a conscious effort to learn and use git. I had a practical reason too: to fix a Debian RC bug (perhaps a little late; I hope the release managers let the package back into testing after this). Also, I wanted to streamline the flow for hacking on Debian packages. The Debian packaging of Octave used to be under svn, and was later converted to git repositories. One thing that makes a lot of sense under svn, since it tracks individual directories, not whole projects, is to only put the debian/ directory under source control. This practice got inherited with the git repositories, and it’s really awkward. Setting yourself up to hack a Debian Octave package goes something like this:

apt-get source octave-foo
rm -r octave-foo-$version/debian
cd octave-foo-$version
git clone git+ssh://git.debian.org/git/pkg-octave/octave-foo.git

    # ...
    # hack hack hack
    # ...

dpkg-buildpackage

which is really awkward. My goal was to get to this:

git clone git+ssh://git.debian.org/git/pkg-octave/octave-foo.git

    # ...
    # hack hack hack
    # ...

git-buildpackage

So I set out to do that. With somewhat unfortunate results.

Let me talk a little more about where I’m coming from: hg. In hg, there are some guiding philosophical principles that have become second nature to me when working with source. One of the core hg principles is that it’s really hard to destroy data, in particular history. There are certain destructive operations with hg, but they almost all create backups, and are disabled by default. The way to enable them is to turn on extensions. In particular, hg makes it virtually impossible to destroy any data remotely unless the person who controls that remote repository somehow enables it with hooks. That is, the person would have to write a script that deletes some data whenever you manipulate their repository remotely (and the only commands to do this are pull and push).

This jibes really well with me. One of the things that git users praise the most is how easy it is to edit history, to undo mistakes, to rebase changes… hg doesn’t make these tasks impossible, merely difficult or disabled by default; and I tend to side with that point of view. It’s safer. Mercurial takes care of my data when I want it to, and when I need it to do dangerous things, I first have to remove its muzzle, and the muzzle snaps back into place when hg is done doing its dastardly deed. It’s a bit like using “sudo” to perform just one dangerous operation, instead of using “su” and then staying in the root shell while performing several operations, none of which really needs the extra permissions.

So on to what happened: during my work of trying to make it easier to work with Debian, I had created several git branches (which are nothing like hg branches, but whatever, that’s not a big deal). When it looked like my work was in good shape, I pushed it to the Debian git repo. Oh, oops, that only pushes one branch. That’s quite unlike hg, which pushes all of the work here that doesn’t exist there. Well, not a big deal, that’s a bit like git’s staging area, I thought. Just one more step to get what I want. But I had like three different branches here that weren’t there, so I figured there must be some command to get them all there at once. I asked around in IRC, and someone naïvely suggested using the --mirror option, and I just as naïvely trusted them without checking what that option would do. I thought it would just copy all of my branches from here to there, mirroring all of them.

And so it did. However, it also noticed that there were some branches there that I didn’t have in my local clone, and it erased them. I blinked. Wait. Did git just remotely remove some branches? Oh, well, I’m sure it’s just some metadata that got moved around. Where’s the undo button? Rollback? Restore? I went to #git in IRC to ask.

“… you do have backups, don’t you?”

I blinked again.

Whiskey.

Tango.

Foxtrot.

You’re telling me… that a VCS… one of the most popular ones out there… allows me to delete data remotely? With a command that isn’t even called “delete” or “wipe” or “force” but innocuously called “mirror”?

I was aghast.

My conversational partners in #git gave me the usual spiel about backups, about how it’s great to be able to shoot yourself in the foot, about how it was my fault, about how I should have read the manual… but I was unable to accept any of this. I just couldn’t conceive that a tool that is supposed to keep my history … to be a little “bit-hoarder” … to never lose data … not only lets me lose data locally, which is ok, but furthermore lets me delete data remotely.
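For anyone who wants to see this without endangering a real repository, it can be reproduced in throwaway directories. What follows is only a sketch, not what was actually run against the Debian repo: the repository names (server.git, colleague, mine) and the branch names are made up for illustration, and it assumes a reasonably recent git on the PATH. It compares git push --all, which only adds and updates branches, with git push --mirror, which also removes remote refs that don't exist locally.

```shell
set -eu
# Everything happens in a scratch directory; all names here are invented.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# server.git plays the role of the shared remote repository.
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/main

# A colleague publishes two branches: main and colleague-work.
git init -q colleague
cd colleague
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "initial commit"
git branch colleague-work
git push -q ../server.git main colleague-work
cd ..

# My clone gets a local "main" only; I then add two local branches.
git clone -q server.git mine
cd mine
git branch feature-a
git branch feature-b

# Print the server's branch names on one line, sorted by name.
list_server_branches() {
    git --git-dir="$sandbox/server.git" for-each-ref \
        --format='%(refname:short)' refs/heads | xargs
}

# --all adds my branches and leaves everything else on the server alone.
git push -q origin --all
after_all=$(list_server_branches)
echo "after --all:    $after_all"

# --mirror makes the server's refs match mine, so colleague-work is deleted.
git push -q origin --mirror
after_mirror=$(list_server_branches)
echo "after --mirror: $after_mirror"
```

The first listing should still include colleague-work; after --mirror it should be gone, even though nothing on the command line says "delete".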

Now, granted, this wasn’t terribly important data. Nothing of great value was lost. Since branches in git are more like tags (but not what git calls tags), it’s just metadata that was lost. The functional part is all there. At the same time, a user’s most valuable possession, data, was harmed by the very tool that’s supposed to protect it. I hate the idea of having to tiptoe around my VCS, which should be a tool that lets me experiment wildly with my source, to try out crazy ideas, and at the same time keep my source safe, multiply backed up, fully mirrored in every clone of the source whether local or remote. Mercurial, for example, doesn’t let me delete data remotely. The worst I can do is add a lot of useless data remotely, but that’s much better than being able to delete it.
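To make the "it's just metadata" point concrete, here is a small sketch in a scratch repository. The branch names and the empty commits are invented for the demonstration. It deletes a branch locally and then checks that the commit it pointed to is still there and can be given a name again, at least until git garbage-collects unreachable objects.

```shell
set -eu
# Scratch repository; nothing here touches a real project.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "initial commit"

# Do some work on a branch and note the SHA-1 of its tip.
git checkout -q -b experiment
git commit -q --allow-empty -m "experimental work"
tip=$(git rev-parse HEAD)

# Delete the branch: only the name goes away.
git checkout -q main
git branch -q -D experiment

# The commit object itself is still in the repository...
object_type=$(git cat-file -t "$tip")
echo "object $tip is still a $object_type"

# ...so a new branch name can be pointed at it again.
git branch recovered "$tip"
recovered_tip=$(git rev-parse recovered)
echo "recovered -> $recovered_tip"
```

The catch is that this only helps if the SHA-1 can still be found somewhere, which is exactly the part that is lost along with the branch name.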

Reeling, I did the only thing that could be done and emailed the Debian Octave Group mailing list, asking if someone had a clone of the repository with the missing branches. I hoped that I could recover the lost data by copying it from them. If not, it wouldn’t be a great loss, just an awkward inconvenience. The whole experience, though, has given me a great distaste for git. I still find it much more complicated than hg, even despite my best attempts to understand it. And it’s shown me that I can’t treat it carelessly, that I have to read its gargantuan manpages and thoroughly understand each and every command and option before I use them, lest I provoke damage.
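Recovering from somebody else's clone can look roughly like this. It's a sketch under assumptions, with made-up repository and branch names (server.git, seed, bystander, culprit, packaging): it assumes the other person cloned before the deletion and hasn't run git fetch --prune since, so their clone still carries the remote-tracking ref origin/packaging.

```shell
set -eu
# Scratch directory; all repository and branch names are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/main

# Seed the server with main and a "packaging" branch.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "initial commit"
git checkout -q -b packaging
git commit -q --allow-empty -m "packaging work"
git push -q ../server.git main packaging
cd ..

# A bystander cloned before the accident and still has origin/packaging.
git clone -q server.git bystander

# Someone else deletes the branch on the server.
git clone -q server.git culprit
(cd culprit && git push -q origin --delete packaging)

if git --git-dir=server.git show-ref --verify -q refs/heads/packaging; then
    deleted=no
else
    deleted=yes
fi
echo "branch deleted on server: $deleted"

# The bystander pushes their remote-tracking ref back as a branch.
cd bystander
lost_tip=$(git rev-parse refs/remotes/origin/packaging)
git push -q origin refs/remotes/origin/packaging:refs/heads/packaging
restored_tip=$(git --git-dir="$sandbox/server.git" rev-parse refs/heads/packaging)
echo "restored packaging at $restored_tip"
```

Whoever does the restoring needs push access, and it only brings back the branch as it was at their last fetch.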

Next time, I’m using the hg-git extension, and I don’t think I’ll be touching git again for a while until I recover from this nasty experience.


11 Responses

  1. Git never deletes data, or, put differently, all your data is still there.

    Let’s start at the front. --mirror is an option to mirror a local repository, meaning that it will remove references remotely that have been removed locally. That is a feature.

    What you wanted is git push origin --all, or git push with an appropriate refspec, or the push.default setting (man git-config).

    As to getting your data back: you only deleted references, or pointers, to heads. The heads are still there. You just need to find them. The ways to find them are multifarious: if you find a SHA-1 in your scrollback, you can use that, e.g.

    git branch recovered <sha1>

    Else you can make use of the reflog (man git-reflog), and if all else fails, use git-fsck to reconnect dangling heads.

    Git will not delete any data, unless you explicitly tell it to (git-gc). Git usually won’t even let you juggle refs, but you can do that. If you drop the ball, just pick it up again and keep going. Never panic.

    • git deletes metadata remotely, which is a kind of data.

      Yes, that metadata can be reconstructed from the reflog, with some work and some luck, as I later learned.

      But this is a huge surprise for me. I didn’t even *consider* that a VCS could remotely *delete anything*. By comparison, all hg remote operations are append-only. I believe the same holds for bzr and darcs, but correct me if I’m wrong. You can’t delete any data with hg, not even metadata, with push and pull operations (unless you explicitly install server hooks in the server to do so after a remote operation).

      This is a pretty dumb git design, I think. Allowing deletion of any data, including metadata, is like having a big red self-destruct button in your computer in case you ever want your computer to blow up. And don’t tell me it was my fault for not reading the label on the self-destruct button. Why would my laptop even have such a button?

  2. The reflog is an integral part of Git, and it is append-only. What you claim — that Git deletes data — is just plain wrong.

    You come at Git with a Mercurial perspective. That’s fine, but please refrain from bashing Git whenever it does not behave like Mercurial, because it is *not* Mercurial, nor is it trying to be. Git provides you with features that Mercurial does not have (and vice versa), and Git comes with shortcomings that mercurial does not have (and vice versa). That is how it goes, everywhere. Don’t waste your time ranting about it. If you want Mercurial, use Mercurial. If you want Git, use Git. It’s as simple as that.

  3. It is not plain wrong that metadata is a kind of data. Which git does delete. Remotely. The reflog stores a different form of that data, and it’s not immediately obvious how to extract the original metadata from the reflog. Correct me if I’m wrong, but I don’t believe the structure and nature of the reflog is documented in git’s documentation. At least at the time when I had to partially extract the lost data from the reflog, I had to work hard reading several external sources and rely on oral tradition (such as the present blog and discussion). Perhaps the situation has rapidly changed since I last experienced my unfortunate git accident.

    What is plainly false is that the reflog is append-only. There exists an explicit git reflog expire command that also deletes data from the reflog. Another git self-destruct button hidden in plain sight. Not to mention that since the reflog is simply metadata, the data it points to can vanish with time (can be garbage collected), making the metadata pretty much useless.

    Your concerns that I am wasting my time, a simple hg fanboy thinking in hg terms (I think I have gotten quite good at understanding what git does and why… and I still think it’s a horrible design… plus I have some working knowledge of other DVCSes like the aforementioned bzr and darcs), and that I can avoid git… I have already addressed in another blog post. I have more blog posts on this topic forthcoming.

    It is unfortunate that we must appear as adversaries to each other. This is precisely the opposite of what git is supposed to foster: collaboration. Instead we are arguing over its (de)merits when we could be working together.

  4. I use git, and find it extremely useful. I have not used hg very much.

    Git is quirky, inconsistent, and it can be very complex to do things that ought to be simple. While the foundation of git is very good, there is a lot of poorly thought-out and badly documented cruft on top of that.

    Given that hg provides most of the same feature set, is extensible, safer, better documented, almost as fast (in spite of being written in python), and orders of magnitude simpler and easier to use, I think it’s clear that hg is for now the superior product. I will start using it!

    If the git developers wish to create a really great quality product, they need to take a step back, wean themselves off the coffee and crack, take a deep breath, and create a totally new front-end which is simple and sensible. They should consider the feedback, and learn from their experience with git version 1 “chaos”. It would not hurt to cooperate with some people who are expressive and literate in the English language!

    • If you’re already a seasoned git user, your first impression with hg is that it’s “limited” and can’t do the same things as git. This is because hg tries to be conservative about the UI it presents to users, and advanced or dangerous features are disabled by default. Hg can do everything that git can do, but you have to know how to enable that feature, and you have to realise that git uses a completely different set of terminology than any other VCS. By contrast, hg works really hard to use terminology that is evocative and consistent with other VCSes.

  5. Greg A, March 29, 2013 @ 01:43

    You’re an idiot. That’s alright, we all start somewhere. Git *does* allow you to shoot yourself in the foot. This is something true of git. Maybe it would be even better than it is if it did not, but I somehow doubt it. As a rule, my first time trying something radical with git, I do it on a scratch repository. --mirror sure sounds radical to me, I would definitely not do that to a live useful upstream for the first time ever.

    I’ve been firing rounds pretty madly with git for 2 years now and I have yet to hit my feet (or lose any data). I did manage to lose a reference to a commit a few times. I wanted to use git reset --hard and --soft as a hamfisted technique for editing history. So I dug myself a hole in my scratch repository. At the bottom of the hole, I found git-reflog, and brought back my test commits, easy as pie. I made sure I knew how to climb out of the hole before I started using git-reset on important repositories.

    I once walked into an irc channel and they told me to run “cat /dev/zero > /dev/sda” and I did it. I mean, cats are cute and don’t sound like delete at all. I didn’t want to admit I was an idiot, instead I said that cats are stupid.

    Actually, that never happened, I just didn’t want you to feel like you’re alone.

    I don’t know anything about hg. I have the feeling it is feature-wise comparable to git. I am, however, absolutely irretrievably in love with git because it is *SO*MUCH* better than rcs, cvs, or svn (and don’t trashtalk them — rcs is a million times better than “cp foo.c foo.c.bak”). I just got exposed to bzr for the first time, which I also thought was feature-comparable to git. Oh god, my hatred for bzr overflows. You know the tutorial-approved technique for branching a launchpad bzr repository winds up making a completely new copy of the history on your local computer. So then you have two complete local copies of the history, one that you can sync with the upstream trunk, and the other which is only good for your branch. There is a way around it, but it is an advanced topic instead of the regular advertised way of interacting with the thing. Git fucking rocks.

  6. i found this page while looking for co-commiserators after a data loss scenario of my own today. Few of my colleagues believe data loss is possible in git. i’ve lost more data in git than any other single piece of software (in over 30 years of using computers).

    You might find “Fossil” to your liking:

    http://fossil-scm.org

    (Disclosure: i’m one of the project’s code monkeys)

    Fossil makes it _impossible_ to rewrite history (just to amend it with the equivalent of a sticky note), and in its 7 years of operation we’ve _never_ had a report of data loss.

    Not incidentally, Fossil is the SCM used by sqlite, and was in fact started (by Richard Hipp, sqlite’s father) for that very purpose.

    • Mercurial is working towards this “impossibility” of rewriting history. This feature is called Mercurial Evolve. It’s very clever, changesets get rewritten and you get a meta-history of which cset replaces which one.

      I applaud Fossil for its efforts, and I would have liked it to get more popular, but at the moment, the most popular alternative to git is hg, so that’s where I focus my efforts.




Continuing the Discussion

  1. [...] committing when they merge, or by automatically merging and committing when they pull. They will not take kindly to Git deleting data remotely — or even merely appearing to delete data remotely — when you supply apparently innocuous [...]








