… or at least I hate it for now.
I’m a Mercurial kinda guy (hereafter hg). Mercurial is the version control system (VCS) that Octave uses, so that’s mostly the reason why I started using it too. I started reading about it, and learning it, and liking it a lot. It makes a lot of sense to me. It’s simple when it needs to be simple and flexible when it needs to be complex.
The other big contender for a VCS is usually git. In fact, it’s quite a large contender. Just going by comparing GitHub and Bitbucket, the two large commercial hosts for git and hg respectively (don’t let that .org domain fool you, Bitbucket is definitely a commercial venture), GitHub is way larger. It is by far easier to find people praising git in the blogs, the discussion forums, and the mailing lists than it is to find people cheering for hg. I have dabbled with git in the past, and I always found it difficult to understand. I always chalked up this difficulty to just being more familiar with hg, nothing more than a personal preference. However, I have recently seen that I am not alone in thinking that git is complicated. Regardless, seeing how immensely popular git is, vastly more popular than hg, I decided to give it another try.
I decided to make a conscious effort again today to learn and use git. I had a practical reason too, to fix a Debian RC bug (perhaps a little late, I hope the release managers let the package back into testing after this). Also, I wanted to streamline the flow for hacking on Debian packages. The Debian packaging of Octave used to be under svn, and later turned into git packages. One thing that makes a lot of sense under
rm -r octave-foo-$version/debian
git clone git+ssh://git.debian.org/git/pkg-octave/octave-foo.git
# hack hack hack
which is really awkward. My goal was to get to this:
# hack hack hack
So I set out to do that. With somewhat unfortunate results.
Let me talk a little more about where I’m coming from: hg. In hg, there are some guiding philosophical principles that have become second nature to me when working with source. One of the core hg principles is that it’s really hard to destroy data, in particular history. There are certain destructive operations with hg, but they almost all create backups, and are disabled by default. The way to enable them is to turn on extensions. In particular, hg makes it virtually impossible to destroy any data remotely unless the person who controls that remote repository somehow enables it with hooks. That is, the person would have to write a script that when you manipulate their repository remotely (and the only commands to do this are pull and push), that script would delete some data.
This vibes really well with me. One of the things that git users praise the most is how easy it is to edit history, to undo mistakes, to rebase changes… hg doesn’t make these tasks impossible, merely difficult or disabled by default; and I tend to side with that point of view. It’s safer. Mercurial takes care of my data when I want it to, and when I need it to do dangerous things, I first have to remove its muzzle, and the muzzle snaps back into place when hg is done doing its dastardly deed. It’s a bit like using “sudo” to perform just one dangerous operation instead of “su”, and then staying in the root shell, while performing several operations, none of which really needs the extra permissions.
So on to what happened: during my work of trying to make it easier to work with Debian, I had created several git branches (which are nothing like hg branches, but whatever, that’s not a big deal). When it looked like my work was in good shape, I pushed it to the Debian git repo. Oh, oops, that only pushes one branch. That’s quite unlike hg, which pushes all of the work here that doesn’t exist there. Well, not a big deal, that’s a bit like git’s staging area, I thought. Just one more step to get what I want. But I had like three different branches here that weren’t there, so I figured there must be some command to get them all there at once. I asked around in IRC, and someone naïvely suggested using the --mirror option, and I naïvely trusted them without checking what that option would do. I thought it would just copy all of my branches from here to there, mirroring all of them.
And so it did. However, it also checked that there were some branches there that I didn’t have in my local clone, and it erased them. I blinked. Wait. Did git just remotely remove some branches? Oh, well, I’m sure it’s just some metadata that got moved around. Where’s the undo button? Rollback? Restore? I went to #git in IRC to ask.
“… you do have backups, don’t you?”
I blinked again.
You’re telling me… that a VCS… one of the most popular ones out there… allows me to delete data remotely? With a command that isn’t even called “delete” or “wipe” or “force” but innocuously called “mirror”?
I was aghast.
My conversational partners in #git gave me the usual spiel about backups, about how it’s great to be able to shoot yourself in the foot, about how it was my fault, about how I should have read the manual… but I was unable to accept any of this. I just couldn’t conceive that a tool that is supposed to keep my history … to be a little “bit-hoarder” … to never lose data … not only lets me lose data locally, which is ok, but furthermore lets me delete data remotely.
Now, granted, this wasn’t terribly important data. Nothing of great value was lost. Since branches in git are more like tags (but not what git calls tags), it’s just metadata that was lost. The functional part is all there. At the same time, a user’s most valuable possession, data, was harmed by the very tool that’s supposed to protect it. I hate the idea of having to tiptoe around my VCS, which should be a tool that lets me experiment wildly with my source, to try out crazy ideas, and at the same time keep my source safe, multiply backed up, fully mirrored in every clone of the source whether local or remote. Mercurial, for example, doesn’t let me delete data remotely. The worst I can do is add a lot of useless data remotely, but that’s much better than being able to delete it.
Reeling, I did the only thing that could be done and emailed the Debian Octave Group mailing list, asking if someone had a clone of the repository with the missing branches. I hoped that I could recover the lost data by copying it from them. If not, it won’t be a great loss, just an awkward inconvenience. The whole experience, though, has given me a great distaste for git. I still find it much more complicated than hg, even despite my best attempts to understand it. And it’s shown me that I can’t treat it carelessly, that I have to read its gargantuan manpages and thoroughly understand each and every command and option before I use them, lest I provoke damage.
Next time, I’m using the hg-git extension, and I don’t think I’ll be touching git again for a while until I recover from this nasty experience.
19 thoughts on “I hate git”
Git never deletes data, or, put differently, all your data is still there.
Let’s start at the front. --mirror is an option to mirror a local repository, meaning that it will remove references remotely that have been removed locally. That is a feature.
What you wanted is git push origin --all, or git push with an appropriate refspec, or the push.default setting (man git-config).
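To make the difference concrete, here is a minimal sketch using throwaway repositories in a temporary directory. Every repository and branch name in it (remote.git, seed, clone, main, packaging, feature) is made up for illustration and has nothing to do with the Debian repositories from the post; it only assumes a reasonably recent git on the PATH.

```shell
#!/bin/sh
# Sketch: what `git push --all` and `git push --mirror` do to a remote that
# has a branch the local clone lacks. All names here are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# A bare repository stands in for the shared remote.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

# Seed the remote with two branches: main and packaging.
git init -q seed
cd seed
git -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m "initial commit"
git branch -M main
git branch packaging
git push -q "$work/remote.git" main packaging
cd "$work"

# A fresh clone gets only `main` as a local branch; `packaging` exists
# locally just as the remote-tracking ref origin/packaging.
git clone -q remote.git clone
cd clone
git branch feature

# Helper: list the branches that exist in the bare remote, on one line.
remote_branches() {
    git --git-dir="$work/remote.git" for-each-ref \
        --format='%(refname:short)' refs/heads | paste -sd' ' -
}

# --all pushes every local branch and leaves other remote branches alone.
git push -q origin --all
after_all=$(remote_branches)
echo "after --all:    $after_all"

# --mirror makes the remote's refs match the local ones, so the remote
# branch with no local counterpart (packaging) is deleted.
git push -q --mirror origin
after_mirror=$(remote_branches)
echo "after --mirror: $after_mirror"
```

In this sketch the remote should still list packaging after the --all push and no longer list it after the --mirror push. The mirror push also copies the clone's remote-tracking refs (refs/remotes/origin/...) to the remote, which is rarely what anyone wants from an ordinary working clone.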
As to getting your data back: you only deleted references, or pointers, to heads. The heads are still there. You just need to find them. The ways to find them are multifarious: if you find a SHA-1 in your scrollback, you can use that, e.g.
git branch recovered
Else you can make use of the reflog (man git-reflog), and if all else fails, use git-fsck to reconnect dangling heads.
Git will not delete any data, unless you explicitly tell it to (git-gc). Git usually won’t even let you juggle refs, but you can do that. If you drop the ball, just pick it up again and keep going. Never panic.
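The recovery described above can be sketched on a scratch repository. This is only an illustration under stated assumptions: the repository, branch names and commit messages are invented, and it covers a local, non-bare repository. A bare repository on a server does not keep reflogs by default, so there the SHA-1 usually has to come from someone's scrollback or from another clone.

```shell
#!/bin/sh
# Sketch: a deleted branch is only a deleted pointer; the commit survives
# (until garbage collection) and can be given a new name.
# Repository, branch names and messages are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo

# Helper: make an empty commit with a throwaway identity.
commit() {
    git -c user.name=Example -c user.email=example@example.com \
        commit -q --allow-empty -m "$1"
}

commit "base"
git branch -M main
git checkout -q -b experiment
commit "work only on experiment"
lost=$(git rev-parse experiment)

# Delete the branch: the reference is gone, the commit object is not.
git checkout -q main
git branch -q -D experiment
kind=$(git cat-file -t "$lost")
echo "object type of lost commit: $kind"

# Without a SHA-1 in the scrollback, HEAD's reflog still records it.
found=$(git reflog --format='%H %gs' |
    grep 'commit: work only on experiment' | cut -d' ' -f1)

# Point a new branch at the recovered commit.
git branch recovered "$found"
echo "recovered: $(git rev-parse recovered)"
echo "lost:      $lost"
```

The two hashes printed at the end should be identical, which is the whole point: nothing but the name was removed.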
git deletes metadata remotely, which is a kind of data.
Yes, that metadata can be reconstructed from the reflog, with some work and some luck, as I later learned.
But this is a huge surprise for me. I didn’t even *consider* that a VCS could remotely *delete anything*. By comparison, all hg remote operations are append-only. I believe the same holds for bzr and darcs, but correct me if I’m wrong. You can’t delete any data with hg, not even metadata, with push and pull operations (unless you explicitly install server hooks in the server to do so after a remote operation).
This is a pretty dumb git design, I think. Allowing deletion of any data, including metadata, is like having a big red self-destruct button in your computer in case you ever want your computer to blow up. And don’t tell me it was my fault for not reading the label on the self-destruct button. Why would my laptop even have such a button?
You’d be surprised at the kinds of things people want. Many people concerned about privacy have talked about having self-destruct mechanisms wired to their machines. (Naturally, I have no way of verifying whether or not they have done this. They didn’t exactly give their names or anything.)
I don’t remember anyone saying anything about an actual big red button, though.
The reflog is an integral part of Git, and it is append-only. What you claim — that Git deletes data — is just plain wrong.
You come at Git with a Mercurial perspective. That’s fine, but please refrain from bashing Git whenever it does not behave like Mercurial, because it is *not* Mercurial, nor is it trying to be. Git provides you with features that Mercurial does not have (and vice versa), and Git comes with shortcomings that mercurial does not have (and vice versa). That is how it goes, everywhere. Don’t waste your time ranting about it. If you want Mercurial, use Mercurial. If you want Git, use Git. It’s as simple as that.
It is not plain wrong that metadata is a kind of data. Which git does delete. Remotely. The reflog stores a different form of that data, and it’s not immediately obvious how to extract the original metadata from the reflog. Correct me if I’m wrong, but I don’t believe the structure and nature of the reflog is documented in git’s documentation. At least at the time when I had to partially extract the lost data from the reflog, I had to work hard reading several external sources and rely on oral tradition (such as the present blog and discussion). Perhaps the situation has rapidly changed since I last experienced my unfortunate git accident.
What is plainly false is that the reflog is append-only. There exists an explicit git reflog expire command that also deletes data from the reflog. Another git self-destruct button hidden in plain sight. Not to mention that since the reflog is simply metadata, the data it points to can vanish with time (can be garbage collected), making the metadata pretty much useless.
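A minimal sketch of that point, on a scratch repository with invented commit messages: it counts the entries in HEAD's reflog, runs git reflog expire with --expire=now, and counts again.

```shell
#!/bin/sh
# Sketch: the reflog is not append-only; it can be emptied explicitly.
# The repository and commit messages are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo

# Three empty commits give HEAD's reflog three entries.
for msg in first second third; do
    git -c user.name=Example -c user.email=example@example.com \
        commit -q --allow-empty -m "$msg"
done

before=$(git reflog | wc -l | tr -d ' ')

# Expire every reflog entry, regardless of age, for all refs.
git reflog expire --expire=now --all

after=$(git reflog | wc -l | tr -d ' ')
echo "reflog entries before: $before, after: $after"
```

The commits themselves are still reachable from the branch here; what disappears is the record of where HEAD has been, which is exactly the record a recovery would depend on.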
Your concerns that I am wasting my time, a simple hg fanboy thinking in hg terms (I think I have gotten quite good at understanding what git does and why… and I still think it’s a horrible design… plus I have some working knowledge of other DVCSes like the aforementioned bzr and darcs), and that I can avoid git… I have already addressed in another blog post. I have more blog posts on this topic forthcoming.
It is unfortunate that we must appear as adversaries to each other. This is precisely the opposite of what git is supposed to foster: collaboration. Instead we are arguing over its (de)merits when we could be working together.
I use git, and find it extremely useful. I have not used hg very much.
Git is quirky, inconsistent, and it can be very complex to do things that ought to be simple. While the foundation of git is very good, there is a lot of poorly thought-out and badly documented cruft on top of that.
Given that hg provides most of the same feature set, is extensible, safer, better documented, almost as fast (in spite of being written in python), and orders of magnitude simpler and easier to use, I think it’s clear that hg is for now the superior product. I will start using it!
If the git developers wish to create a really great quality product, they need to take a step back, wean themselves off the coffee and crack, take a deep breath, and create a totally new front-end which is simple and sensible. They should consider the feedback, and learn from their experience with git version 1 “chaos”. It would not hurt to cooperate with some people who are expressive and literate in the English language!
If you’re already a seasoned git user, your first impression of hg is that it’s “limited” and can’t do the same things as git. This is because hg tries to be conservative about the UI it presents to users, and advanced or dangerous features are disabled by default. Hg can do everything that git can do, but you have to know how to enable that feature, and you have to realise that git uses a completely different set of terminology than any other VCS. By contrast, hg works really hard to use terminology that is evocative and consistent with other VCSes.
You’re an idiot. That’s alright, we all start somewhere. Git *does* allow you to shoot yourself in the foot. This is something true of git. Maybe it would be even better than it is if it did not, but I somehow doubt it. As a rule, my first time trying something radical with git, I do it on a scratch repository. --mirror sure sounds radical to me, I would definitely not do that to a live useful upstream for the first time ever.
I’ve been firing rounds pretty madly with git for 2 years now and I have yet to hit my feet (or lose any data). I did manage to lose a reference to a commit a few times. I wanted to use git reset --hard and --soft as a hamfisted technique for editing history. So I dug myself a hole in my scratch repository. At the bottom of the hole, I found git-reflog, and brought back my test commits easy as pie. I made sure I knew how to climb out of the hole before I started using git-reset on important repositories.
I once walked into an irc channel and they told me to run “cat /dev/zero > /dev/sda” and I did it. I mean, cats are cute and don’t sound like delete at all. I didn’t want to admit I was an idiot, instead I said that cats are stupid.
Actually, that never happened, I just didn’t want you to feel like you’re alone.
I don’t know anything about hg. I have the feeling it is feature-wise comparable to git. I am, however, absolutely irretrievably in love with git because it is *SO*MUCH* better than rcs, cvs, or svn (and don’t trashtalk them — rcs is a million times better than “cp foo.c foo.c.bak”). I just got exposed to bzr for the first time, which I also thought was feature-comparable to git. Oh god, my hatred for bzr overflows. You know the tutorial-approved technique for branching a launchpad bzr repository winds up making a completely new copy of the history on your local computer. So then you have two complete local copies of the history, one that you can sync with the upstream trunk, and the other which is only good for your branch. There is a way around it, but it is an advanced topic instead of the regular advertised way of interacting with the thing. Git fucking rocks.
Oh, hey. I was just going through my moderation queue and found this comment got lost in there.
I am not an idiot for disliking git. Neither are these people:
That git has an absolutely horrid CLI is a well-recognised fact. I was just one of the first ones to blog about it. :-)
i found this page while looking for co-commiserators after a data loss scenario of my own today. Few of my colleagues believe data loss is possible in git. i’ve lost more data in git than any other single piece of software (in over 30 years of using computers).
You might find “Fossil” to your liking:
(Disclosure: i’m one of the project’s code monkeys)
Fossil makes it _impossible_ to rewrite history (just to amend it with the equivalent of a sticky note), and in its 7 years of operation we’ve _never_ had a report of data loss.
Not incidentally, Fossil is the SCM used by sqlite, and was in fact started (by Richard Hipp, sqlite’s father) for that very purpose.
Mercurial is working towards this “impossibility” of rewriting history. This feature is called Mercurial Evolve. It’s very clever, changesets get rewritten and you get a meta-history of which cset replaces which one.
I applaud Fossil for its efforts, and I would have liked it to get more popular, but at the moment, the most popular alternative to git is hg, so that’s where I focus my efforts.
Git is really hard to use and mercurial is not only easier to use but it’s much better at tracking changes. By default, Git will forget branch histories leading to all sorts of weird graphs and logs. This is why Git people “clean” their history. You never have to worry about this using named branches in mercurial. Same with branches in bazaar. Both are better than Git and much easier to use.
See http://jhw.dreamwidth.org/2049.html and http://duckrowing.com/2013/12/26/bzr-init-a-bazaar-tutorial/
I find it absolutely fascinating the line drawn between version control people. The Mercurial people insist that Git is dangerous and difficult and that Git people are rude and unhelpful. Meanwhile, I have vast experience using both, and have spent countless hours in both #git and #mercurial. Git is actually much simpler than Mercurial. What’s difficult is the workflows that Git people often choose, and they choose those workflows because they’re superior. To accomplish those same workflows in Mercurial is much more difficult and much more dangerous than Git. It just so happens that Mercurial people tend to choose a more simple, sloppy, careless workflow. That same workflow would work flawlessly in Git without any difficult or dangerous commands.
In #git, I am welcomed with countless people trying to figure out how to solve a problem to my satisfaction (I am one of them when I have the chance), including all the necessary warnings and disclaimers. In #mercurial, I am chased away by loathing developers preaching about how what I’m trying to do is *wrong* and they don’t want to help me with it. That or I’m encouraged to use something that is dangerous and when it completely (not virtually) destroys my data either per design or bug the developers blame *me* for using that workflow. Git and Mercurial are nearly as capable as one another, but Git can do a few things more than Mercurial can, and it does everything much easier and more reliably.
Those “extensions” that Mercurial people love to brag about have corrupted my repository on numerous occasions. It’s not nearly correct to say that Mercurial is on par with Git for features. My most recent experience was with core Mercurial developers in #mercurial instructing me to use the new Evolve extension, reassuring me that they had used it for months without any problems and it was basically safe and stable. Cue my angry tears a day later when it irrecoverably lost my history and corrupted my repository. All the devs could do is shrug. I doubt the Mercurial devs are even fluent in a workflow that would actually benefit (or disadvantage, as the case may be) from Evolve.
I am stuck using MQ instead to manage my history with Mercurial, which is like a knife with a blade at both ends. It takes a lot of practice to wield it without cutting yourself. Even with all of my years of experience with Mercurial I cannot match my productivity with Git.
Just this last week I got myself versed with Bazaar. I was outraged that a proponent had the nerve to write a blog post about how Git sucked, Mercurial was good, and Bazaar was best. What it tells me is that the people that prefer Mercurial cannot be trusted to hold a knife, and the people that prefer Bazaar should probably be using safety-scissors.
Or maybe you just need to spend a bit more time with Git, stop drinking the Mercurial kool-aid, and RTFM. If somebody on the Internet tells you to “rm -fR /” as root and you do it without researching what the options mean, then you have nobody else to blame. Not everybody in #git is an expert. You are most likely not interacting with the Git developers themselves (unlike #mercurial, for what good that does you…). Take advice with a grain of salt, read the very good Git documentation before issuing a command that you aren’t familiar with (one of my biggest gripes with Mercurial is that the documentation is very poor), and make backups of your repositories before trying commands that you aren’t familiar with.
There is a learning curve with Git, but if you’re going to be using it for years or decades then the little bit of time invested in learning to wield it is well worth it. Or just limit yourself to the preferred Mercurial workflow and you can’t really hurt yourself.
The fact that Mercurial wants to push all branches by default is a major annoyance for me. Not all history is meant to be shared with every remote repository all of the time. That’s not a proper distributed workflow. You might as well go back to Subversion with that kind of workflow. Or Bazaar, which is basically as close to Subversion as you can get with a DVCS.
Wow, this is an angry response against Mercurial.
I’d be interested in hearing what happened with Evolve, though. Who told you it was stable? It is very definitely not, and whoever suggested it was made a grave mistake.
That workflows become more complicated when transforming them from Git to Mercurial is counter to my experience. When I adapted the “successful Git branching model”¹ to Mercurial, the workflow became much simpler² – simple enough that it boils down to just three basic rules:
(1) you do all the work on default – except for hotfixes.
(2) on stable you only do hotfixes, merges for release and tagging for release. Only maintainers touch stable.
(3) you can use arbitrary feature-branches, as long as you don’t call them default or stable. They always start at default (since you do all the work on default).
The first time I used Mercurial’s unshelve, I had a conflict. Mercurial opened vimdiff (I hate vimdiff) and I closed it with the :cq command.
I got the list of conflicted files with the hg resolve -l command.
But there were no conflict markers in the conflicted file, and file.orig had the same content as the file.
I didn’t know about the hg unshelve --abort command, so I lost my modifications.
When we don’t know our tools, we make mistakes.
You don’t know Git, so it’s natural that you make mistakes.
Yes, this was a bug in shelve. It has been fixed.
No amount of unfamiliarity with your VCS should result in lost data. Ever. If the user ever unintentionally loses data, this is a bug.
I would argue that sufficient unfamiliarity causing lost data is on the user. If you don’t have any idea what the commands you’re running do, any consequences are on you for not finding out first. If I don’t understand what I’m doing with my repository and I scribble over some important data, that’s my fault because I should have learned what I was doing first.
But also, at the same time, I agree that part of this isn’t your fault. Do I think you should have checked what mirroring did first? Yes. Never run a command until you know what it does. Do I think that mirroring should maybe have a warning up? Yeah, that’d probably be a good idea. It should always be possible to do dangerous things, because sometimes you need to. But it should also be made clear that what you’re trying to do IS dangerous. There’s a reason that pushing to a remote that’s ahead of your local requires you to tell it to force the push, and warns you about not doing that lightly.
In the end, I think that git shares a little of the blame – there should be a warning, and maybe some easier/simpler/better documentation – but most of it rests on you for running an unknown command given to you by a random user on the internet.
You should NEVER run unknown commands, ESPECIALLY ones given to you by people you don’t even know.