<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Articles tagged git at null program</title>
  <link rel="alternate" type="text/html"
        href="https://nullprogram.com/tags/git/"/>
  <link rel="self" type="application/atom+xml"
        href="https://nullprogram.com/tags/git/feed/"/>
  <updated>2026-03-30T21:58:42Z</updated>
  <id>urn:uuid:c95ccfd4-9b1b-4f28-877f-8f71751a9dde</id>

  <author>
    <name>Christopher Wellons</name>
    <uri>https://nullprogram.com</uri>
    <email>wellons@nullprogram.com</email>
  </author>

  <entry>
    <title>Moving to Openbox</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/06/25/"/>
    <id>urn:uuid:8e15a68e-5ad4-356b-7d5b-5c854e4c5302</id>
    <updated>2012-06-25T00:00:00Z</updated>
    <category term="rant"/><category term="git"/><category term="debian"/><category term="reddit"/>
    <content type="html">
      <![CDATA[<p>With <a href="/blog/2012/06/23/">my dotfiles repository established</a> I now
have a common configuration and environment for Bash, Git, Emacs
(separate repository), and even Firefox! This wouldn’t normally be
possible because Firefox doesn’t have tidy dotfiles by default, but
the wonderful <a href="/blog/2009/04/03/">Pentadactyl</a> made it possible. My
script sets up keybindings, bookmark keywords, and quickmarks so that
my browser feels identical across all my computers. Now that it’s easy
to add tweaks, I’m sure I’ll be putting more in there in the future.</p>

<p>However, one major application remained and I was really itching to
capture its configuration too, since even my web browser is part of
the experience. I could drop my dotfiles into a new computer within
minutes and be ready to start hacking, except for my desktop
environment. This was still a tedious, manual step, plagued by the
configuration propagation issue. I wouldn’t get too fancy with
keybindings since I couldn’t rely on them being everywhere.</p>

<p>The problem was I was using KDE at the time and KDE’s configuration
isn’t really version-friendly. Some of it is binary, making it
unmergeable, it doesn’t play well between different versions, and it’s
unclear what needs to be captured and what can be ignored.</p>

<p>I wasn’t exactly a <em>happy</em> KDE user and really felt no attachment to
it. I had only been using it a few months. I’ve used a number of
desktops since 2004, the main ones being Xfce (couple years), IceWM
(couple years), xmonad (8 months), and Gnome 2 (the rest of the
time). Gnome 2 was my fallback, the familiar environment where I could
feel at home and secure — that is, until Gnome 3 / Unity. The coming
of Gnome 3 marked the death of Gnome 2. It became harder and harder to
obtain version 2 and I lost my fallback.</p>

<p>I gave Gnome 3 and Unity each a couple of weeks but I just couldn’t
stand them. Unremovable mouse hotspots, all new alt-tab behavior,
regular crashing (after restoring old alt-tab behavior), and extreme
unconfigurability even with a third-party tweak tool. I jumped for KDE
4, hoping to establish a comfortable fallback for myself.</p>

<p>KDE is pretty and configurable enough for me to get work done. There’s
a lot of bloat (“activities” and widgets), but I can safely ignore
it. The areas where it’s lacking didn’t bother me much, like the
inability/non-triviality of custom application launchers.</p>

<p>My short time with Gnome 3 and now with KDE 4 did herald a new, good
change to my habits: keyboard application launching. I got used to
using the application menu to type my application name and launch
it. I <em>did</em> use dmenu during my xmonad trial, but I didn’t quite make
a habit out of it. It was also on a slower computer, slow enough for
dmenu to be a problem. For years I was just launching things from a
terminal. However, the Gnome and KDE menus both have a big common
annoyance. If you want to add a custom item, you need to write a
special desktop file and save it to the right location. Bleh! dmenu
works right off your <code class="language-plaintext highlighter-rouge">PATH</code> — the way it <em>should</em> work — so no
special work needed.</p>

<p>Gnome 2 <em>has</em> been revived with a fork called MATE, but with the lack
of a modern application launcher, I’m now too spoiled to be
interested. Plus I wanted to find a suitable environment that I could
integrate with my dotfiles repository.</p>

<p>After being a little embarrassed at
<a href="http://www.terminally-incoherent.com/blog/2012/05/18/show-me-your-desktop-4/">Luke’s latest <em>Show Me Your Desktop</em></a>
(what kind of self-respecting Linux geek uses a heavyweight desktop?!)
I shopped around for a clean desktop environment with a configuration
that would version properly. Perhaps I might find that perfect desktop
environment I’ve been looking for all these years, if it even
exists. It wasn’t too long before I ended up in Openbox. I’m pleased
to report that I’m exceptionally happy with it.</p>

<p>Its configuration is two XML files and a shell script. The XML can be
generated by a GUI configuration editor and/or edited by hand. The GUI
was nice for quickly seeing what Openbox could do when I first logged
into it, so I <em>did</em> use it once and find it useful. The configuration
is very flexible too! I created keyboard bindings to slosh windows
around the screen, resize them, move them across desktops, maximize in
only one direction, change focus in a direction, and launch specific
applications (for example super-n launches a new terminal
window). It’s like the perfect combination of tiling and stacking
window managers. Not only is it more configurable than KDE, but it’s
done cleanly.</p>
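
<p>For a taste of what that configuration looks like — this is a
hand-written sketch rather than my actual setup, and exact action names
vary a little between Openbox versions — here is roughly what a pair of
keybindings looks like inside the <code class="language-plaintext highlighter-rouge">&lt;keyboard&gt;</code> section of
<code class="language-plaintext highlighter-rouge">rc.xml</code>:</p>

<pre>
&lt;keybind key="W-n"&gt;            &lt;!-- W- is Openbox notation for super --&gt;
  &lt;action name="Execute"&gt;
    &lt;command&gt;xterm&lt;/command&gt;
  &lt;/action&gt;
&lt;/keybind&gt;
&lt;keybind key="W-Up"&gt;
  &lt;action name="MaximizeVert"/&gt;
&lt;/keybind&gt;
</pre>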

<p>Openbox is pretty close to the perfect environment I want. There are
still some annoying little bugs, mostly related to window positioning,
but they’ve mostly been fixed. The problem is that they haven’t made
an official release for a year and a half, so these fixes aren’t yet
available. I might normally think to myself, “Why haven’t I been using
Openbox for years?” but I know better than that. Versions of Openbox
from just two years ago, like the one in Debian Squeeze (the current
stable), <em>aren’t very good</em>. So I haven’t actually been missing out on
anything. This is something really new.</p>

<p>I’m not using a desktop environment on top of Openbox, so there are no
panels or any of the normal stuff. This is perfectly fine for me; I
have better things to spend that real estate on. I <em>am</em> using a window
composite manager called <code class="language-plaintext highlighter-rouge">xcompmgr</code> to make things pretty through
proper transparency and subtle drop shadows. Without panels, there
were a couple problems to deal with. I was used to my desktop
environment performing removable drive mounting and wireless network
management for me, so I needed to find standalone applications to do
the job.</p>

<p>Removable filesystems can be mounted the old fashioned way, where I
create a mount point, find the device name, then mount the device on
the mount point as root. This is annoying and unacceptable after
experiencing automounting for years. I found two applications to do
this: Thunar, Xfce’s file manager; and <code class="language-plaintext highlighter-rouge">pmount</code>, a somewhat-buggy
command-line tool.</p>

<p>I chose Wicd to do network management. It has both a GTK client and an
ncurses client, so I can easily manage my wireless network
connectivity with and without a graphical environment — something I
could have used for years now (goodbye <code class="language-plaintext highlighter-rouge">iwconfig</code>)! Unfortunately Wicd
is rigidly inflexible, allowing only one network interface to be up at
a time. This is a problem when I want to be on both a wired and
wireless network at the same time. For example, sometimes I use my
laptop as a gateway between a wired and wireless network. In these
cases I need to shut down Wicd and go back to manual networking for
a while.</p>

<p>The next issue was wallpapers. I’ve always liked having
<a href="http://reddit.com/r/EarthPorn">natural landscape wallpapers</a>. So far,
I could move onto a new computer and have everything functionally
working, but I’d have a blank gray background. KDE 4 got me used to
slideshow wallpaper, changing the landscape image to a new one every
10-ish minutes. For a few years now, I’ve made a habit of creating a
<code class="language-plaintext highlighter-rouge">.wallpapers</code> directory in my home directory and dumping interesting
wallpapers in there as I come across them. When picking a new
wallpaper, or telling KDE where to look for random wallpapers, I’d
grab one from there. I’ve decided to continue this with my dotfiles
repository.</p>

<p>I wrote a shell script that uses <code class="language-plaintext highlighter-rouge">feh</code> to randomly set the root
(wallpaper) image every 10 minutes. It gets installed in <code class="language-plaintext highlighter-rouge">.wallpapers</code>
from the dotfiles repository. Openbox runs this script in the
background when it starts. I don’t actually store the hundreds of
images in my repository. There’s a <code class="language-plaintext highlighter-rouge">fetch.sh</code> that grabs them all from
Amazon S3 automatically. This is just another small step I take after
running the dotfiles install script. Any new images I throw in
<code class="language-plaintext highlighter-rouge">.wallpapers</code> get put into the rotation, but only for that computer.</p>
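
<p>The heart of that script is tiny. This sketch captures the idea —
the ten-minute loop and the Openbox autostart hook are omitted, and a
throwaway directory stands in for <code class="language-plaintext highlighter-rouge">.wallpapers</code>:</p>

<pre>
set -e
wdir=$(mktemp -d)                        # stands in for ~/.wallpapers
touch "$wdir/a.jpg" "$wdir/b.jpg" "$wdir/c.jpg"

wp=$(find "$wdir" -type f | shuf -n 1)   # pick one image at random
echo "setting wallpaper: $wp"
if command -v feh >/dev/null; then
    feh --bg-fill "$wp"                  # set it as the root window image
fi
</pre>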

<p>I’ve now got all this encoded into my configuration files and checked
into my dotfiles repository. It’s <em>incredibly</em> satisfying to have this
in common across each of my computers and to have it instantly
available on any new installs. I’m that much closer to having <em>the</em>
ideal (and ultimately unattainable) computing experience!</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Versioning Personal Configuration Dotfiles</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/06/23/"/>
    <id>urn:uuid:d2428806-5a27-3996-39aa-ae0c411da126</id>
    <updated>2012-06-23T00:00:00Z</updated>
    <category term="crypto"/><category term="git"/>
    <content type="html">
      <![CDATA[<p>For almost two months now I’ve been
<a href="https://github.com/skeeto/dotfiles">versioning all my personal dotfiles</a>
in Git. Just as <a href="/blog/2011/10/19/">when I did the same with Emacs</a>,
it’s been extremely liberating and I wish I had been doing this for
years. Currently it covers 11 different applications including my web
browser, shell, window manager, and cryptographic keys, giving me a
unified experience across all of my machines — which, between home,
work, and virtual computers is about half a dozen.</p>

<p>Like anything, the biggest problem with <em>not</em> versioning these files
is introducing changes. If I add
<a href="/blog/2012/06/08/">an interesting tweak to a dotfile</a>, I won’t see
that change on my other machines until I either copy it over or I
enter it manually again. Because I’d worry about clobbering other
unpropagated changes, it was usually the latter. Only changes I could
commit to memory would propagate. Any tweak that wasn’t easy to
duplicate manually I couldn’t rely on, so I was discouraged from
customizing too much and relied mostly on defaults. This is bad!</p>

<p>Source control solves almost all of this trivially. If I notice a
pattern in my habits or devise an interesting configuration, I can
immediately make the change, commit it, and push it. Later, when I’m
on another computer and I notice it missing, I just do a pull without
needing to worry about clobbering any local changes. When moving onto
a new computer/install, all I need to do is clone the repository and
I’ve got <em>every</em> configuration I have without having to snoop around
the last computer I used figuring out what to copy over.</p>
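
<p>The mechanics are nothing more than everyday Git. Here is a local
simulation of the workflow, with two throwaway clones standing in for
two machines and hypothetical file names:</p>

<pre>
set -e
tmp=$(mktemp -d)

# "machine A": create the dotfiles repository and commit a tweak
git init -q "$tmp/a"
cd "$tmp/a"
git config user.email you@example.com
git config user.name You
echo "alias ll='ls -l'" > _bashrc
git add _bashrc
git commit -qm 'add ll alias'

# "machine B": clone it, as on a fresh install
git clone -q "$tmp/a" "$tmp/b"

# a later tweak back on A...
echo "alias la='ls -A'" >> _bashrc
git commit -qam 'add la alias'

# ...reaches B with a single pull, clobbering nothing
cd "$tmp/b"
git pull -q
</pre>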

<p>Most of <a href="/blog/2012/04/29/">the applications I prefer</a> have tidy,
manually-editable dotfiles that version well, so I would be able to
capture almost my entire environment. One near-exception was
Firefox. By itself, it doesn’t play well, but
<a href="/blog/2009/04/03/">since I use Pentadactyl</a> I’m able to configure it
cleanly like a proper application.</p>

<p>The last straw that triggered my dotfiles repository was
<a href="/blog/2011/11/03/">managing my Bash aliases</a>. It had gotten <em>just</em>
long enough that I was tired of manually synchronizing them. It was
finally time to invest some time into nipping this in the bud once and
for all. Unsure what approach to take, I looked around to see what
other people were doing. There are two basic approaches: version your
entire home directory or symbolically link your dotfiles into place
from a stand-alone repository.</p>

<p>The first approach is straightforward but has a number of issues that
make it a poor choice. You don’t need an install script or anything
special, you just use your home directory.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd
git init
git add .bashrc .gitconfig ...
</code></pre></div></div>

<p>The first problem is that most the files Git sees you <em>do not</em> want to
version. These are all going to show up in the status listing and,
because there’s no pattern to them, there’s really no way to filter
them out with a <code class="language-plaintext highlighter-rouge">.gitconfig</code>. Any other clones you have in your home
directory may also confuse Git, looking like submodules. You’ll have
to dodge this extra stuff all the time when working in the repository.</p>

<p>The second problem is that Git has only only one <code class="language-plaintext highlighter-rouge">.git</code> directory, in
the repository root. If there’s no <code class="language-plaintext highlighter-rouge">.git</code> in the current directory, it
will keep searching upwards until it finds one … which will
inevitably be your dotfiles repository. This will eventually lead to
annoying mistakes where you accidentally commit work to your dotfiles
repository for a while until you notice you forgot a <code class="language-plaintext highlighter-rouge">git init</code>. A
possible workaround is to keep the <code class="language-plaintext highlighter-rouge">.git</code> directory out of your home
directory and use the environment variable <code class="language-plaintext highlighter-rouge">GIT_DIR</code> to tell Git where
it is when you’re working on it. That sounds like a pain to me.</p>
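
<p>For completeness, that workaround has roughly this shape. The
repository name <code class="language-plaintext highlighter-rouge">.dotfiles.git</code> is hypothetical, and a temporary
directory stands in for the home directory:</p>

<pre>
set -e
home=$(mktemp -d)                        # stand-in for your home directory
export GIT_DIR="$home/.dotfiles.git" GIT_WORK_TREE="$home"
cd "$home"
git init -q                              # repository lands in .dotfiles.git
echo "alias ll='ls -l'" > .bashrc
git add .bashrc
git -c user.email=you@example.com -c user.name=You commit -qm 'track .bashrc'
git ls-files                             # prints only what you added: .bashrc
unset GIT_DIR GIT_WORK_TREE              # forget this and every later Git
                                         # command hits the dotfiles repo
</pre>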

<p>The other approach is to have your dotfiles repository cloned on its
own, then use symlinks to put the configuration files into place. You
need to write an install script to do this. However, not all
configuration files are sitting directly in your home directory. Some
have their own directory. Modern applications have moved into a
directory under <code class="language-plaintext highlighter-rouge">~/.config/</code>. Your script needs to handle these.</p>

<p>Why symlinks rather than just copying the file into place? Well, if
you make any changes to the installed files, Git won’t see them and
you risk losing those changes.</p>

<p>Why symlinks rather than hard links? Symlinks deal with the atomic
replacement issue better. Conscientious applications are very careful
about how they write your data to disk. Unless it’s some kind of
database, files are never edited in-place. The application rewrites
the entire file at once. If the application is stupid and overwrites
the file directly, there’s a brief instant where your data is not on
disk at all! First, it truncates the original file, deleting your
data, then it rewrites the data, and, if it’s not <em>too</em> stupid, calls
<code class="language-plaintext highlighter-rouge">fsync()</code> to force the write to the hardware. It’s stupid, but it will
work with symlinks.</p>

<p>The conscientious application will write the data to a temporary file,
call <code class="language-plaintext highlighter-rouge">fsync()</code>,
then atomically <code class="language-plaintext highlighter-rouge">rename()</code> the new file over top the original file. If
there’s any failure along the way, <em>some</em> intact version of the data
will be on the disk. The problem is that this will replace your
symlink and changes won’t be captured by the repository. Such an
incident will be obvious with symlinks, since the file will no longer
be a symlink. Hard links are much less obvious.</p>
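
<p>The whole dance is easy to reproduce in shell, where <code class="language-plaintext highlighter-rouge">mv</code> within
one filesystem is an atomic <code class="language-plaintext highlighter-rouge">rename()</code>. The file names here are made
up:</p>

<pre>
set -e
dir=$(mktemp -d)
echo 'original' > "$dir/repo-copy"       # the file inside the repository
ln -s "$dir/repo-copy" "$dir/dotfile"    # the installed symlink

# a "conscientious" application saving the dotfile:
echo 'edited' > "$dir/dotfile.tmp"       # 1. write a temporary file
sync                                     # 2. flush it to disk
mv "$dir/dotfile.tmp" "$dir/dotfile"     # 3. atomic rename over the target

# the rename replaced the symlink with a plain file, and the repository
# copy never saw the edit:
test ! -L "$dir/dotfile"
cat "$dir/repo-copy"                     # still says 'original'
</pre>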

<p>Smart applications, like Emacs, also know not to clobber your symlinks
and will handle these writes properly, leaving the symlink
intact. With hard links, there is no way for the application to know
it needs to treat a file specially.</p>

<p>I figured that I could use someone else’s install script, so I
wouldn’t have to worry about getting this right. Since Ruby is so
popular with Git, many people are using Rake for this task. However, I
want to be able to maintain the install script myself and I don’t know
Rake. I also don’t want to depend on anything unusual to install my
dotfiles. So that was out.</p>

<p>Second, I don’t want to have to specifically list the files to
install, or not install, in the script. Don’t put the same information
in two places when one will do. This script should be able to tell on
its own what files to install.</p>

<p>Third, I didn’t want my dotfiles to actually <em>be</em> dotfiles in my
repository. It makes them hard to see and manage, since they’re
hidden. They’re much easier to handle when the dot is replaced with an
underscore.</p>

<p>So I wrote my own install script which installs any file beginning
with an underscore. I’ve since added support for “private” dotfiles
along the way. These are dotfiles that contain sensitive information
and are encrypted in the repository, allowing me to continue
publishing it safely.</p>
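
<p>The core of such a script is small. This is a simplified sketch of
the idea rather than my actual installer — hypothetical paths and no
private-file handling — but it shows the underscore convention deciding
what gets installed:</p>

<pre>
set -e
repo=$(mktemp -d)                        # stand-in for the dotfiles clone
home=$(mktemp -d)                        # stand-in for the home directory
echo "alias ll='ls -l'" > "$repo/_bashrc"
echo '[color]' > "$repo/_gitconfig"

for f in "$repo"/_*; do
    name=$(basename "$f")
    ln -sf "$f" "$home/.${name#_}"       # _bashrc becomes ~/.bashrc
done
ls -A "$home"
</pre>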

<p>If you’d like to create your own dotfiles repository, my dotfile
repository may not be useful beyond standing as an example but my
install script may be directly reusable for you.</p>

<p>There’s a lot to talk about, so I’ll be making a few more posts about
this.</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Presentations with Jekyll and deck.js</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/04/30/"/>
    <id>urn:uuid:23949cf7-f8c2-332e-6889-5d4c9d128cf7</id>
    <updated>2012-04-30T00:00:00Z</updated>
    <category term="emacs"/><category term="git"/>
    <content type="html">
      <![CDATA[<p>At work, this has been The Year of Presentations for me so far. I’ve
prepared and performed three hour-long presentations so far this year,
and I will continue to do more. The presentations I’ve done before
haven’t been too serious; I’d just slap a few slides together in
whatever was handy and talk in front of them. However, with these more
serious presentations, I was making much more use of the associated
software. I haven’t been happy with any of them. They violate my
<a href="/blog/2012/04/29/">preference for precision</a>, after all.</p>

<p>The first one I went with KPresenter, part of KOffice. It had been
years since I last used KOffice, so I thought I’d give it a shot. One
the good side, I liked the templates. However, it crashed on me a lot,
which was very frustrating. The GUI is lacking in a lot of places. For
example, I wanted to re-arrange my slides, and dragging and dropping
them feels like the natural choice. The mouse cursor even suggests it
by switching to a hand icon. Nope, dragging and dropping does
nothing. Overall, it felt like using a crummy version of Inkscape. The
presentation was a mess when viewed by other presentation software, so
I had to export it to a PDF to use it.</p>

<p>For the second one, I used LibreOffice’s Impress. It’s better than
KPresenter, but it still feels clunky. It took some wrestling to get
it to do what I wanted. As to be expected, I still had the same
feeling of uneasiness I have about any WYSIWYG tool.</p>

<p>For the third one I used PowerPoint, as provided by my employer. The
main reason for this was that I was <s>stealing</s> borrowing some
important slides from a couple of other people’s presentations, so I
had little choice. It was also an opportunity to compare it to the
others. Overall I’d say it’s on the same level as Impress, with some
slightly nicer GUI behavior.</p>

<p>Fortunately, I recently discovered what may become my preferred
presentation tool! It’s <a href="http://imakewebthings.com/deck.js/">deck.js</a>.</p>

<p>With deck.js, I’ll be writing my presentations in HTML 5, something
with which I’m already comfortable and experienced. Most importantly,
I’ll be able to create my presentations with Emacs and version them
with Git. That allows for easy collaboration on presentations
without all the stupid e-mailing documents back and forth — though
the other person would need to be comfortable with using deck.js,
too. That leaves … well, just <a href="http://50ply.com/">Brian</a> I
guess. So, <em>in theory</em>, this could make collaboration easier.</p>

<p>The downside to deck.js is that it requires a lot of boilerplate,
especially if you want to use the extensions, a couple of which are
absolutely <em>essential</em> in my opinion. Creating a new presentation
requires going through this setup phase, and then working around all
the boilerplate the rest of the time. I’ve successfully used Git to
<a href="/blog/2010/10/04/">work around this problem with Java</a>, so I’ve done
the same here, with a little bit of help from
<a href="https://github.com/mojombo/jekyll">Jekyll</a>.</p>

<p>What I’ve done is used Jekyll as a default layout for deck.js. It
hides away all of the deck.js boilerplate so that I can focus on my
presentation. It also makes it trivial to start a new
presentation. All I have to do is clone this repository and I’m ready
to go.</p>

<pre>
git clone --recursive <a href="https://github.com/skeeto/jekyll-deck">https://github.com/skeeto/jekyll-deck.git</a> <i>my-pres</i>
</pre>

<p>The result looks like this: <a href="/jekyll-deck/">A Jekyll / deck.js Presentation</a>.</p>

<p>Jekyll <em>almost</em> opens up the opportunity to really take deck.js to the
next level: presentations written in Markdown! That would be
wonderful. Unfortunately, the HTML output is a little bit too
demanding for Jekyll (i.e. Maruku) to manage. It’s not quite
extensible enough to pull it off. So it’s just HTML5 for now, which is
unfortunately bulky when it comes to lists — a common element of
presentations. Oh well. I do still get syntax highlighting with
Pygments!</p>

<p>I haven’t used it for anything serious yet, so it’s still untried. In
my experimentation I found it enjoyable to work with, so I really look
forward to making use of it in the future. Feel free to use it
yourself, of course, and tell me how it goes.</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Why Do Developers Prefer Certain Kinds of Tools?</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/04/29/"/>
    <id>urn:uuid:4dd7c07d-982d-3ff6-5cdd-70db7c3800bb</id>
    <updated>2012-04-29T00:00:00Z</updated>
    <category term="rant"/><category term="emacs"/><category term="git"/>
    <content type="html">
      <![CDATA[<p>In my experience, software developers generally prefer some flavor of
programmer’s tools when it comes to getting things done. We like plain
text, text editors, command line programs, source control, markup, and
shells. In contrast, non-developer computer users generally prefer
WYSIWYG word processors and GUIs. Developers often have somewhere
between a distaste and a
<a href="http://terminally-incoherent.com/blog/2008/10/16/wysiwyg-is-a-lie/">revulsion</a>
to WYSIWYG editors.</p>

<p>Why is this? What are programmers looking for that other users aren’t?
What I believe it really comes down to is one simple idea: <strong>clean
state transformations</strong>. I’m talking about modifying data, text or
binary, in a precise manner with the possibility of verifying the
modification for correctness in the future.</p>

<p>Think of a file produced by a word processor. It may be some
proprietary format, like Word’s old .doc format, or, more likely, as
we move into the future, it’s in some bloated XML format that’s dumped
into a .zip file. In either case, it’s a blob of data that requires a
complex word processor to view and manipulate. It’s opaque to source
control, so even merging documents requires a capable, full word
processor.</p>

<p>For example, say you’ve received such a document from a colleague by
e-mail, for editing. You’ve read it over and think it looks good,
except you want to italicize a few words in the document. To do that,
you open up the document in a word processor and go through looking
for the words you want to modify. When you’re done you click save.</p>

<p>The problem is: did you accidentally make any other changes? Maybe you
had to reply to an email while you were in the middle of it and you
accidentally typed an extra letter into the document. It would be easy
to miss and you’re probably not set up to easily check what changes
you’ve made.</p>

<p>I am aware that modern word processors have a feature that can show
changes made, which can then be committed to the document. This is
really crude compared to a good source control management system. Due
to the nature of WYSIWYG, you’re still not seeing all of the
changes. There could be invisible markup changes and there’s no way to
know. It’s an example of a single program trying to do too many
unrelated things, so that it ends up doing many things poorly.</p>

<p>With source code, the idea of patches comes up frequently. The program
<code class="language-plaintext highlighter-rouge">diff</code>, given two text files, can produce a patch file describing
their differences. The complementary program is <code class="language-plaintext highlighter-rouge">patch</code>, which can
take the output from <code class="language-plaintext highlighter-rouge">diff</code> and one of the original files, and use it
to produce the other file. As an example, say you have this source
file <code class="language-plaintext highlighter-rouge">example.c</code>,</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Hello, world."</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>If you change the string and save it as a different file, then run
<code class="language-plaintext highlighter-rouge">diff -u</code> (<code class="language-plaintext highlighter-rouge">-u</code> for unified, producing a diff with extra context), you
get this output,</p>

<div class="language-udiff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">--- example.c  2012-04-29 21:50:00.250249543 -0400
</span><span class="gi">+++ example2.c   2012-04-29 21:50:09.514206233 -0400
</span><span class="p">@@ -1,5 +1,5 @@</span>
 int main()
 {
<span class="gd">-    printf("Hello, world.");
</span><span class="gi">+    printf("Goodbye, world.");
</span>     return 0;
 }
</code></pre></div></div>

<p>This is very human readable. It states what two files are being
compared, where they differ, some context around the difference
(beginning with a space), and shows which lines were removed
(beginning with <code class="language-plaintext highlighter-rouge">-</code>) and which were added
(beginning with <code class="language-plaintext highlighter-rouge">+</code>). A diff like this is capable of
describing any number of files and changes in a row, so it can all fit
comfortably in a single patch file.</p>

<p>If you made changes to a codebase and calculated a diff, you could
send the patch (the diff) to other people with the same codebase and
they could use it to reproduce your exact changes. By looking at it,
they know exactly what changed, so it’s not some mystery to them. This
patch is a <em>clean transformation</em> from one source code state to
another.</p>
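
<p>The round trip is worth seeing once. Assuming the two source files
from above, <code class="language-plaintext highlighter-rouge">diff</code> writes the patch and <code class="language-plaintext highlighter-rouge">patch</code> replays it onto
the original:</p>

<pre>
set -e
dir=$(mktemp -d)
cd "$dir"
printf 'int main()\n{\n    printf("Hello, world.");\n    return 0;\n}\n' > example.c
sed 's/Hello/Goodbye/' example.c > example2.c     # the edited copy

diff -u example.c example2.c > fix.patch || true  # diff exits 1 on differences
patch -s example.c fix.patch                      # apply the patch in place
if cmp -s example.c example2.c; then
    echo 'files are now identical'
fi
</pre>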

<p>More than that: you can send it to people with a similar, but not
exactly identical, codebase and they could still likely apply your
changes. This process is really what source control is all about: an
easy way to coordinate and track patches from many people. A good
version history is going to be a tidy set of patches that take the
source code in its original form and add a feature or fix a bug
through a series of concise changes.</p>

<p>On a side note, you could efficiently store a series of changes to a
file by storing the original document along with a series of
relatively small patches. This is called delta encoding. This is how
both source control and video codecs usually store data on disk.</p>

<p>Anytime I’m outside of this world of precision I start to get
nervous. I feel sloppy and become distrustful of my tools, because I
generally can’t verify that they’re doing what I think they’re
doing. This applies not just to source code, but also writing. I’m
typing this article in Emacs and when I’m done I’ll commit it to
Git. If I make any corrections, I’ll verify that my changes are what I
wanted them to be (via <a href="http://philjackson.github.com/magit/">Magit</a>)
before committing and publishing them.</p>

<p>One of my long-term goals with my work is to try to do as much as
possible with my precision developer tools. I’ve already got
<a href="/blog/2011/11/28/">basic video editing</a> and
<a href="/blog/2012/04/10/">GIF creation</a> worked out. I’m still working out a
happy process for documents (i.e. LaTeX and friends) and
presentations.</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Moved To New Hosting</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2011/08/05/"/>
    <id>urn:uuid:40abf691-72c5-3686-770f-220b858eebc9</id>
    <updated>2011-08-05T00:00:00Z</updated>
    <category term="git"/><category term="meta"/>
    <content type="html">
      <![CDATA[<p>If you see this post, then you’re receiving null program from its new
host: <a href="https://github.com/">GitHub</a>. The end of the month marks 4 years of my blog
(the last such announcement was <a href="/blog/2009/09/01/">two years ago</a>), and what was
going to be the 2-year renewal with my old host. I took the
opportunity to reassess my hosting situation and reorganize my
back-end.</p>

<p>The repository that houses my entire blog is here,</p>

<ul>
  <li><a href="https://github.com/skeeto/skeeto.github.com">https://github.com/skeeto/skeeto.github.com</a></li>
</ul>

<p>You can run an entirely local version of my blog from that repository.
You could also make corrections, commit them, and give me a pull
request — not that I’m expecting that to happen.</p>

<p>Initially I considered going in the opposite direction. I looked into
renting out a VM, rebuilding my site from scratch as a Java Servlet,
and hosting it from the VM. That would double my monthly costs, but
give me a cool new thing to play with. It would also be a lot of work
building a back-end from scratch.</p>

<p>Ultimately I wanted to work entirely within Git. I wanted to write a
new post on my local machine, view it using a local web server, and,
when I was satisfied, commit it and push it as a commit to the real
server. My old situation had me checking out new posts by posting them
live on the real server. I didn’t like that.</p>

<p>I eventually discovered <a href="http://pages.github.com/">GitHub Pages</a>. GitHub will host your
static content for free. Not only that, but they’ll process it with
<a href="https://github.com/mojombo/jekyll">Jekyll</a>. That means I can push “source” files to GitHub and
they’ll “compile” them into my website for me. My blog is very
well-suited for static hosting, so this is everything I needed. And
the price is right: this is a free service!</p>

<p>Going back through each of my old posts, refitting them for Jekyll,
was a very time-consuming process. Every video, image, and highlighted
bit of source code needed to be reworked. In addition to moving them
into place for Jekyll, I improved the organization of my videos and
images. At my old host, I highlighted source code with the
<a href="/blog/2009/04/23/">htmlize</a> Emacs extension. Jekyll has a much nicer solution:
<a href="http://pygments.org/">Pygments</a>. However, this required stripping away all of the
htmlize HTML code, reversing all of the HTML entities, and placing the
Pygments delimiters; this was a process that couldn’t be automated
because it required a lot of human judgment.</p>

<p>One thing I did notice from reworking all my source code samples,
particularly more recent entries, was that most of my in-line code is
Lisp! Common Lisp syntax highlighting was the most common, though it
was really highlighting Elisp. My very early posts are dominated by
Octave syntax highlighting.</p>

<p>Since the hosting is static, I went to <a href="http://disqus.com/">Disqus</a> for comments. It was
very easy and painless to integrate into my static blog. Unfortunately
I haven’t yet looked into the import/export situation with my old
comments. It looks like it should be possible to do. I can export to
the WordPress XML format and import them into Disqus.</p>

<p><em>Update: I have now imported all of the old comments. The process went
very smoothly and took very little effort on my part, since I had
already been storing exactly the right information in the right form.</em></p>

<p>For comments, I actually considered the pull request thing. If you
wanted to leave a comment, you would clone my blog’s repository, edit
in a comment, and ask me to pull in your change. That’s a very large
barrier to commenting, and I doubt anyone would do it, so I skipped
that idea.</p>

<p>So this is my new blog situation. I’ve completely moved away from
blosxom now. I trimmed some fat in the transition, so it currently
weighs in at about 128MB. I’d like to keep it under 300MB, but that’s
not overly important. Hopefully you can’t really tell the difference
between the new and the old. The RSS feed is in the same place, so you
don’t need to change anything to keep following. I did break my
“short” and “long” permalinks but no one was using them anyway. Even
though I’m now using GitHub, I’m not actually too dependent on
them. My transformation to Jekyll gives me more freedom than ever. I
could very easily move to any host at this point.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Distributed Issue Tracking</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/02/14/"/>
    <id>urn:uuid:5c65dc10-582e-33a0-dc3c-d2b01a957a4c</id>
    <updated>2009-02-14T00:00:00Z</updated>
    <category term="git"/>
    <content type="html">
      <![CDATA[<!-- 14 February 2009 -->
<p>
In a <a href="/blog/2009/02/12#git-is-better">previous post</a> I
discussed decentralized version control systems, Git in
particular. Because decentralized version control is becoming so
popular, we now have an exciting new area of development: distributed
issue tracking.
</p>
<p>
Decentralized issue tracking seems to have popped into existence in
the last year or so. A number of projects have appeared (cil, ticgit,
ditz, to name some), but the one that really stands out for me is <a
href="http://ditz.rubyforge.org/">ditz</a>. Keep an eye on that
one. It's fairly active and mostly usable.
</p>
<p>
Decentralized trackers generally work by storing the issue tracking
database within the repository itself. One possibility is to have it
sit in its own branch, which I think is the Wrong Way. A second
possibility is to have it sit right next to the code in its own
directory. Yet another possibility is to put the issue tracker in its
own repository. Git could even include this repository as a submodule
(this is a lot like the Wrong Way, though).
</p>
<p>
First of all, everyone gets their own copy of the issue tracker
database and its history. Second of all, it <i>has</i> history. It's
tracked the same way the code is. And, in the second case, one of the
coolest advantages is that issues follow the code very closely.
</p>
<p>
When a branch is created, it takes its own copy of the issue tracking
database with it. If a bug is fixed in the main branch, the issue
tracking database in the main branch is updated. The bug will remain
in the side branch, and its issue will still be open there,
reflecting this. If a merge occurs later, the issue tracking database
also gets merged automatically. I think that's damn cool.
</p>
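<p>
A minimal demonstration of that behavior, keeping a plain-text ticket
in its own directory next to the code (all names here are invented):
</p>

```shell
cd "$(mktemp -d)"
git init -q repo && cd repo
git checkout -qb main
git config user.name 'A. Hacker'            # throwaway identity
git config user.email 'hacker@example.com'
mkdir issues
printf 'status: open\ntitle: crash on start\n' > issues/0001.txt
git add . && git commit -qm 'Open issue 0001'
git branch side                      # the side branch takes its own copy
printf 'status: closed\ntitle: crash on start\n' > issues/0001.txt
git commit -qam 'Close issue 0001'   # bug fixed in main only
git checkout -q side                 # the ticket is still open over here...
git merge -q main                    # ...until the branches merge
```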
<p>
There are some issues that still need to be hammered out. How does a
non-developer enter a ticket? They would need to use the version
control system to do this, then be able to share that change. That's a
pretty large barrier.
</p>
<p>
Perhaps a web interface could be set up for filing
issues. Developers could then cherry-pick/pull the issues from that
repository and push ticket updates back out.
</p>
<p>
Then there is overhead incurred by moving tickets around with
code. How bad is this overhead? How can this be dealt with in the most
transparent way? This all needs to be tested still.
</p>
<p>
Could the issue database get too big? People like to attach
screenshots to issues. Having many screenshots would make the
repository very big. How do we deal with this?
</p>
<p>
It's an exciting, new realm to explore.
</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Git is Better</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/02/12/"/>
    <id>urn:uuid:4b8bdf7b-b583-316a-820c-5bec90c29e6e</id>
    <updated>2009-02-12T00:00:00Z</updated>
    <category term="git"/>
    <content type="html">
      <![CDATA[<!-- 12 February 2009 -->
<p>
I finally finished dumping the rest of my lingering Subversion
repositories. I have converted them all to <a
href="http://git-scm.com/">Git</a> repositories. If you manage a
Subversion (or CVS, or Perforce, etc) repository, you should consider
doing the same. Git became my version control system (VCS) of choice
in June and I haven't looked back since.
</p>
<p>
Why? Because Git is better.
</p>
<p>
Yes, it really is. <i>Much</i> better.
</p>
<p>
Git is faster, smaller, more secure, and more powerful. This is a
virtue of decentralized version control systems. <a
href="http://www.paulgraham.com/avg.html"> Subversion is Blub</a>.
</p>
<p>
It all starts with the <a
href="http://en.wikipedia.org/wiki/Source_Code_Control_System"> Source
Code Control System</a> (SCCS) and <a
href="http://en.wikipedia.org/wiki/Revision_Control_System"> Revision
Control System</a> (RCS). These systems could only track single files
and created headaches for projects with multiple files being worked on
by multiple people.
</p>
<p>
Then came <a
href="http://en.wikipedia.org/wiki/Concurrent_Versions_System">
Concurrent Versions System</a> (CVS), which improved things slightly,
but still sucked. It still <i>really</i> only tracks individual
files.
</p>
<p>
Now, CVS <i>did</i> anonymous reads, allowing anyone to access the
repository and see code history. <a href="http://www.openbsd.org/">
OpenBSD</a> was the first code base to take advantage of this. These
days, coming across projects that don't give public read access to
their repository seems backwards.
</p>
<p>
Not using any of these systems would probably be better than using
them. Their flaws are obvious as soon as you start using them.
</p>
<p>
Finally in the year
2000, <a href="http://en.wikipedia.org/wiki/Subversion_(software)">
Subversion</a> arrives. It's a huge step up from CVS, fixing many of
its problems. It has a much better interface and uses atomic commits —
finally tracking more than one file at a time. We still need to talk
to some server every time we want to do something. Branching and
merging also sucks so much no one wants to use it. But branching is
overrated, right? Wrong. I use branches all the time, now that they
are easy.
</p>
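<p>
For a sense of what easy means here, branching and merging in Git are
local, instantaneous operations (the repository, branch, and file
names below are invented):
</p>

```shell
cd "$(mktemp -d)"
git init -q repo && cd repo
git checkout -qb main
git config user.name 'A. Hacker'            # throwaway identity
git config user.email 'hacker@example.com'
echo 'original line' > file.txt
git add . && git commit -qm 'Initial commit'
git checkout -qb experiment       # instant: no server round-trip
echo 'a risky change' >> file.txt
git commit -qam 'Try something'
git checkout -q main
git merge -q experiment           # merging back is just as painless
```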
<p>
The reason branching sucks in Subversion can be explained with a
famous quote by Albert Einstein,
</p>
<blockquote>
  <p>
    Make everything as simple as possible, but not simpler.
  </p>
</blockquote>
<p>
Instead of implementing tagging, branching, and merging, the
Subversion guys just implemented "cheap copy". It's a pretty clever
idea, but in practice it doesn't work out well. It's <i>too
simple</i>.
</p>
<p>
CVS solves the wrong problem, and Subversion solves the right problem
wrongly.
</p>
<p>
Since Subversion, a number of decentralized VCSs have arisen. We have
<a href="http://www.gnu.org/software/gnu-arch/"> GNU arch</a> (2001),
<a href="http://monotone.ca/"> monotone</a> (2002),
<a href="http://darcs.net/"> darcs</a> (2003),
<a href="http://bazaar-vcs.org/"> Bazaar</a> (2005),
Git (2005),
<a href="http://www.selenic.com/mercurial/"> Mercurial</a> (2005), and
<a href="http://www.fossil-scm.org/"> fossil</a> (2007).

I played around with all of these (except fossil) when looking for a
distributed VCS, and none struck me the same way that Git did. I would
recommend most of them over Subversion.
</p>
<p>
Distributed VCS has gotten a lot of attention in the past couple
years, which had much to do with the Linux kernel switching to one
(Git). In fact, Git and Mercurial were written precisely for this
event. Since then, some major projects have been switching to Git, or
at least to some sort of distributed VCS: Perl, Ruby on Rails,
Android, WINE, Fedora, X.org, and VLC to name a few.
</p>
<p>
You can also see the chatter on the Internet about Git. It's really
popular with fresh, innovative projects, like the <a
href="http://arclanguage.org/"> Arc programming language</a>. It's
pretty easy to accidentally run into various Git tutorials on the
web. It has a real presence.
</p>
<p>
<b>No Authority.</b> But why distributed VCS? Why are they better?
</p>
<p>
First of all, when you "checkout" a distributed VCS, it's really a
"clone" operation, which is what most of them call it. You get
everything. After that, the only reason you need to talk upstream,
which really isn't "up" anymore, is if either end has updates to the
code they wish to share. The only way one clone might be more
important than another is human politics. Technically they are equals.
</p>
<p>
<b>Small.</b> But won't this be <i>huge</i>? A Subversion repository
can easily be several gigabytes. That would be a lot to transfer on
the initial clone.
</p>
<p>
Actually, distributed VCSs are extremely efficient. A Git clone will
usually be smaller than a Subversion checkout. For example, I once
cloned <a href="http://freeciv.wikia.com/">Freeciv's</a> Subversion
repository using Git (converting it <i>to</i> Git). It was about 15000
revisions. The bare version of the Git repository, containing all
~15000 commits, was half the size of the Subversion repository, which
contained only a single commit! The non-bare version was still smaller
by a few megabytes. I can't even imagine how much space the server was
using.
</p>
<p>
I would have some numbers on this example, but, alas, that clone was
lost on a failed hard drive and it took me a week to make. To be clear,
Git clones of Git repositories aren't that slow: Subversion isn't
optimized for cloning, and the Freeciv Subversion server is extremely
overloaded.
</p>
<p>
<b>Update</b>: I managed to get another clone, and it only took me a
couple hours. The Freeciv Subversion checkout at revision 15574 is
281MB. Remember, this contains just one single revision. The Git clone
after a repack and garbage collection, which contains all 15574
revisions, is 225MB. It's 56MB smaller! If I told it to leave out the
Subversion metadata it would be even smaller than that. On the server
side, the Subversion repository likely takes up gigabytes. And
finally, to add insult to injury, the Git "bare" clone is 144MB.
</p>
<p>
Someone <i>does</i> have an example over here: <a
href="http://git.or.cz/gitwiki/GitSvnComparsion">Git's Major Features
Over Subversion</a>.
</p>
<blockquote>
  <p>
    The Mozilla project's CVS repository is about 3 GB; it's about 12
    GB in Subversion's fsfs format. In Git it's around 300 MB.
  </p>
</blockquote>
<p>
Git's packing format is fairly simple, yet so effective.
</p>
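<p>
You can watch the packing happen in any repository: loose objects are
delta-compressed into a packfile by <code>git gc</code> (the
repository below is a throwaway example):
</p>

```shell
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.name 'A. Hacker'            # throwaway identity
git config user.email 'hacker@example.com'
for i in 1 2 3; do
    echo "revision $i" > file.txt
    git add file.txt && git commit -qm "revision $i"
done
git count-objects -v | grep '^count:'     # loose objects before packing
git gc --quiet                            # delta-compress into a packfile
git count-objects -v | grep '^in-pack:'   # everything now lives in the pack
```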
<p>
<b>Fast.</b> Well, duh. With everything being local, operations that
work on multiple revisions will be fast. Beyond this, decentralized VCS
is generally faster on all operations, except the initial clone.
</p>
<p>
<b>Reduced politics.</b> With a central repository, someone or some
group has to decide who has write access and who doesn't. Developers
without write access are basically stuck without version control,
unless they hack in their own. In the decentralized model, everyone
has write access to their own personal repository, and others can
choose, on their own, to pull revisions from it.
</p>
<p>
<b>Secure.</b> A centralized VCS has a central, single point of
failure. If that single point is compromised, the server needs to be
restored from backups. Or worse, the compromise goes unnoticed and the
repository history is modified without anyone ever being able to tell.
</p>
<p>
In a distributed model, each revision (and in Git's case, the files
themselves) is referenced by a hash (SHA-1 in Git's case) of its
contents, a <a
href="http://en.wikipedia.org/wiki/Content-addressable_storage">
content-addressable storage system</a>. Thanks to this, a file, no
matter where it is in the tree or in history, is stored only
<i>once</i>. The main purpose is to avoid collisions between revision
identifiers on parallel lines of development. It also happens to make
the repository tamper-proof.
</p>
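<p>
The deduplication is easy to see for yourself: commit identical
content under two names and Git records a single blob (everything
below is a throwaway example):
</p>

```shell
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.name 'A. Hacker'            # throwaway identity
git config user.email 'hacker@example.com'
echo 'same bytes' > a.txt
cp a.txt b.txt                  # identical content under a second name
git add . && git commit -qm 'Two names, one blob'
git ls-tree HEAD                # both entries list the same blob hash
```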
<p>
If you know the revision ID of your HEAD, no one will be able to
change any of its history. This is because each revision contains the
ID of its immediate ancestor, all the way back to the initial
commit. If a previous commit changed, it would change the ID of every
following commit. An attacker would have to find a desired collision
for each one: simply impossible.
</p>
<p>
The hash addressing also provides integrity, as corruption in the
repository is easily detected.
</p>
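<p>
Both properties are visible with plumbing commands: a commit object
names its tree and its parent by hash, and <code>git fsck</code>
recomputes every object's hash to detect corruption (again a
throwaway repository):
</p>

```shell
cd "$(mktemp -d)"
git init -q repo && cd repo
git checkout -qb main
git config user.name 'A. Hacker'            # throwaway identity
git config user.email 'hacker@example.com'
echo 'first'  > f.txt && git add f.txt && git commit -qm 'first'
echo 'second' > f.txt && git commit -qam 'second'
git cat-file -p HEAD   # shows 'tree <hash>' and 'parent <hash>' lines
git fsck --full        # recomputes every hash; quiet when all is intact
```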
<p>
Another security gain, related to the reduced politics note, is the <a
href="http://en.wikipedia.org/wiki/Web_of_trust"> web of
trust</a>. This is the same way PGP handles key authentication. In a
large project, a single developer may only trust a handful of people
to be competent programmers, and therefore only pull from these
developer's repositories. Those developers they pull from also have
their own set of people <i>they</i> trust. In this way, revisions can
safely be pulled from distant strange repositories through the web of
trust.
</p>
<p>
The only reason to interact with a Subversion repository is for legacy
reasons. Luckily, you don't have to use Subversion to use Subversion.
</p>
<p>
That wasn't a typo. Git has a Subversion/Git interface called
git-svn. I used it to convert my Subversion repositories to Git, but
it can be used as a fully functional Subversion client. It can clone
the Subversion repository and continue to pull changes from it as it
updates.
</p>
<p>
On your end, you can make commits to your local repository, use cheap
branches, and so on, all of which stay local. Changes can be pushed
back upstream to Subversion with the <code>dcommit</code> command,
which would be done after rebasing any changes on top of the current
Subversion HEAD. This provides most of the advantages of Git without
worrying about having the central repository change.
</p>
<p>
One of the major complaints about Git is that it once lacked a
plethora of GUIs, like CVS and Subversion have. Git <i>does</i> have
GUIs. I only looked at a couple of them out of curiosity, so I am not
sure how good they are by comparison. I have also barely used any
other VCS GUIs. The ones I have used I find incredibly annoying.
</p>
<p>
I don't understand why people insist on using them anyway. It's like
using training wheels on a bike and claiming that it's better that
way. No, those training wheels just get in the way.
</p>
<p>
To be brutally honest, if you don't want to use Git because you are
afraid of the command line, what are you doing coding in the first
place?
</p>
<p>
One topic left is issue tracking. In a centralized VCS, you have a
centralized tracker. Subversion has <a
href="http://trac.edgewall.org/">Trac</a>, for example. Well, what
about distributed VCSs? They should have distributed issue tracking
right?
</p>
<p>
I will go into this in my <a href="/blog/2009/02/14#ditz">next
post</a>.
</p>
]]>
    </content>
  </entry>

</feed>
