null program

Readline Wrap

I came across a very interesting tool the other day called rlwrap. It wraps the readline library over just about any interactive text input program. The readline library provides basic editing and history. It's handy for those programs that don't provide their own line editing facilities.

It tries to be as transparent as possible, detecting yes/no prompts and passwords, so it should still be reasonable under those conditions.

If you can't think of anything to try it with, try it with cat. Instant line editor!

rlwrap cat > some-file.txt

Or with Festival.

rlwrap festival --tts

It will also turn incorrectly compiled shells on your system into something usable. On my system (Debian GNU/Linux), csh isn't usable without rlwrap.

rlwrap csh

Distributed Issue Tracking

In a previous post I discussed decentralized version control systems, Git in particular. Because decentralized version control is becoming so popular, we now have an exciting new area of development: distributed issue tracking.

Decentralized issue tracking seems to have popped into existence in the last year or so. A number of projects have appeared (cil, ticgit, ditz, to name some), but the one that really stands out for me is ditz. Keep an eye on that one. It's fairly active and mostly usable.

Decentralized trackers generally work by storing the issue tracking database within the repository itself. One possibility is to have it sit in its own branch, which I think is the Wrong Way. A second possibility is to have it sit right next to the code in its own directory. Yet another possibility is to put the issue tracker in its own repository. Git could even include this repository as a submodule (this is a lot like the Wrong Way, though).

This arrangement has several advantages. First, everyone gets their own copy of the issue tracker database and its history. Second, the database itself has history: it's tracked the same way the code is. And when the database sits next to the code (the second possibility above), one of the coolest advantages is that issues follow the code very closely.

When a branch is created, it takes its own copy of the issue tracking database with it. If a bug is fixed in the main branch, the issue tracking database in the main branch is updated. The bug will remain in the side branch and the issue will still be open in the side branch reflecting this. If a merge occurs later, the issue tracking database also gets merged automatically. I think that's damn cool.

There are some issues that still need to be hammered out. How does a non-developer enter a ticket? They would need to work the version control system to do this, then be able to share that change. That's a pretty large barrier.

Perhaps a web interface could be set up for setting up issues. Developers could then cherry-pick/pull the issues from that repository and push ticket updates back out.

Then there is overhead incurred by moving tickets around with code. How bad is this overhead? How can this be dealt with in the most transparent way? This all needs to be tested still.

Could the issue database get too big? People like to attach screenshots to issues. Having many screenshots would make the repository very big. How do we deal with this?

It's an exciting, new realm to explore.


Git is Better

I finally finished dumping the rest of my lingering Subversion repositories. I have converted them all to Git repositories. If you manage a Subversion (or CVS, or Perforce, etc) repository, you should consider doing the same. Git became my version control system (VCS) of choice in June and I haven't looked back since.

Why? Because Git is better.

Yes, it really is. Much better.

Git is faster, smaller, more secure, and more powerful. This is a virtue of decentralized version control systems. Subversion is Blub.

It all starts with Source Code Control System (SCCS) and Revision Control System (RCS). These systems could only track single files and created headaches for projects with multiple files being worked on by multiple people.

Then came Concurrent Versions System (CVS), which improved things slightly, but still sucked. It still really only tracks individual files.

Now, CVS did anonymous reads, allowing anyone to access the repository and see code history. OpenBSD was the first code base to take advantage of this. These days, coming across projects that don't give public read access to their repository seems backwards.

Not using any of these systems would probably be better than using them. Their flaws are obvious as soon as you start using them.

Finally, in the year 2000, Subversion arrives. It's a huge step up from CVS, fixing many of its problems. It has a much better interface and uses atomic commits -- finally tracking more than one file at a time. We still need to talk to some server every time we want to do something. Branching and merging also suck so much that no one wants to use them. But branching is overrated, right? Wrong. I use branches all the time, now that they are easy.

The reason branching sucks in Subversion can be explained with a famous quote by Albert Einstein,

Make everything as simple as possible, but not simpler.

Instead of implementing tagging, branching, and merging, the Subversion guys just implemented "cheap copy". It's a pretty clever idea, but in practice it doesn't work out well. It's too simple.

CVS solves the wrong problem, and Subversion solves the right problem wrongly.

Since Subversion, a number of decentralized VCSs have arisen. We have GNU arch (2001), monotone (2002), darcs (2003), Bazaar (2005), Git (2005), Mercurial (2005), and fossil (2007). I played around with all of these when looking for a distributed VCS (except fossil) and none struck me the same way that Git did. I would recommend most of them over Subversion.

Distributed VCS has gotten a lot of attention in the past couple years, which had much to do with the Linux kernel switching to one (Git). In fact, Git and Mercurial were written precisely for this event. Since then, some major projects have been switching to Git, or at least to some sort of distributed VCS: Perl, Ruby on Rails, Android, WINE, Fedora, X.org, and VLC to name a few.

You can also see the chatter on the Internet about Git. It's really popular with fresh, innovative projects, like the Arc programming language. It's pretty easy to accidentally run into various Git tutorials on the web. It has a real presence.

No Authority. But why distributed VCS? Why are they better?

First of all, when you "checkout" a distributed VCS, it's really a "clone" operation, which is what most of them call it. You get everything. After that, the only reason you need to talk upstream, which really isn't "up" anymore, is if either end has updates to the code they wish to share. The only way one clone might be more important than another is human politics. Technically they are equals.

Small. But won't this be huge? A Subversion repository can easily be several gigabytes. That would be a lot to transfer on the initial clone.

Actually, distributed VCSs are extremely efficient. A Git clone will usually be smaller than a Subversion checkout. For example, I once cloned Freeciv's Subversion repository using Git (converting it to Git). It was about 15000 revisions. The bare version of the Git repository, containing all ~15000 commits, was half the size of the Subversion checkout, which contained only a single revision! The non-bare version was still smaller by a few megabytes. I can't even imagine how much space the server was using.

I would have some numbers on this example, but, alas, that clone was lost on a failed hard drive and it took me a week to make. Note, Git clones of Git repositories aren't that slow: Subversion isn't optimized for cloning, and the Freeciv Subversion server is extremely overloaded.

Update: I managed to get another clone, and it only took me a couple hours. The Freeciv Subversion checkout at revision 15574 is 281MB. Remember, this contains just one single revision. The Git clone after a repack and garbage collection, which contains all 15574 revisions, is 225MB. It's 56MB smaller! If I told it to leave out the Subversion metadata it would be even smaller than that. On the server side, the Subversion repository likely takes up gigabytes. And finally, to add insult to injury, the Git "bare" clone is 144MB.

Someone does have an example over here: Git's Major Features Over Subversion.

The Mozilla project's CVS repository is about 3 GB; it's about 12 GB in Subversion's fsfs format. In Git it's around 300 MB.

Git's packing format is fairly simple, yet so effective.

Fast. Well, duh. With everything being local, operations that work on multiple revisions will be fast. Beyond this, decentralized VCS is generally faster on all operations, except the initial clone.

Reduced politics. With a central repository, someone or some group has to decide who has write access and who doesn't. Developers without write access are basically stuck without version control, unless they hack in their own. In the decentralized model, everyone has write access to their own personal repository, and others can choose, on their own, to pull revisions from it.

Secure. A centralized VCS has a central, single point of failure. If that single point is compromised, the server needs to be restored from backups. Or worse, the compromise goes unnoticed and the repository history is modified without anyone ever being able to tell.

In a distributed model, each revision (and in Git's case, the files themselves) is referenced by a hash (SHA-1 in Git's case) of its contents, a content-addressable storage system. Thanks to this, a file, no matter where it is in the tree or in history, is stored only once. The main purpose is to avoid collisions between revision identifiers on parallel lines of development. It also happens to make the repository tamper-proof.
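To make the content-addressing idea concrete, here is a minimal Python sketch. The ObjectStore class and its method names are made up for illustration and are not how Git lays things out on disk; only the ID scheme for file contents (SHA-1 over a "blob <size>" header, a NUL byte, and the data) mirrors what Git does. The same bytes always get the same ID, so a file that shows up in many places or revisions is stored once.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Return the ID Git assigns to a file (blob) with these contents."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class ObjectStore:
    """Toy content-addressable store (hypothetical, for illustration only)."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        key = blob_id(data)
        # Identical contents map to the same key, so nothing is duplicated.
        self.objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self.objects[key]
        # Re-hashing on read detects corruption of the stored bytes.
        if blob_id(data) != key:
            raise ValueError("corrupt object: " + key)
        return data


store = ObjectStore()
a = store.put(b"hello world\n")     # a file in one revision
b = store.put(b"hello world\n")     # the same contents somewhere else in history
c = store.put(b"something else\n")  # different contents, different ID
print(a == b, len(store.objects))
```

Three files went in, but only two objects are stored, because two of the files were byte-for-byte identical.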

If you know the revision ID of your HEAD no one will be able to change any of its history. This is because each revision contains the ID of its immediate ancestor, all the way back to the initial commit. If a previous commit changed, it would change the ID of every following commit. An attacker would have to find a desired collision for each one: simply impossible.

The hash addressing also provides integrity, as corruption in the repository is easily detected.
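The chaining argument can be sketched in a few lines of Python. This is a toy model, not Git's real commit format (a real commit also hashes a tree, author, dates, and so on), and the function names are invented for the example. Each ID is a hash over the parent's ID plus the commit's own content, so changing any old commit changes every ID after it.

```python
import hashlib


def commit_id(parent_id: str, content: str) -> str:
    """ID of a toy commit: a hash over the parent's ID and this commit's content."""
    return hashlib.sha1((parent_id + "\n" + content).encode("utf-8")).hexdigest()


def chain_ids(contents):
    """Return the ID of every commit in a linear history, oldest first."""
    ids = []
    parent = ""  # the initial commit has no parent
    for content in contents:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids


history = ["initial import", "fix bug", "add feature"]
original = chain_ids(history)

# Tamper with the oldest commit only.
tampered = chain_ids(["initial import (backdoored)", "fix bug", "add feature"])

# Every ID from the altered commit onward differs, including HEAD.
changed = [x != y for x, y in zip(original, tampered)]
print(changed)
```

Anyone who remembers the original HEAD ID will notice the mismatch, even though only the first commit was touched.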

Another security gain, related to the reduced politics note, is the web of trust. This is the same way PGP handles key authentication. In a large project, a single developer may only trust a handful of people to be competent programmers, and therefore only pull from these developers' repositories. Those developers they pull from also have their own set of people they trust. In this way, revisions can safely be pulled from distant strange repositories through the web of trust.

The only reason to interact with a Subversion repository is for legacy reasons. Luckily, you don't have to use Subversion to use Subversion.

That wasn't a typo. Git has a Subversion/Git interface called git-svn. I used it to convert my Subversion repositories to Git, but it can be used as a fully functional Subversion client. It can clone the Subversion repository and continue to pull changes from it as it updates.

On your end, you can make commits to your local repository, use cheap branches, and so on, all of which stay local. Changes can be pushed back upstream to Subversion with the dcommit command, which would be done after rebasing any changes on top of the current Subversion HEAD. This provides most of the advantages of Git without worrying about having the central repository change.

One of the major complaints about Git was that it lacked the plethora of GUIs that CVS and Subversion have. Git does have GUIs. I only looked at a couple of them out of curiosity, so I am not sure how good they are by comparison. I have also barely used any other VCS GUIs, and the ones I have used I find incredibly annoying.

I don't understand why people insist on using them anyway. It's like using training wheels on a bike and claiming that it's better that way. No, those training wheels just get in the way.

To be brutally honest, if you don't want to use Git because you are afraid of the command line, what are you doing coding in the first place?

One topic left is issue tracking. In a centralized VCS, you have a centralized tracker. Subversion has Trac, for example. Well, what about distributed VCSs? They should have distributed issue tracking, right?

I will go into this in my next post.


Creating Simple Dice with GIMP

Final image.

In my previous article I drew those red dice myself, using GIMP. Since I really enjoyed figuring out how to do it, and actually doing it, here is a little tutorial.

The numbers and sizes are arbitrary, so feel free to adjust things if you think they look better. I am no artist. I am sure someone could take this further to make it look better, perhaps by making the pips look indented, or adding some transparency effect so the dice look clear. I am a GIMP newbie.

Step 1: Make a Face

In GIMP, create a new 300x300 image and fill it with a dark red. I used c30808 for this. This will be the base color of the dice, so if you want differently colored dice, choose whatever color you like.

Plain 300x300 image.

Step 2: Make Pips

Next we use the ellipse selection tool (e) to make pips. In the settings, set the ellipses to a fixed size of 75x75.

Ellipse size fix.

Create the ellipse and move it to the upper left-hand corner. Use the arrow keys to nudge it to position (5, 5) -- or just type in these values.

Pip selection.

Use bucket fill (shift+b) to fill the selected area with white, or whatever color you want your pips to be. Keep doing this to make pips in each corner. The positions should be (5, 220), (220, 5), and (220, 220). This makes the 4 face. Put a fifth pip in the middle, (112, 112), to turn it into a 5 face.

5 pips
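If you change the image or pip size, the positions need recomputing. Here is a small Python sketch of the arithmetic, assuming the same layout as above: square face, square pip bounding box, and a fixed margin from the edges. The function name and parameters are just for illustration.

```python
def pip_positions(face=300, pip=75, margin=5):
    """Top-left corners for the four corner pips and the center pip."""
    near = margin               # distance from the top/left edge
    far = face - pip - margin   # same margin measured from the bottom/right edge
    center = (face - pip) // 2  # rounded down to a whole pixel
    corners = [(near, near), (near, far), (far, near), (far, far)]
    return corners, (center, center)


corners, center = pip_positions()
print(corners, center)
```

With the defaults this reproduces the numbers used in this step: (5, 5), (5, 220), (220, 5), (220, 220), and (112, 112) for the middle.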

Step 3: Make More Faces

You now have one face of your die. The 1, 2, 3, and 4 faces are the same as the 5 face, but with fewer pips. In the layers dialog name the current layer "5". Now, duplicate the layer (shift+ctrl+d) and name this new layer "4". Use either the paintbrush tool (p) to paint your base color over the middle pip, or use the selection tools to remove it.

Keep duplicating layers and removing pips until you have 5 faces: 1, 2, 3, 4, and 5.

Layers dialog showing 5 faces.

Duplicate the "4" layer and create two more pips to make the 6 face. You now have 6 layers, each containing a single face. Here is my .xcf when I was done: dice-faces.xcf.

Step 4: Map it to a Cube

Now comes the fun part, the real guts of the drawing. You are going to map these layers onto a cube. Go to Filters -> Map -> Map Object. Map to "Box" and select Transparent background and Create new image.

The mapping interface.

Under the Orientation tab adjust the rotation. For the first die, try something like (20, 40, -5). If you enable Show preview wireframe you can see your adjustments live. Just don't make these values too high or it will make the next step more difficult.

Under the Box tab set the Front, Top, and Left faces to different layers. Note that the opposite sides of a die always add up to 7. That is, 1 is opposite to 6, 2 is opposite 5, and 3 is opposite 4. Here is how a typical die looks.

Die faces.

If you are really picky, you might want to pay attention to the orientation of the 3's, 2's, and 6's and flip those layers accordingly.

Hit Preview! to see your work. If you are happy, click OK. Autocrop the new image with Image -> Autocrop Image.

A single die.

Step 5: Make More Dice

Do this a few more times with different faces at different orientations. I will make just one more for the example.

Create a new 640x480 image with transparent background. Copy and paste your dice into this image. After each paste, make a new layer (shift+ctrl+n), so each die gets its own layer. Use the Move tool (m) to adjust the dice into a sort-of mid-roll. Whatever looks good.

Dice are positioned.

Step 6: Make Shadows

The last part left is the shadow. First, merge the visible layers (shift+m), then duplicate the remaining layer. Call this new layer "Shadow".

Go to Colors -> Brightness-Contrast. Set contrast to -127. This will be the shadow. If you want a darker or lighter shadow, open the same dialog again and adjust the brightness. Next scale the shadow layer vertically by 50%. You want the width to remain the same.

Select the Shear tool (shift+s) and shear the layer in the X direction -100 pixels. Move the shadow layer to the bottom. Now use the Move tool (m) to move the shadow into an appropriate position.

You can add a penumbra by applying a Gaussian blur to the shadow layer: Filters -> Blur -> Gaussian Blur. I blurred mine by 5 pixels.

Finally, you might want to autocrop the layers, then fit the image canvas to the layers, which will get rid of the excess border.

Final step.


Diceware Passphrases

Diceware is a method of easy-to-remember, easy-to-type, secure passphrase and password generation. It works completely off-line and requires no computer whatsoever, apart from retrieving the Diceware list. By taking the passphrase generation off-line there is less room for mistakes to be made.

The reason these passwords are easy to remember is that they are simply a series of words in your native language. This also tends to make them easier to type without lots of practice, as you should already be used to typing words.

Because the official Diceware website is frequently down or unusable, I have mirrored the original page here,

http://nullprogram.com/download/diceware/diceware.html

You can grab the word lists directly,

Diceware Word List
Beale's Diceware Word List

The lists are cryptographically signed by Arnold G. Reinhold so you can verify that I have not tampered with them. I must also note that, unfortunately, the author requires that these lists only be distributed non-commercially, which limits their usefulness but allows me to distribute them here.

I also came across another list called DialDice, which I have mirrored here and signed with my own key,

DialDice Word List

Diceware works by rolling five 6-sided dice (or rolling one 6-sided die five times, etc.) and using the result to look up the word in one of the above lists of 6^5, or 7776, words. Each word is worth about 12.9 bits of entropy,

log2(7776) =~ 12.9248

So if you want a password worth about 40 bits -- which is about 7 letters of a random alphanumeric password -- you would generate three Diceware words. They can be concatenated in any order and in any fashion. When I use Diceware, I just mash them together, like "lancealertgrow". Note that the space bar makes a distinctive sound when pressed, so if you put spaces between your words a listener will be able to tell how many words you use.
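The arithmetic is easy to check for yourself. This Python sketch only assumes what is stated above: a list of 6^5 words, each chosen uniformly at random. The helper name is made up for the example.

```python
import math

WORDS = 6 ** 5  # five dice, six faces each


def bits(words_in_phrase, list_size=WORDS):
    """Entropy in bits of a phrase made of uniformly random words."""
    return words_in_phrase * math.log2(list_size)


print(WORDS)                        # size of the word list
print(round(bits(1), 4))            # one word
print(round(bits(3), 1))            # three-word password
print(round(bits(6), 1))            # six-word passphrase
print(WORDS ** 3)                   # number of possible three-word passwords
print(round(7 * math.log2(62), 1))  # 7 random characters from a-z, A-Z, 0-9
```

Three words come out a little under 40 bits, and seven random alphanumeric characters a little over, which is why the two are roughly comparable.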

If you don't like what you rolled the first time, DO NOT generate a new one as an attempt to get something "better". If you do this, you will greatly weaken your passwords because you are selecting passwords from a much smaller pool of possibilities (a very small pool that contains only passwords you like).

The number of possible three-word passwords is 470,184,984,576. That's right: 470 billion. Because you are selecting passwords with dice, each password is as likely as the next. Even if an attacker knew you used Diceware and knew which word list you used, they would still be guessing among those 470 billion possibilities.

At first it may be confusing, but it actually doesn't matter what the words are or how long they are. It doesn't matter that there are no capitals or special characters. It is the simple fact that there are 7776 words, and one was selected three times.

7776 * 7776 * 7776 = 470184984576

The Diceware website goes into a bit more detail on this.

If the computer system you use annoyingly requires passwords to contain special characters (which is done to increase entropy in passwords that actually are poor), Diceware also provides a method of adding some of those, which adds a couple more bits worth of entropy. If you don't care about those extra bits, you can throw your own in.

For passphrases, Diceware recommends 6 words, about 77.5 bits, which it claims should be out of range of brute-force attacks from anyone for at least the next 20 years. If you really think you need more than 6 words, you should consider hiring guards for all your computer equipment.

I was working on my own Diceware word list to release into the public domain. The purpose was to provide a word list without any distribution restrictions, unlike all the Diceware lists I have found. But making word lists is hard! I wrote a number of little filters -- word length, no sub-words, spell checking, no special characters, etc -- to pull out good words from a large word list, then used some sample English text from Wikipedia to get some frequency information so that I was selecting common words over less common words. I still need to go over the final list by hand to make sure it all looks good. This is a long, tedious process. Carefully examining 7776 words is quite a lot of work. Someday I will finish it.

In the event that you don't have any dice available, or you want to be able to generate Diceware passwords on the fly automatically, I have written a little program that will use /dev/random, assuming there is a true RNG behind it, to roll virtual dice. It can use a Diceware word list or your local dictionary word list which contains many more words (usually found at /usr/share/dict/words). Grab it here,

passgen.pl

To get help, just run it with --help. So, to generate a two word password using the local dictionary,

$ ./passgen.pl -w2
Bits per word: 14.4421011596755
Key length: 28.8842023193511
grisly cog

It also tells you how many bits the password is worth.
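For anyone curious what rolling virtual dice looks like in code, here is a minimal Python sketch. It is not passgen.pl, and it rests on two assumptions: that the word list has one entry per line, a five-digit dice key followed by whitespace and the word (something like "11111 a"), and that Python's secrets module, which draws from the operating system's random source, is an acceptable stand-in for reading /dev/random directly. All the function names and the placeholder list are hypothetical.

```python
import secrets


def roll_key(dice=5):
    """Roll `dice` six-sided dice and return the result as a string like '16655'."""
    return "".join(str(secrets.randbelow(6) + 1) for _ in range(dice))


def parse_wordlist(lines):
    """Build a key -> word table from lines like '11111 a' (assumed format)."""
    table = {}
    for line in lines:
        parts = line.split()
        # Skip blank lines and anything that is not a five-digit dice key,
        # such as the signature block around a signed list.
        if len(parts) == 2 and len(parts[0]) == 5 and set(parts[0]) <= set("123456"):
            table[parts[0]] = parts[1]
    return table


def passphrase(table, words=3):
    """Pick `words` entries by rolling a fresh key for each one."""
    return [table[roll_key()] for _ in range(words)]


# Stand-in list so the sketch runs without a download: every key maps to a
# placeholder word. A real table would come from parse_wordlist(open(path)).
demo = {f"{a}{b}{c}{d}{e}": f"word{a}{b}{c}{d}{e}"
        for a in "123456" for b in "123456" for c in "123456"
        for d in "123456" for e in "123456"}

print("".join(passphrase(demo)))
```

Each word gets its own fresh roll, and the words are simply mashed together at the end. Swapping in a real list is a matter of passing the open file to parse_wordlist.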

I highly recommend Diceware for your password and passphrase generation.

