wbf2c - (Wellons) Brainfuck to C

Optimizing, Multi-threading Brainfuck to C Converter

wbf2c is an optimizing brainfuck to C converter that can also do compilation in the background automatically. Currently the converter compacts brainfuck commands and performs loop unrolling by detecting "cell copy" operations. This loop unrolling can provide an amazing speed boost for many brainfuck programs.

Features

The converter supports different cell type: 8-bit char (0 to 256), 16-bit short (0 to 65536), 32-bit int (0 to 4294967296), and bignum (unlimited range, but no wrapping). Bignum is accomplished using the GNU Multi-precision library, GMP, the fastest bignum library there is.

By default, the memory is dynamically sized allowing for non-constant memory algorithms to accept arbitrarily large input (such as for sorting). Because this incurs a speed penalty, this can be disabled and the memory defined to a fixed size beforehand.

Also a useful feature for debugging is a core dump. A core dump can optionally be added so that when a program terminates, either naturally or from a SIGINT (C-c), its memory is dumped to a file for inspection. Bounds checking can also be done at run time, allowing a program to abort giving line number information instead of going off and causing segfault headaches.

Multi-threading

The idea is that you provide two (or more) separate programs, each will have its own pointer but work on the same memory array. These two programs run as different threads. In the case of fixed memory, every cell has a mutex lock. This is all done without extending or modifying the language. It is achieved using POSIX threads (Pthreads).

The converter does not yet recognize special patterns in the language. For example, a [] would indicate that the thread should sleep until the value at the current cell is zero rather than spinlock wasting CPU cycles. This will be added in the future.

If you want to see the threading in action, try this example. Here are two files printa.bf and newline.bf,

printa_bf
++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++
[.]

and,

newline_bf
>++++++++++
[.]

Compile with threading,

./wbf2c -cH printa.bf newline.bf -o example

The output will be lines of a's with different lengths (including lines with 0 a's). What this looks like will depend on your operating system's kernel's scheduler as well as your hardware. On the linux 2.6.21 kernel I get very long lines with a's separated by hundreds of blank lines. On SunOS on a Sun Fire V880, I get something much more even like this,

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaa

Here is an idea, you can use this to attempt to obtain a random seed for a random number generator, allowing a program that accepts no input to produce different output on each run.

Download

I haven't packaged wbf2c yet as there is no documentation written. You can get the current version from the Git repository with this command,

git clone http://git.nullprogram.com/bfc.git

Or you can just download the latest snapshot: http://git.nullprogram.com/bfc.git

Then to build just execute the bootstrap script (requires autoconf and automake).

  ./bootstrap
  ./configure
  make

The code base is in a transition at the moment. I began the project because I needed a brainfuck compiler. I wanted one that could run the brainfuck programs very quickly on my GNU/Linux system as well as my school's SunOS machines. I did not find one already written (after a *very* brief search) that could do this. After I started growing the program to do more, I decided to invent a brainfuck intermediate language that allows me to run optimizations on a more expressive language internally (this is what permits loop unrolling).

This intermediate code could be used to easily add more languages other than C. I have no intention to do this, however, because on goal of the project is to run brainfuck programs quickly. Converting to Ruby, for example, wouldn't be very fast.

Examples

This is the current --help output,

Usage: ./wbf2c [OPTION ...] FILES

Options:

  -b, --bounds-err      Error on pointer outside of boundaries at run time
  -s, --static-mem      Memory size, number of cells (30000)
  -g, --mem-grow-rate   Dynamic memory grow rate (2)
  -o, --output          Select output file
  -O, --optimize        Optimize compiled code (C compiler)
  -d, --dump            Dump memory core after run
  -C, --comments        Pass comments back out
  -c, --compile         Send output to C compiler
  -n, --no-optimize     Don't perform brainfuck optimization
  -V, --version         Print program version
  -h, --help            Print this help information

Cell Types:

  char         Unsigned char   (0 to 256)
  short        Unsigned short  (probably 0 to 65536)
  int          Unsigned int    (probably 0 to 4294967296)
  bignum       Multi-precision (unlimited range, no wrapping, requires gmp)

To compile a brainfuck program instead of simply outputting C code use the -c switch,

./wbf2c -c program.bf -o program

The default options define the second fastest configuration. If you want to go faster, disable dynamic memory with the -s switch,

./wbf2c -cs program.bf -o program

When doing core dumps, the entire memory array is dropped even if most of it was empty and untouched. If you are using dynamically sized memory (default), defining the memory to be of size 1 will cause the memory to grow as needed and the core will only be slightly larger than the used memory.

./wbf2c -cdm 1 program.b -o program

Ideas

There are a few ideas I have had for interesting features. I will attempt to add these features as I develop the converter.

Built-in debugging

Once your program is built, running it with a debugging flag (-d perhaps?) would cause the program to run can ncurses front end to debug the program. You could then set breakpoints and step the program through execution. This would hurt normal performance and limit compilation to systems with ncurses development set up. Therefore, this would only be enabled via a switch.

Precalculated memory and output, infinite loop detection

The converter could execute the program up until the first input (,) and start the compiled program in this state. For programs that accept no input, the entire program could be executed to termination and the final compiled program simply outputs a string.

To prevent the converter from locking up, we can detect infinite loops and not precalculate those. In fact, when we are using a type that wraps (anything but bignum), we can always detect infinite loops, even if there is user input. An infinite loop occurs when a loop does not modify its base cell (even for unwrapped types). If the base cell is modified, the loop will always terminate (due to wrapping).

Valid XHTML 1.0 Transitional