I added the fix and updated my e-mail address in the source tarball,
which
is in the
same location as before. There are no version numbers involved, so
I guess this is version "now" and the old one is version "then", with
names subject to change.
I have gotten several e-mails lately about using GNU Octave. One
specifically was about blurring images in Octave. In response, I am
writing this in-depth post to cover spatial filters, and how to use
them in GNU Octave (a free implementation of the Matlab programming
language). This should be the sort of information you would find near
the beginning of an introductory digital image processing textbook,
but written out more simply. In the future, I will probably be writing
a post covering non-linear spatial and/or frequency domain filters in
Octave.
If you want to follow along in Octave, I strongly recommend that you
upgrade to the new Octave 3.0. It is considered stable, but differs
significantly from Octave 2.1, which many people may be used to. You
will also need to install
the image
processing package
from Octave-Forge. To get
help with any Octave function, just type help
<function>.
The most common linear spatial image filtering
involves
convolving a filter mask, sometimes called a convolution
kernel, over an image, which is a two-dimensional matrix. In the
case of an RGB color
image, the image is actually composed of three two-dimensional
grayscale images, each representing a single color, where each is
convolved with the filter mask separately.
Convolution is sliding a mask over an image. The new value at the
mask's position is the sum of the value of each element of the mask
multiplied by the value of the image at that position. For an example,
let's start with 1-dimensional convolution. Define a mask,
5 3 2 4 8
The 2 is the anchor for the mask. Define an image,
0 0 1 2 1 0 0
As we convolve, the mask will extend beyond the image at the
edges. One way to handle this is to pad the image with 0's. We start
by placing the mask at the left edge. (zero-padding is underlined)
Mask: 5 3 2 4 8
Image: 0 0 0 0 1 2 1 0 0
The first output value is 8, as every other element of the mask is
multiplied by zero.
Output: 8 x x x x x x
Now, slide the mask over by one position,
Mask: 5 3 2 4 8
Image: 0 0 0 1 2 1 0 0
The output here is 20, because 8*2 + 4*1 = 20;
Output: 8 20 x x x x x
If we continue sliding the mask along, the output becomes,
Output: 8 20 18 11 13 13 5
Here is the correlation done in Octave interactively,
(filter2() is the correlation function).
The same thing happens in two-dimensional convolution, with the mask
moving in the vertical direction as well, so that each element in the
image is covered.
Sometimes you will hear this described as correlation
(Octave's filter2) or convolution
(Octave's conv2). The only difference between these
operations is that in convolution the filter masked is rotated 180
degrees. Whoop-dee-doo. Most of the time your filter is probably
symmetrical anyway. So, don't worry much about the difference between
these two. Especially in Octave, where rotating a matrix is easy
(see rot90()).
Now that we know convolution, let's introduce the sample image we will
be using. I carefully put this together
in Inkscape, which should give
us a nice scalable test image. When converting to a raster format,
there is a bit of unwanted anti-aliasing going on (couldn't find a way
to turn that off), but it is minimal.
Save that image (the PNG file, not the linked SVG file) where you can
get to it in Octave. Now, let's load the image into Octave
using imread().
m = imread("image-test.png");
The image is a grayscale image, so it has only one layer. The size
of m should be 300x300. You can check this like so (note
the lack of semicolon so we can see the output),
size(m)
You can view the image stored in m
with imshow. It doesn't care about the image dimensions
or size, so until you resize the plot window, it will probably be
stretched.
imshow(m);
Now, let's make an extremely simple 5x5 filter mask.
This filter mask is called an averaging filter. It simply
averages all the pixels around the image (think about how this works
out in the convolution). The effect will be to blur the image. It is
important to note here that the sum of the elements is 1 (or 100% if
you are thinking of averages). You can check it like so,
sum(f(:))
Now, to convolve the image with the filter mask
using filter2().
ave_m = filter2(f, m);
You can view the filtered image again with imshow()
except that we need to first convert the image matrix to a matrix of
8-bit unsigned integers. It is kind of annoying that we need this, but
this is the way it is as of this writing.
ave_m = uint8(ave_m);
imshow(ave_m);
Or, we can save this image to a file
using imwrite(). Just like with imshow(),
you will first need to convert the image to uint8.
imwrite("averaged.png", ave_m);
There are a few things to notice about this image. First there is a
black border around the outside of the filtered image. This is due to
the zero-padding (black border) done by filter2(). The
border of the image had 0's averaged into them. Second, some parts of
the blurred image are "noisy". Here are some selected parts at 4x zoom.
Notice how the circle, and the "a" seem a little bit boxy? This is due
to the shape of our filter. Also notice that the blurring isn't as
smooth as it could be. This is because the filter itself isn't very
smooth. We'll fix both these problems with a new filter later.
First, here is how we can fix the border problem: we pad the image
with itself. Octave provides us three easy ways to do this. The first
is replicate padding: the padding outside the image is the same as the
nearest border pixel in the image. Circular padding: the padding from
from the opposite side of the image, as if it was wrapped. This would
be a good choice for a periodic image. Last, and probably the most
useful is symmetric: the padding is a mirror reflection of the image
itself.
To apply symmetric padding, we use the padarray()
function. We only want to pad the image by the amount that the mask
will "hang off". Let's pad the original image for a 9x9 filter, which
will hang off by 4 pixels each way,
mpad = padarray(m, [4 4], "symmetric");
Next, we will replace the averaging filter with a 2D Gaussian
distribution. The Gaussian, or normal, distribution has many wonderful
and useful properties (as a statistics professor I had once said,
anyone who considers themselves to be educated should know about the
normal distribution). One property that makes it useful is that if we
integrate the Gaussian distribution from minus infinity to infinity,
the result is 1. The easiest way to get the curve without having to
type in the equation is using fspecial(): a special
function for creating image filters.
f_gauss = fspecial("gaussian", 9, 2);
This creates a 9x9 Gaussian filter with variance 2. The variance
controls the effective size of the filter. Increasing the size of the
filter from 9 to 99 will actually have virtually no impact on the
final result. It just needs to be large enough to cover the curve. Six
times the variance covers over 99% of the curve, so for a variance of
2, a filter of size 7x7 (always make your filters odd in size) is
plenty. A larger filter means a longer convolution time. Here is what
the 9x9 filter looks like,
Notice the extra argument "valid"? Since we padded the
image before filtering, we don't want this padding to be part of the
image result. filter2() normally returns an image of the
same size as the input image, but we only want the part that didn't
undergo (additional) zero-padding. The result is now the same size as
the original image, but without the messy border,
Also, compare the result to the average filter above. See how much
smoother this image is? If you are interested in blurring an image,
you will generally want to go with a Gaussian filter like this.
Now I will let you in on a little shortcut. In Matlab, there is a
function called imfilter which does the padding and
filtering in one step. As of this writing, the Octave-Forge image
package doesn't officially include this function, but it is there in
the source repository now, meaning that it will probably appear in the
next version of that package. I actually wrote my own before I found
this one. You can grab the official one
here:
imfilter.m
With this new function, we can filter with the Gaussian and save like
this. Notice the flipping of the first two arguments
from filter2, as well as the lack of converting
to uint8.
imfilter() will also handle the 3-layer color images
seamlessly. Without it, you would need to run filter2()
on each layer separately.
So that is just about all there is. fspecial() has many
more filters available including motion
blur,
unsharp, and edge detection. For example,
the Sobel edge
detector,
You may have noticed the new index available on the left of the
page. It really needs one, because looking at older posts required
paging through the archives. Even more so because I have yet to add
search functionality. There were
Blosxom plug-ins available to do these things, but they are long
gone now. I looked into getting a Google search box dropped in there,
but it requires a big advertisement to go around the box (and
Javascript... bleh!). I'll figure out something sometime.
Blosxom doesn't have a fancy database behind it, but rather plain flat
text files. I imagine that this doesn't scale well, but I haven't
looked that carefully into Blosxom's code. I should never be having
problems like that anyway as I doubt I will ever surpass even 1000
posts, nor will there ever be some huge flood of users. What
I do get is simplicity: to make a new post, I just write a text
file and drop it on the server. It also made writing the index code
very simple.
Without further ado, here is the code (under
the same license as
Blosxom):
blog-index.perl. As you can see, it doesn't integrate with the
rest of the blog; it only produces an index. I need function, not
form, so this is fine for me. You can use it on your own Blosxom
setup, just as long as you set the parameters at the top of the file
correctly.
The index is created dynamically, but only when it needs to be
updated. It first checks the date of the pre-generated index against
the date of the latest entry. If the entry is newer, the index is
rewritten. In both cases, this index is then served up efficiently
straight from a plain XHTML file. All the proper file locking is in
place (I think), so if there was a flood of requests when the index
needs to be rewritten, no two requests are trying to re-write the
index at the same time.
So nothing terribly exciting this week, but I hope someone finds this
code useful somewhere.
Introduction: This "news" is over two months old, simply because
I had other more interesting things to write about first. Not that I
am out of ideas: I have at least three more ideas lined up at the
moment on top of several half-written entries that may never see the
light of day. I just want to get it out of the way.
In December we held the robot competition, pitting against each other
the robots that
we spent the semester building. It was a double-elimination
bracket with five teams. Teams competed by arranging the maze (within
the rules) and deciding the initial position for their opponents. The
robots do not get to know about the maze or where they are starting;
they must figure this out on their own by exploring the maze.
To recap, there was an 8'x8' game area containing a 4'x8' maze of
1-square-foot cells. On the floor of the game area was a grid of white
lines on black, where the white lines were about 7-inches apart. The
robot started at an unknown position and orientation in the maze,
which was also set up with a configuration unknown to the robot. In
the non-maze open area, three small wooden blocks were placed at the
intersection of white lines, with a steel washer attached to the top
of each block.
In short, the robot had to move all three blocks to the repository, a
pre-programmed position in the maze.
At the end of the semester, our team's robot was the only one that
could successfully complete this task. The other teams needed to play
in a degraded mode: known maze configuration, known starting position,
known block positions. The loser bracket played this degraded version
of the game. Because of this, our team was able to sweep the
tournament with a perfect run. All the robot had to do was
successfully run the full game. The competition, not being able to do
this, automatically lost.
The robots were mostly the same, except for one team who had a robot
with 4 multi-direction wheels. Every other team made a "scooter bot"
type of robot: two powered wheels (with casters for balance) and
chassis with three levels. The first real separation of design was
when it came to picking up blocks. Each team initially had a different
idea. One team was going to build a pulley system to lift the
blocks. Another was going to use sweeping arms to sweep in the
block. Another was going to used a stationary magnet.
Our team went with a rotating wheel in front with magnets along the
outside (see images below). Once a block was found, the robot would
rotate a magnet over the block, then rotate the attached block out of
the way. In the end, four of the five teams ended up using this design
for their own robots (the last team stuck with the stationary magnet).
These pictures were taken about a month before the competition. The
wiring job was still a bit sloppy and the front magnet wheel lacks
tiny magnets attached to the outside. Other than that, this is what
our final robot looked like. In that last month, we attached the
magnets, cleaned up the wiring, and made a whole bunch of code
improvements making the robot more robust.
I will now attempt to describe some of the things you see in these
images.
On the bottom of the robot you can see two casters for balancing the
robot (big clunky things). You can see an IR sensor, which is pointing
at the blue surface attached to the other side of the robot. This was
the block detection sensor, a home-made break-beam sensor. And
finally, you can see three LED lights on top of a long circuit
board. This is a line tracker, with three sensors that can see the
white grid on the bottom of the game board. The line tracker is how
the robot navigated the open area of the board. It went back-and-forth
looking for blocks, using the line tracker to stay on the line.
Also attached to this bottom layer are the powered wheels, with blue
rubbers for traction, and their wheel encoders. There are spokes on
the inside of the wheels (encoder disks), and the wheel encoders send
a signal to the micro-controller each time it sees a spoke. The
software counts the number of spokes that passed, allowing the robot
to keep track of how far that wheel has turned. This information is
combined with IR distance sensors to give it a very accurate idea of
its position.
On top of the bottom black layer, you can see four distance IR sensors
for tracking walls in the maze. They checked to make sure the robot
was going straight (that's why there are two on each side), as well as
map out the maze as it travels long. Hanging down from the bottom of
the red layer is another IR sensor facing forward, looking at walls in
front of the robot. Mounted on the front is the block retrieval device
(lacking magnets at this point).
On top of the red layer are two (empty) battery packs, which holds 9
AA rechargeable NiMH batteries. This actually makes two separate power
systems: a 4-pack for motors and a 5-pack for logic (micro-controller
et al). In the circuit, the motors, containing coils of wire, behave
like inductors, which could cause harmful voltage spikes to the
logic. Separate power systems help prevent damage.
On top is the micro-controller and all of the important
connections. The vertical board contains the voltage regulator and
"power strip" where all of the sensors are attached. It also contains
the start button, which was connected to an interrupt in the
micro-controller. The micro-controller had its own restart button, but
once the system started up, initialized, and self-calibrated, it waited
for a signal from the start button to get things going.
I was about to post this when I was reminded by my fiancee that she
took pictures at the end-of-semester presentation, after the
competition. Here are some images of the robot after it was completely
finished. Yes, that is a little face fastened to the front.
If you are ever at Penn State and are visiting the IST building, you
can see the robot. Because the robot won the competition, it is on
display and will be for years to come. You can recognize it by its
face.
I have made the final robot code available
here:
final-robot-code.zip. I was the software guy, handling pretty much
all the code, so everything here,
except interupt_document.c was written by me. It's
probably not very useful as code, except for reading and learning how
our robot worked. There are a few neat hacks in there, though, which I
may discuss as posts here. It's not noted in the code itself, nor in
the zip file, but I'll make this available under my
favorite 2-clause BSD
license.
Don't stop here! This isn't everything. Check out the archives
(on the left) for more posts. Or just have a look at
the index.