<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Articles tagged perl at null program</title>
  <link rel="alternate" type="text/html"
        href="https://nullprogram.com/tags/perl/"/>
  <link rel="self" type="application/atom+xml"
        href="https://nullprogram.com/tags/perl/feed/"/>
  <updated>2026-04-09T13:25:45Z</updated>
  <id>urn:uuid:dd44fb39-63bf-4290-8e5f-818e34348c1c</id>

  <author>
    <name>Christopher Wellons</name>
    <uri>https://nullprogram.com</uri>
    <email>wellons@nullprogram.com</email>
  </author>

  <entry>
    <title>Torrent File Strip</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2011/02/19/"/>
    <id>urn:uuid:fa638343-71a9-3c3c-8e31-f7f8d83b55b4</id>
    <updated>2011-02-19T00:00:00Z</updated>
    <category term="perl"/>
    <content type="html">
      <![CDATA[<!-- 19 February 2011 -->
<p class="abstract">
You can skip my explanation and download the tool here,
</p>
<pre>
git clone <a href="https://github.com/skeeto/btstrip">git://github.com/skeeto/btstrip.git</a>
</pre>
<p>
My main computer recently stopped working for me, and, until I have a
replacement, I've been using my wife's old 2004-era desktop, with only
500MB of memory. That has required me to make use of a lighter-weight
BitTorrent client than I have in the
past: <a href="http://libtorrent.rakshasa.no/">rTorrent</a>.
</p>
<p>
Because rTorrent's interface is built on ncurses, when combined
with <a href="/blog/2009/03/05/">GNU Screen</a> it behaves very much
like a daemon. I have configured it to watch a certain directory for
new torrent files. When I want to start downloading a new torrent, I
just put the torrent file there and rTorrent gets to work on it. Share
this watched directory on a network, and rTorrent becomes a network
service.
</p>
<pre>
# rTorrent configuration
directory = /torrents/
session = /torrents/.session/
schedule = watch_directory,5,5,load_start=/torrents/watch/*.torrent
schedule = untied_directory,5,5,stop_untied=
schedule = tied_directory,5,5,start_tied=
</pre>
<p>
Unfortunately, the rTorrent documentation has not been kept up to
date, and appears to be inaccurate. I prefer to rely completely on
the distributed hash table (DHT) rather than use normal trackers
(<a href="/blog/2009/10/26/">so long as the torrent is free of
DRM</a>). My last BitTorrent client allowed me to do this, but the
documented rTorrent option for doing
this, <code>enable_trackers</code>, doesn't seem to work at all.
</p>
<pre>
enable_trackers = no
</pre>
<p>
My current workaround is to strip away all of the tracker information
from the torrent file before giving it to rTorrent. It's a Perl
script, with all the hard work being done by
the <a href="http://search.cpan.org/dist/Bencode/lib/Bencode.pm">
Bencode module</a>. Stripping out the trackers is trivial,
</p>
<figure class="highlight"><pre><code class="language-perl" data-lang="perl"><span class="k">my</span> <span class="nv">$decoded</span> <span class="o">=</span> <span class="nv">bdecode</span> <span class="k">do</span> <span class="p">{</span> <span class="nb">local</span> <span class="vg">$/</span><span class="p">;</span> <span class="o">&lt;</span><span class="bp">STDIN</span><span class="o">&gt;</span> <span class="p">};</span>
<span class="nv">$$decoded</span><span class="p">{'</span><span class="s1">announce</span><span class="p">'}</span> <span class="o">=</span> <span class="p">"</span><span class="s2">http://127.0.0.1/</span><span class="p">";</span>
<span class="nb">delete</span> <span class="nv">$$decoded</span><span class="p">{'</span><span class="s1">announce-list</span><span class="p">'};</span>
<span class="k">print</span><span class="p">(</span> <span class="nv">bencode</span> <span class="nv">$decoded</span> <span class="p">);</span></code></pre></figure>
<p>
The encoding of a torrent file is of critical importance, due to
the <code>info_hash</code>: the hash of all of the torrent data and
metadata, which uniquely identifies that torrent. If any aspect of
the <code>info</code> field in the torrent file changes, you get a
different hash and will not be able to participate in the original
torrent. The reason the above code is guaranteed to work properly is
that, for any possible bencode data structure, there
is <i>exactly</i> one possible way to bencode it.
</p>
<p>
There are four data types in the bencode format: integers, byte
strings, lists, and associative arrays. Integers are encoded in ASCII
as base-10 (making them unaffected by endianness). Byte strings are
stored as an integer indicating their length, followed by the literal
string itself. Lists are stored as a sequence of objects in order,
terminated by a sentinel. And associative arrays are stored as a list
of key-value pairs. Most importantly, those pairs are stored in
alphabetical order by key, enforcing a single encoding for any given
associative array.
</p>
<p>
The simple code above only works on stdin to stdout, but it would be
more convenient to edit torrent files in place. So I buffed it up a
bit in my working version. It has two command line switches: one for
operating on the file in-place (<code>-i</code>), and one for setting
the tracker URL manually, rather than inserting a dummy value
(<code>-t</code>). It also strips out some of the extra, optional
fields, cutting the torrent file down to its minimal size.
</p>
<p>
Once you've installed the Bencode module, drop that script in
your <code>PATH</code> somewhere and you're ready to go.
</p>
]]>
    </content>
  </entry>
  <entry>
    <title>E-mail Obfuscater Perl One-liner</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/06/02/"/>
    <id>urn:uuid:9498c4a0-8f5c-37a3-b443-ccc189dbbcbd</id>
    <updated>2009-06-02T00:00:00Z</updated>
    <category term="perl"/>
    <content type="html">
      <![CDATA[<!-- 2 June 2009 -->
<p>
If you look at the page sources around here you might notice that
there are no bare e-mail addresses around. This is because I obfuscate
them into a series of HTML entities. So far this has been pretty
effective at hiding from address-scraping, web-crawling spam
bots. They don't seem to try very hard at decoding HTML entities.
</p>
<p>
When I added the comment system, I needed to obfuscate addresses
automatically. I quickly realized that this is yet another Perl
one-liner (and implemented as one line in the comment system). It can
be used on the command line to obfuscate a file/pipe containing a list
of addresses.
</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">perl <span class="nt">-lpe</span> <span class="s1">'$_ = join "", map {"&amp;#" . ord() . ";"} split //'</span></code></pre></figure>
<p>
All of the spaces are really only there for us humans,
</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">perl <span class="nt">-lpe</span> <span class="s1">'$_=join"",map{"&amp;#".ord.";"}split//'</span></code></pre></figure>
<p>
I keep running into these one-liners.
</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Another Perl One-liner: Byte Order</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/05/20/"/>
    <id>urn:uuid:cc019ed2-b329-333e-1625-2f0f8cd385fd</id>
    <updated>2009-05-20T00:00:00Z</updated>
    <category term="perl"/>
    <content type="html">
      <![CDATA[<!-- 20 May 2009 -->
<p>
At work right now I am using two different machines. One is a 32-bit
SPARC and the other is a very powerful SMP x86-64. Sometimes data are
generated on one machine and used in a simulation on the other. There
is a problem of byte-order, though. The SPARC is big-endian and the
other is little-endian, and the programs on both sides don't pay
attention to that small detail.
</p>
<p>
Luckily, the data are all 4-byte aligned. That's perfect for a Perl
one-liner byte order conversion,
</p>
<pre>
perl -e 'print scalar reverse while read STDIN, $_, 4' &lt; in.le &gt; out.be
</pre>
<p>
Perl is really great for concise hacks. I really like how this
one-liner almost reads like a natural language sentence. Is there any
other language that can do powerful one-liners like Perl?
</p>
]]>
    </content>
  </entry>
  <entry>
    <title>SWF Decompression Perl One-liner</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/04/18/"/>
    <id>urn:uuid:dfb62762-6499-323d-093f-386400407b28</id>
    <updated>2009-04-18T00:00:00Z</updated>
    <category term="perl"/>
    <content type="html">
      <![CDATA[<!-- 18 April 2009 -->
<p>
<img src="/img/misc/magnify.png" alt="" class="right"/>

Flash seems to be the popular way of playing videos online. This is a
bit better than the bad old days of online video where a user had to
select from a few buggy media player plug-ins. Things have improved.
</p>
<p>
However, if you don't use Flash, or if you want to watch the videos in
your own media player, you are stuck. A download link for the video is
almost never provided. The video is always somewhere, though, to be
fetched via http. I <a href="/blog/2007/09/05">mentioned this
before</a> for downloading YouTube videos using <a
href="http://rg3.github.com/youtube-dl/"> youtube-dl</a>.
</p>
<p>
The trick is finding the URL. Sometimes you can derive it from the
HTML code, sometimes you have to dig a little deeper by inspecting the
Flash player itself. <a
href="http://en.wikipedia.org/wiki/Strings_(Unix)"><code>
strings</code></a> can be invaluable here.
</p>
<p>
There could be an extra layer of stuff to work out, which is explained
below. My main reason for posting this is so I can refer back to it
later when I need to do it again.
</p>
<p>
So, the other day I ran into a Flash video player that contained part
of the URL of its video. I began by studying the <code>embed</code>
tag in the HTML, which gave me some information about where to find
the video (the video ID number). I downloaded the Flash player SWF
file for the purpose of running <code>strings</code> on it.
</p>
<p>
I ran into a problem here. I wasn't finding any non-garbage strings
inside the file. <code>file</code> told me it was <i>compressed</i>.
</p>
<pre>
$ file player.swf
player.swf: Macromedia Flash data (compressed), version 9
</pre>
<p>
Searching online quickly revealed that a compressed Flash file is just
zlib compression after an 8-byte header. Decompression can actually be
done with a Perl one-liner,
</p>
<pre>
perl -MCompress::Zlib -0777 -e \
      'print uncompress substr &lt;&gt;, 8;' player.swf &gt; player
</pre>
<p>
I ran <code>strings</code> and <code>grep</code>ped for "http",
revealing the location of the video. That was it!
</p>
<p>
I actually came across <a
href="http://www.brooksandrus.com/blog/2006/08/10/"> a Java program
</a> that does the same thing. It is 115 lines of code. Java programs
always seem to be bloated like this.
</p>
<p>
I hope you find this useful!
</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Fantasy Name Generator: Request for Patterns</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/01/04/"/>
    <id>urn:uuid:a5777be4-102b-3359-a083-36a8a9578ae6</id>
    <updated>2009-01-04T00:00:00Z</updated>
    <category term="game"/><category term="perl"/>
    <content type="html">
      <![CDATA[<!-- 4 January 2009 -->
<p>
  <img src="/img/misc/name-generation.jpg" alt="" class="right"/>

Whether choosing a name for my character in a fantasy game or
populating a world which I pretend to myself that I will one day DM, I
have always gone to the <a href="http://www.rinkworks.com/namegen/">
RinkWorks Fantasy Name Generator</a>. The author of this tool, Samuel
Stoddard, gives <a
href="http://www.rinkworks.com/namegen/history.shtml"> some
history</a> on how he came to design and develop it.
</p>
<p>
It works by using a pattern to select sets of letters to put together
into a name. There is a thorough, <a
href="http://www.rinkworks.com/namegen/instr.shtml">long
description</a> on the website. Unfortunately, he didn't share his
source code, so we cannot see how he did it.
</p>
<p>
Therefore, I used his description to duplicate his generator.
</p>
<p>
You can grab a copy here with <a href="http://git.or.cz/">git</a>,
</p>
<pre>
git clone <a href="https://github.com/skeeto/fantasyname">git://github.com/skeeto/fantasyname.git</a>
</pre>
<p>
It includes a command line interface as well as a web interface, which
I am running and linked to at the beginning of this post for you to
use. The code is available under the same license as Perl itself.
</p>
<p>
I used Perl and the <a
href="http://search.cpan.org/dist/Parse-RecDescent/lib/Parse/RecDescent.pm">
Parse::RecDescent</a> parser generator. Thanks to this module, it
essentially comes down to about 40 lines of code. The name pattern is
executed, just like a computer program, to generate a name. Here is
the BNF grammar I came up with,
</p>
<pre>
LITERAL ::= /[^|()&lt;&gt;]+/

TEMPLATE ::= /[-svVcBCimMDd']/

literal_set ::= LITERAL | group

template_set ::= TEMPLATE | group

literal_exp ::= literal_set literal_exp | literal_set

template_exp ::= template_set template_exp | template_set

literal_list ::= literal_exp "|" literal_list | literal_exp "|" | literal_exp

template_list ::= template_exp "|" template_list | template_exp "|" | template_exp

group ::= "&lt;" template_list "&gt;" | "(" literal_list ")"

name ::= template_list | group
</pre>
<p>
The program is just that, decorated with some bits of Perl. Since I
came up with it, I have found that it is slightly different from
Mr. Stoddard's generator, in that his allows empty sets anywhere.
Mine only allows them at the end of lists. For example, this is valid
for his generator,
</p>
<pre>
&lt;|B|C&gt;(ikk)
</pre>
<p>
But to work in mine, the empty item must be moved to the end,
</p>
<pre>
&lt;B|C|&gt;(ikk)
</pre>
<p>
This can be adjusted by making the proper changes to the grammar,
which I haven't figured out yet.
</p>
<p>
Another problem with mine is that Parse::RecDescent is
<i>slooooowwwww</i>. Ridiculously slow. Maybe I designed the grammar
poorly? This is probably the biggest problem. Even simple patterns can
take several seconds to generate names, especially with deeply
nested patterns. For example, this can take minutes,
</p>
<pre>
&lt;&lt;&lt;&lt;&lt;&lt;&lt;s&gt;&gt;&gt;&gt;&gt;&gt;&gt;
</pre>
<p>
Before you go thinking you are going to tank my server, I have written
the web interface so that it limits the running time of the
generator. If you want to do something fancy, use your own hardware. ;-)
</p>
<p>
There is also a problem that it will silently drop invalid pattern
characters at the end of the pattern. This has to do with me not quite
understanding how to apply Parse::RecDescent yet.
</p>
<p>
And this is where I need your help. I have had some trouble coming up
with good patterns. I don't even have a good default, generic fantasy
name pattern. Here are some of mine,
</p>
<pre>
&lt;s|B|Bv|v&gt;&lt;V|s|'|V&gt;&lt;s|V|C&gt; # default
&lt;i|Cd&gt;D&lt;d|i&gt; # idiot
&lt;V|B&gt;&lt;V|vs|Vs&gt; # short
</pre>
<p>
None of which I am very satisfied with.
</p>
<p>
You can design patterns for Nordic names, Gallic names,
Tolkienesque Middle Earth names, orc names, idiot names, dragon names,
dwarf names, elf names, Wheel of Time names, and so on. There is so
much potential available with this tool.
</p>
<p>
To suggest one to me, e-mail me some patterns, or even better, clone
my git repository and add one to it yourself (then ask me to pull from
you). This way your credit will stay directly attached to it with a
commit.
</p>
<p>
Good luck!
</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Don't Write Your Own E-mail Validator</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2008/12/24/"/>
    <id>urn:uuid:0e9536bd-6f02-332b-e974-442f31779080</id>
    <updated>2008-12-24T00:00:00Z</updated>
    <category term="rant"/><category term="perl"/>
    <content type="html">
      <![CDATA[<!-- 24 December 2008 -->
<p>
Gmail has a nice feature: when delivering e-mail, everything including
and after a <code>+</code> in a Gmail address is ignored. For example,
mail arriving at all of these addresses would go to the same place if
they were Gmail addresses,
</p>
<pre>
account@example.com
account+nullprogram@example.com
account+slashdot@example.com
</pre>
<p>
Thanks to this feature, when a user acquires a Gmail account, Google
is actually providing about a googol (as in the number
10<sup>100</sup>) different e-mail addresses to that user! Quite
appropriate, really.
</p>
<p>
I have seen other mailers do similar things, like ignoring everything
after dashes. A nice advantage to this is when registering at a new
website I can customize my e-mail address for them by, say, throwing
the website name in it. Because I have a googol of e-mail addresses
available, it is impossible to run out, so I can give every person I
meet their own version of my address. The custom address can come in
handy for sorting and filtering, and it will also tell me who is
selling out my e-mail address. This, of course, assumes that someone
isn't stripping out the extra text in my address to counter the Gmail
feature.
</p>
<p>
However, in my personal experience, most websites will not permit
<code>+</code>'s in addresses. This is completely ridiculous, because
it means that <b>virtually every website will incorrectly invalidate
perfectly valid e-mail addresses</b>. Even major websites, like
<i>coca-cola.com</i>, screw this up. They see the <code>+</code> in
the address and give up.
</p>
<p>
In fact, if I do a Google search for "email validation regex" right
now, 9 of the first 10 results return websites with regular
expressions that are complete garbage and will toss out many common,
valid addresses. The only useful result was at the fifth spot (linked
below).
</p>
<p>
For the love of Stallman's beard, <b>stop writing your own e-mail
address validators!</b>
</p>
<p>
Why shouldn't you even bother writing your own? Because the proper
Perl regular expression for <a
href="http://www.ietf.org/rfc/rfc0822.txt?number=822">RFC822</a> is <a
href="http://ex-parrot.com/~pdw/Mail-RFC822-Address.html">over 6
kilobytes in length</a>! Follow that link and look at that. This is
the <i>smallest</i> regular expression you would need to get it right.
</p>
<p>
If you <i>really</i> insist on having a nice short one and don't want
to use a validation library, which, again, is a stupid idea and you
<i>should</i> be using a library, then use the dumbest, most liberal
expression you can. (Just don't forget the security issues.) Like
this,
</p>
<pre>
.+@.+
</pre>
<p>
Seriously, if you add anything else you will almost surely make it
incorrectly reject valid addresses. Note that e-mail addresses can
contain spaces, and even more than one <code>@</code>! These are
valid addresses,
</p>
<pre>
"John Doe"@example.com
"@TheBeach"@example.com
</pre>
<p>
I have not yet found a website that will accept either of these, even
though both are completely valid addresses. Even MS Outlook, which I
use at work (allowing me to verify this), will refuse to send e-mail
to these addresses (Gmail accepts them just fine). Hmmm... maybe having
an address like these is a good anti-spam measure!
</p>
<p>
So if your e-mail address is <code>"John Doe"@example.com</code> no
one using Outlook can send you e-mail, which sounds like a feature to
me, really.
</p>
<p>
So, everyone, please stop writing e-mail validation regular
expressions. The work has been done, and you will only get it wrong,
guaranteed.
</p>
<p>
This is a similar rant I came across while writing mine: <a
href="http://www.santosj.name/general/stop-doing-email-validation-the-wrong-way/">
Stop Doing Email Validation the Wrong Way</a>.
</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Memoization</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2008/03/25/"/>
    <id>urn:uuid:c9f47622-9893-377c-a28b-d1cdfe8850e8</id>
    <updated>2008-03-25T00:00:00Z</updated>
    <category term="perl"/>
    <content type="html">
      <![CDATA[<!-- 25 March 2008 -->
<p>
I had written
in <a href="/blog/2008/01/29#collatz">a
previous post</a> about a neat feature of Lua. I found out later that
this is simply a form
of <a href="http://en.wikipedia.org/wiki/Memoization">Memoization</a>. The
idea is that you trade memory for speed by only doing calculations
once and keeping track of previously calculated values. I had even
complained about Perl hashes not being flexible enough (not true,
thanks
to <a href="http://perldoc.perl.org/Tie/Hash.html">Tie::Hash</a>). Perl
actually has something even cooler, which is the
<a href="http://search.cpan.org/~mjd/Memoize-1.01/Memoize.pm">Memoize
module</a>.
</p>
<p>
The module can memoize any function, although it is only useful on
"pure" functions: functions with no side effects and no dependence on
external data that will change. The official documentation contains a
nice example demonstrating a recursive implementation of a Fibonacci
sequence generator. My example is a little program I wrote the other
day where the memoize module came in handy.
</p>
<p>
You have coins valued at 1, 2, 5, 10, 20, 50, 100, and 200. How many
different ways can 200 be made using any number of coins? A simple
recursive solution is this: stick in each coin one at a time and ask
the same question again. So, we use a coin worth 1, now the question
is how many ways can we make change for 199. Then 198, then 195, then
190, etc. Because the order of the coins is not important these two
sets are identical: (1 1 5) (5 1 1). So, to avoid counting the same
set twice, we also want to tell the function the largest size coin to
use from then on. Our function may look like this now (Perl),
</p>
<pre>
use List::Util qw(sum);

sub count {
    my ($s, $m) = @_;
    return 1 if ($s == 0);

    my @valid = grep {$_ &lt;= $s and $_ &gt;= $m} @coins;
    return 0 if ($#valid == -1);

    return sum map {count($s - $_, $_)} @valid;
}
</pre>
<p>
Where it is called as <code>count(total, max_coin_value)</code>.
</p>
<p>
However, we will be calculating the same value twice many times
over. For example, let's say we start filling the first 10 of 200 like
this: (1 1 1 1 1 5) or (5 5). The next call to <code>count</code> will
be <code>count(190, 5)</code> for both cases. Just like the recursive
Fibonacci implementation, we are spending an enormous amount of time
repeating ourselves. Running this for a value of 200 will take
minutes. Running it for a value of 2000 may take days! Memoization to
the rescue!
</p>
<p>
We will now add this,
</p>
<pre>
use Memoize;
memoize('count');
</pre>
<p>
The module has now transparently installed a new version of the
function over our original. If we ever pass the same arguments that we
already have passed, the module will look up the original calculated
value and return it instead of calling the real function. It now can
calculate the number of ways to make change for a value of 2000 in a
couple seconds rather than days. That's how much redundant work the
function was doing. Here is the whole thing,
</p>
<pre>
#!/usr/bin/perl

use strict;
use warnings;
no warnings qw(recursion);
use List::Util qw(sum);

use Memoize;
memoize('count');

my @coins = (1, 2, 5, 10, 20, 50, 100, 200);

print count(200, 1);
print "\n";

sub count {
    my ($s, $m) = @_;
    return 1 if ($s == 0);

    my @valid = grep {$_ &lt;= $s and $_ &gt;= $m} @coins;
    return 0 if ($#valid == -1);

    return sum map {count($s - $_, $_)} @valid;
}
</pre>
<p>
Now, to apply it to the Collatz problem from my previous post we get a
nice simple little program,
</p>
<pre>
#!/usr/bin/perl

use strict;
use warnings;
no warnings qw(recursion);
use List::Util qw(max);

use Memoize;
memoize('collatz');

while (&lt;&gt;) {
    my ($n, $m) = split;
    printf("$n $m %d\n", max map { collatz($_) } ($n..$m));
}

sub collatz {
    my $n = shift;
    return 1 if ($n == 1);
    return 1 + collatz(3*$n+1) if ($n &amp; 1);
    return 1 + collatz($n/2);
}
</pre>
<p>
I really do love Perl.
</p>
]]>
    </content>
  </entry>
  <entry>
    <title>A Faster Montage</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2007/12/26/"/>
    <id>urn:uuid:a4b9aedd-14f9-30fa-f2e5-7a478aff9be2</id>
    <updated>2007-12-26T00:00:00Z</updated>
    <category term="perl"/>
    <content type="html">
      <![CDATA[<p><em>Update May 2015: Somehow the original script was lost while <a href="/blog/2011/08/05/">changing
 hosts</a> four years ago. I’ve replaced the script with a smaller,
 better, standalone C program. Note: it has a different interface, so
 read the header first!</em></p>

<ul>
  <li><a href="/download/fastmontage.c">/download/fastmontage.c</a> (new!)</li>
</ul>

<p>I had written a previous post called <a href="/blog/2007/12/11">Movie DNA</a> where I
described a simple way of distilling an entire movie down to a single
frame. It involved the use of two tools, with no intermediate code or
software in the middle to glue things together.</p>

<p>The first tool, mplayer, was used to dump all of the frames we needed.
This took about the running length of the movie to do, which wasn’t so
bad. There may be a way to speed this up by giving mplayer some extra
hints. I have not yet figured this part out.</p>

<p>The real time cost was in ImageMagick’s <code class="language-plaintext highlighter-rouge">montage</code> tool, which made the
final montage out of the images. This took between 6 and 10 hours,
depending on the length of the movie. The process seemed to
be non-linear for some reason, with long movies taking
disproportionately longer to process (one could always dig around the
montage source to find out why). I knew there had to be a way that
this could be improved!</p>

<p>Well, I wrote a Perl script last night, dubbed <code class="language-plaintext highlighter-rouge">gdmontage</code> to speed up
the montage process. It was even faster than I thought it would be,
<strong>taking only 12 seconds</strong> on the same machine as before. It uses the
<a href="http://www.boutell.com/gd/">GD Graphics Library</a> via Perl’s <a href="http://search.cpan.org/dist/GD/GD.pm">GD module</a>, which you
would need to install to use this. It also uses
<a href="http://search.cpan.org/~fluffy/Term-ProgressBar-2.09/lib/Term/ProgressBar.pm">Term::ProgressBar</a>, if it’s available, to provide a progress
bar and ETA.</p>

<p>Like the original <code class="language-plaintext highlighter-rouge">montage</code> program, the script recognizes file globs,
so you can provide the files through a glob in order to avoid the
limits on command line arguments.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./gdmontage.pl "frames/*"
</code></pre></div></div>

<p>It is a bit unfair to call my code a “faster montage” because it only
covers a tiny subset of the original <code class="language-plaintext highlighter-rouge">montage</code>. It makes some big
assumptions in order to be faster; specifically, it assumes that every
image is the same size. The original montage must look at every image
before it even starts in order to determine the dimensions and
placement of the final image.</p>

<p>It is also geared towards the Cinema Redux thing, doing only 60 images
per row. This can be changed internally (no command line arguments for
this) by adjusting the parameters at the top of the script. The script
could probably be easily expanded to include most of the features of
ImageMagick’s montage, but I am sure this Perl script would be much
faster when it comes to creating large montages. (Why is <code class="language-plaintext highlighter-rouge">montage</code> so
slow?)</p>

]]>
    </content>
  </entry>

</feed>
