<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Articles tagged elfeed at null program</title>
  <link rel="alternate" type="text/html"
        href="https://nullprogram.com/tags/elfeed/"/>
  <link rel="self" type="application/atom+xml"
        href="https://nullprogram.com/tags/elfeed/feed/"/>
  <updated>2026-04-09T13:25:45Z</updated>
  <id>urn:uuid:7a1e551e-16c1-4dc3-b77b-861a81651c99</id>

  <author>
    <name>Christopher Wellons</name>
    <uri>https://nullprogram.com</uri>
    <email>wellons@nullprogram.com</email>
  </author>

  <entry>
    <title>Debugging Emacs or: How I Learned to Stop Worrying and Love DTrace</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2018/01/17/"/>
    <id>urn:uuid:a55cabc9-2d87-30a4-9066-9ec5e45b8bce</id>
    <updated>2018-01-17T23:59:49Z</updated>
    <category term="emacs"/><category term="elfeed"/><category term="bsd"/>
    <content type="html">
      <![CDATA[<p><em>Update: This article was featured on <a href="https://www.youtube.com/watch?v=Xi_pX2QIzho">BSD Now 233</a> (starting
at 21:38).</em></p>

<p>For some time <a href="https://github.com/skeeto/elfeed">Elfeed</a> was experiencing a strange, spurious
failure. Every so often users were <a href="https://github.com/skeeto/elfeed/issues/248">seeing an error</a> (spoiler
warning) when updating feeds: “error in process sentinel: Search
failed.” If you use Elfeed, you might have even seen this yourself.
On the surface it appeared that curl, tasked with the
<a href="/blog/2016/06/16/">responsibility for downloading feed data</a>, was producing
incomplete output despite reporting a successful run. Since the run
was successful, Elfeed assumed certain data was in curl’s output
buffer, but, since it wasn’t, it failed hard.</p>

<!--more-->

<p>Unfortunately this issue was not reproducible. Manually running curl
outside of Emacs never revealed any issues. Asking Elfeed to retry
fetching the feeds would work fine. The issue would only randomly rear
its head when Elfeed was fetching many feeds in parallel, under
stress. By the time the error was discovered, the curl process had
exited and vital debugging information was lost. Considering that
this was likely to be a bug in Emacs itself, there really wasn’t a
reliable way to capture the necessary debugging information from
within Emacs Lisp. And, indeed, this later proved to be the case.</p>

<p>A quick-and-dirty workaround is to use <code class="language-plaintext highlighter-rouge">condition-case</code> to catch and
swallow the error. When the bizarre issue shows up, rather than fail
badly in front of the user, Elfeed could attempt to swallow the error
— assuming it can be reliably detected — and treat the fetch as simply
a failure. That didn’t sit comfortably with me. Elfeed had done its
due diligence checking for errors already. <em>Someone</em> was lying to
Elfeed, and I intended to catch them with their pants on fire.
Someday.</p>
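
<p>For illustration, that workaround might look something like the following
sketch. The helper name <code class="language-plaintext highlighter-rouge">my/parse-or-fail</code> is hypothetical and this is not
Elfeed’s actual code. It only assumes the failure arrives as the standard
<code class="language-plaintext highlighter-rouge">search-failed</code> error that an unguarded search signals.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical helper, for illustration only.
(defun my/parse-or-fail (parse-fn)
  "Call PARSE-FN, turning a `search-failed' signal into a failure result."
  (condition-case err
      (list :ok (funcall parse-fn))
    ;; Swallow only this one error; anything else still propagates.
    (search-failed
     (list :failed (error-message-string err)))))

;; An empty buffer stands in for truncated curl output, so the
;; unguarded search signals `search-failed'.
(my/parse-or-fail
 (lambda ()
   (with-temp-buffer
     (re-search-forward "^HTTP/"))))
</code></pre></div></div>

<p>The call returns a list starting with <code class="language-plaintext highlighter-rouge">:failed</code> rather than signaling, which is
exactly the kind of papering over I wanted to avoid.</p>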

<p>I’d just need to witness the bug on one of my own machines. Elfeed is
part of my daily routine, so surely I’d have to experience this issue
myself someday. My plan was, should that day come, to run a modified
Elfeed, instrumented to capture extra data. I would have also routinely
run Emacs under GDB so that I could inspect the failure more deeply.</p>

<p>For now I just had to wait to <a href="https://www.youtube.com/watch?v=fE2KDzZaxvE">hunt that zebra</a>.</p>

<h3 id="bryan-cantrill-dtrace-and-freebsd">Bryan Cantrill, DTrace, and FreeBSD</h3>

<p>Over the holidays I re-discovered <a href="https://en.wikipedia.org/wiki/Bryan_Cantrill">Bryan Cantrill</a>, a systems
software engineer who worked for Sun between 1996 and 2010, and is best
known for <a href="http://dtrace.org/blogs/about/">DTrace</a>. My first exposure to him was in a <a href="https://www.youtube.com/watch?v=l6XQUciI-Sc">BSD
Now interview</a> in 2015. I had re-watched that interview and decided
there was a lot more I had to learn from him. He’s become a personal
hero to me. So I scoured the internet for <a href="http://dtrace.org/blogs/bmc/2018/02/03/talks/">more of his writing and
talks</a>. Besides what I’ve already linked in this article, here
are a couple more great presentations:</p>

<ul>
  <li><a href="https://www.youtube.com/watch?v=4PaWFYm0kEw">Oral Tradition in Software Engineering</a></li>
  <li><a href="https://www.youtube.com/watch?v=-zRN7XLCRhc">Fork Yeah! The Rise and Development of illumos</a></li>
</ul>

<p>You can also find some of his writing <a href="http://dtrace.org/blogs/bmc/">scattered around the DTrace
blog</a>.</p>

<p>Some interesting operating system technology came out of Sun during
its final 15 or so years — most notably DTrace and ZFS — and Bryan
speaks about it passionately. Almost as a matter of luck, most of it
survived the Oracle acquisition thanks to Sun releasing it as open
source in just the nick of time. Otherwise it would have been lost
forever. The scattered ex-Sun employees, still passionate about their
prior work at Sun, along with some of their old customers have since
picked up the pieces and kept going as a community under the name
<a href="https://illumos.org/">illumos</a>. It’s like an open source flotilla.</p>

<p>Naturally I wanted to get my hands on this stuff to try it out for
myself. Is it really as good as they say? Normally I stick to Linux,
but it (generally) doesn’t have these Sun technologies. The main
reason is license incompatibility. Sun released its code under the
<a href="https://opensource.org/licenses/CDDL-1.0">CDDL</a>, which is incompatible with the GPL. Ubuntu <em>does</em>
<a href="https://insights.ubuntu.com/2016/02/18/zfs-licensing-and-linux/">infamously include ZFS</a>, but other distributions are
unwilling to take that risk. Porting DTrace is a serious undertaking
since it’s got its fingers throughout the kernel, which also makes the
licensing issues even more complicated.</p>

<p>(<em>Update February 2018</em>: <a href="https://gnu.wildebeest.org/blog/mjw/2018/02/14/dtrace-for-linux-oracle-does-the-right-thing/">DTrace has been released under the
GPLv2</a>, allowing it to be legally integrated with Linux.)</p>

<p>Linux has a reputation for Not Invented Here (NIH) syndrome, and these
licensing issues certainly contribute to that. Rather than adopt ZFS
and DTrace, they’ve been reinvented from scratch: btrfs instead of
ZFS, and <a href="http://www.brendangregg.com/blog/2015-07-08/choosing-a-linux-tracer.html">a slew of partial options</a> instead of DTrace.
Normally I’m most interested in system call tracing, and my go-to is
<a href="https://en.wikipedia.org/wiki/Strace">strace</a>, though it certainly has its limitations — including
this situation of debugging curl under Emacs. Another famous example
of NIH is Linux’s <a href="http://man7.org/linux/man-pages/man7/epoll.7.html"><code class="language-plaintext highlighter-rouge">epoll(2)</code></a>, which is a <a href="https://idea.popcount.org/2017-02-20-epoll-is-fundamentally-broken-12/">broken</a>
<a href="https://idea.popcount.org/2017-03-20-epoll-is-fundamentally-broken-22/">version</a> of BSD <a href="https://www.freebsd.org/cgi/man.cgi?query=kqueue&amp;sektion=2"><code class="language-plaintext highlighter-rouge">kqueue(2)</code></a>.</p>

<p>So, if I want to try these for myself, I’ll need to install a
different operating system. I’ve dabbled with <a href="https://omnios.omniti.com/">OmniOS</a>, an OS
built on illumos, in virtual machines, using it as an alien
environment to test some of my software (e.g. <a href="/blog/2017/03/12/">enchive</a>).
OmniOS has a philosophy called <a href="https://omnios.omniti.com/wiki.php/KYSTY">Keep Your Software To Yourself</a>
(KYSTY), which is really just code for “we don’t do packaging.”
Honestly, you can’t blame them since <a href="https://utcc.utoronto.ca/~cks/space/blog/solaris/IllumosSupportLimits">they’re a tiny community</a>.
The best solution to this is probably <a href="https://www.pkgsrc.org/">pkgsrc</a>, which is
essentially a universal packaging system. Otherwise <a href="/blog/2017/06/19/">you’re on your
own</a>.</p>

<p>There’s also <a href="https://www.openindiana.org/">openindiana</a>, which is a more friendly
desktop-oriented illumos distribution. Still, the short of it is that
you’re very much on your own when things don’t work. The situation is
like running Linux a couple decades ago, when it was still difficult
to do.</p>

<p>If you’re interested in trying DTrace, the easiest option these days is
probably <a href="https://www.freebsd.org/">FreeBSD</a>. It’s got a big, active community, thorough
documentation, and a huge selection of packages. Its license (the <em>BSD
license</em>, duh) is compatible with the CDDL, so both ZFS and DTrace have
been ported to FreeBSD.</p>

<h3 id="what-is-dtrace">What is DTrace?</h3>

<p>I’ve done all this talking but haven’t yet described what <a href="https://wiki.freebsd.org/DTrace/Tutorial">DTrace
really is</a>. I won’t pretend to write my own tutorial, but I’ll
provide enough information to follow along. DTrace is a tracing
framework for debugging production systems <em>in real time</em>, both for
the kernel and for applications. The “production systems” part means
it’s stable and safe — using DTrace won’t put your system at risk of
crashing or damaging data. The “real time” part means it has little
impact on performance, so you can use DTrace on live, active systems.
Both of these core design principles are vital for
troubleshooting those really tricky bugs that only show up in
production.</p>

<p>There are DTrace <em>probes</em> scattered all throughout the system: on
system calls, scheduler events, networking events, process events,
signals, virtual memory events, etc. Using a specialized language
called D (unrelated to the general purpose programming language D),
you can dynamically add behavior at these instrumentation points.
Generally the behavior is to capture information, but it can also
manipulate the event being traced.</p>

<p>Each probe is fully identified by a 4-tuple delimited by colons:
provider, module, function, and probe name. An empty element denotes a
sort of wildcard. For example, <code class="language-plaintext highlighter-rouge">syscall::open:entry</code> is a probe at the
beginning (i.e. “entry”) of <code class="language-plaintext highlighter-rouge">open(2)</code>. <code class="language-plaintext highlighter-rouge">syscall:::entry</code> matches all
system call entry probes.</p>

<p>Unlike strace on Linux, which monitors a specific process, DTrace
applies to the entire system when active. To run curl under strace
from Emacs, I’d have to modify Emacs’ behavior to do so. With DTrace I
can instrument every curl process without making a single change to
Emacs, and with negligible impact to Emacs. That’s a big deal.</p>

<p>So, when it comes to this Elfeed issue, FreeBSD is much better poised
for debugging the problem. All I have to do is catch it in the act.
However, it’s been months since that bug report and I’m not really
making this connection yet. I’m just hoping I eventually find an
interesting problem where I can apply DTrace.</p>

<h3 id="freebsd-on-a-raspberry-pi-2">FreeBSD on a Raspberry Pi 2</h3>

<p>So I’ve settled on FreeBSD as the playground for these technologies; I
just have to decide where. I could always run it in a virtual machine,
but it’s always more interesting to try things out on real hardware.
<a href="https://wiki.freebsd.org/FreeBSD/arm/Raspberry%20Pi">FreeBSD supports the Raspberry Pi 2</a> as a Tier 2 system, and
I had a Raspberry Pi 2 sitting around collecting dust, so I put it to
use.</p>

<p>I wrote the image to an SD card, and for a few days I stretched my
legs on this new system. I cloned a couple dozen of my own git
repositories, ran the builds and the tests, and just got a feel for
things. I tried out the ports system for the first time, mainly to
discover that the low-powered Raspberry Pi 2 takes days to build some
of the packages I want to try.</p>

<p>I <a href="/blog/2017/04/01/">mostly program in Vim these days</a>, so it’s some days before I
even set up Emacs. Eventually I do build Emacs, clone my
configuration, fire it up, and give Elfeed a spin.</p>

<p>And that’s when the “search failed” bug strikes! Not just once, but
dozens of times. Perfect! This low-powered platform is the jackpot for
this particular bug, triggering it left and right. Given that I’ve got
DTrace at my disposal, it’s <em>the</em> perfect place to debug this.
Something is lying to Elfeed and DTrace will play the judge.</p>

<p>Before I dive in I see three possibilities:</p>

<ol>
  <li>curl is reporting success but truncating its output.</li>
  <li>Emacs is quietly truncating curl’s output.</li>
  <li>Emacs is misinterpreting curl’s exit status.</li>
</ol>

<p>With DTrace I can observe what every curl process writes to Emacs, and
I can also double check curl’s exit status. I come up with the
following (newbie) DTrace script:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>syscall::write:entry
/execname == "curl"/
{
    printf("%d WRITE %d \"%s\"\n",
           pid, arg2, stringof(copyin(arg1, arg2)));
}

syscall::exit:entry
/execname == "curl"/
{
    printf("%d EXIT  %d\n", pid, arg0);
}
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">/execname == "curl"/</code> is a predicate that (obviously) causes the
behavior to only fire for curl processes. The first probe has DTrace
print a line for every <code class="language-plaintext highlighter-rouge">write(2)</code> from curl. <code class="language-plaintext highlighter-rouge">arg0</code>, <code class="language-plaintext highlighter-rouge">arg1</code>, and
<code class="language-plaintext highlighter-rouge">arg2</code> correspond to the arguments of <code class="language-plaintext highlighter-rouge">write(2)</code>: fd, buf, count. It
logs the process ID (pid) of the write, the length of the write, and
the actual contents written. Remember that these curl processes are
run in parallel by Emacs, so the pid allows me to associate the
separate writes and the exit status.</p>

<p>The second probe prints the pid and the exit status (the first argument
to <code class="language-plaintext highlighter-rouge">exit(2)</code>).</p>

<p>I also want to compare this to exactly what is delivered to Elfeed when
curl exits, so I modify the <a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/Sentinels.html">process sentinel</a> — the callback
that handles a subprocess exiting — to call <code class="language-plaintext highlighter-rouge">write-file</code> before any
action is taken. I can compare these buffer dumps to the logs produced
by DTrace.</p>

<p>There are two important findings.</p>

<p>First, when the “search failed” bug occurs, the buffer was completely
empty (95% of the time) or truncated at the end of the HTTP headers
(5% of the time), right at the blank line. DTrace indicates that curl
did its job to the full, so it’s Emacs who’s the liar. It’s not
delivering all of curl’s data to Elfeed. That’s pretty annoying.</p>

<p>Second, <strong>curl was line-buffered</strong>. Each line was a separate,
independent <code class="language-plaintext highlighter-rouge">write(2)</code>. I was certainly <em>not</em> expecting this. Normally
the C library only does line buffering when the output is a terminal.
That’s because it’s guessing a user may be watching, expecting the
output to arrive a line at a time.</p>

<p>Here’s a sample of what it looked like in the log:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>88188 WRITE 32 "Server: Apache/2.4.18 (Ubuntu)
"
88188 WRITE 46 "Location: https://blog.plover.com/index.atom
"
88188 WRITE 21 "Content-Length: 299
"
88188 WRITE 45 "Content-Type: text/html; charset=iso-8859-1
"
88188 WRITE 2 "
"
</code></pre></div></div>

<p>Why would curl think Emacs is a terminal?</p>

<p><em>Oh.</em> That’s right. <em>This is the <a href="/blog/2014/02/06/">same problem I ran into four years
ago when writing EmacSQL</a>.</em> By default Emacs connects to
subprocesses through a pseudo-terminal (pty). I called this a mistake
in Emacs back then, and I still stand by that claim. The pty causes
weird, annoying problems for little benefit:</p>

<ul>
  <li>Interpreting control characters. Hope you weren’t transferring binary
data!</li>
  <li>Subprocesses will generally get line buffered. This makes them
slower, though in some situations it might be desirable.</li>
  <li>Stdout and stderr get mixed together. (Optional since Emacs 25.)</li>
  <li><em>New!</em> There’s a bug somewhere in Emacs that causes truncation when
ptys are used heavily in parallel.</li>
</ul>

<p>Just from eyeballing the DTrace log I knew what to do: dump the pty
and switch to a pipe. This is controlled with the
<code class="language-plaintext highlighter-rouge">process-connection-type</code> variable, and fixing it <a href="https://github.com/skeeto/elfeed/commit/945765a57d2f27996b6a43bc62e803dc167d1547">is a
one-liner</a>.</p>
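
<p>As a rough sketch of that kind of change (the real fix is in the linked
commit and may differ in detail), binding the variable to <code class="language-plaintext highlighter-rouge">nil</code> around the
process creation is enough to get a pipe. The wrapper name
<code class="language-plaintext highlighter-rouge">my/start-with-pipe</code> is hypothetical.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical wrapper, for illustration only.
(defun my/start-with-pipe (name buffer program args)
  "Start PROGRAM with the list ARGS, connected by a pipe instead of a pty."
  ;; `process-connection-type' is consulted when the process is created:
  ;; nil means use a pipe, non-nil means use a pty when available.
  (let ((process-connection-type nil))
    (apply #'start-process name buffer program args)))

;; Example call (the URL is a placeholder):
;; (my/start-with-pipe "fetch" (current-buffer) "curl" '("-s" "https://example.org/"))
</code></pre></div></div>

<p>Code that creates processes with <code class="language-plaintext highlighter-rouge">make-process</code> can ask for the same thing
per process through its <code class="language-plaintext highlighter-rouge">:connection-type</code> argument.</p>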

<p>Not only did this completely resolve the truncation issue, Elfeed is
noticeably faster at fetching feeds on all machines. It’s no longer
receiving mountains of XML one line at a time, like sucking pudding
through a straw. It’s now quite zippy even on my Raspberry Pi 2, which
had <em>never</em> been the case before (without the “search failed” bug).
Even if you were never affected by this bug, you will benefit from the
fix.</p>

<p>I haven’t officially reported this as an Emacs bug yet because
reproducibility is still an issue. It needs something better than
“fire off a bunch of HTTP requests across the internet in parallel
from a Raspberry Pi.”</p>

<p>The fix reminds me of that <a href="https://www.buzzmaven.com/old-engineer-hammer-2/">old boilermaker story</a> about
charging a lot of money just to swing a hammer. Once the problem
arose, <strong>DTrace quickly helped to identify the place to hit Emacs with
the hammer</strong>.</p>

<p><em>Finally, a big thanks to alphapapa for originally taking the time to
report this bug months ago.</em></p>

]]>
    </content>
  </entry>
    
  <entry>
    <title>Domain-Specific Language Compilation in Elfeed</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/12/27/"/>
    <id>urn:uuid:6a6cd6a2-b44d-35b5-503c-c496d9094ac0</id>
    <updated>2016-12-27T21:46:30Z</updated>
    <category term="elfeed"/><category term="emacs"/><category term="elisp"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p>Last night I pushed another performance enhancement for Elfeed, this
time reducing the time spent parsing feeds. It’s accomplished by
compiling, during macro expansion, a jQuery-like domain-specific
language within Elfeed.</p>

<h3 id="heuristic-parsing">Heuristic parsing</h3>

<p>Given the nature of the domain — <a href="/blog/2013/09/23/">an under-specified standard</a>
and a lack of robust adherence — feed parsing is much more heuristic
than strict. Sure, everyone’s feed XML is strictly conforming since
virtually no feed reader tolerates invalid XML (thank you, XML
libraries), but, for the schema, the situation resembles the <em>de
facto</em> looseness of HTML. Sometimes important or required information
is missing, or is only available in <a href="https://www.intertwingly.net/wiki/pie/DublinCore">a different namespace</a>.
Sometimes, especially in the case of timestamps, it’s in the wrong
format, or encoded incorrectly, or ambiguous. It’s real world data.</p>

<p>To get a particular piece of information, Elfeed looks in a number of
different places within the feed, starting with the preferred source
and stopping when the information is found. For example, to find the
date of an Atom entry, Elfeed first searches for elements in this
order:</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">&lt;published&gt;</code></li>
  <li><code class="language-plaintext highlighter-rouge">&lt;updated&gt;</code></li>
  <li><code class="language-plaintext highlighter-rouge">&lt;date&gt;</code></li>
  <li><code class="language-plaintext highlighter-rouge">&lt;modified&gt;</code></li>
  <li><code class="language-plaintext highlighter-rouge">&lt;issued&gt;</code></li>
</ol>

<p>Failing to find any of these elements, or if no parsable date is
found, it settles on the current time. Only the <code class="language-plaintext highlighter-rouge">updated</code> element is
required, but <code class="language-plaintext highlighter-rouge">published</code> usually has the desired information, so it
goes first. The last three are only valid for another namespace, but
are useful fallbacks.</p>

<p>Before Elfeed even starts this search, the XML text is parsed into an
s-expression using <code class="language-plaintext highlighter-rouge">xml-parse-region</code> — a pure Elisp XML parser
included in Emacs. The search is made over the resulting s-expression.</p>

<p>For example, here’s a sample <a href="https://tools.ietf.org/html/rfc4287">from the Atom specification</a>.</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">&lt;?xml version="1.0" encoding="utf-8"?&gt;</span>
<span class="nt">&lt;feed</span> <span class="na">xmlns=</span><span class="s">"http://www.w3.org/2005/Atom"</span><span class="nt">&gt;</span>

  <span class="nt">&lt;title&gt;</span>Example Feed<span class="nt">&lt;/title&gt;</span>
  <span class="nt">&lt;link</span> <span class="na">href=</span><span class="s">"http://example.org/"</span><span class="nt">/&gt;</span>
  <span class="nt">&lt;updated&gt;</span>2003-12-13T18:30:02Z<span class="nt">&lt;/updated&gt;</span>
  <span class="nt">&lt;author&gt;</span>
    <span class="nt">&lt;name&gt;</span>John Doe<span class="nt">&lt;/name&gt;</span>
  <span class="nt">&lt;/author&gt;</span>
  <span class="nt">&lt;id&gt;</span>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6<span class="nt">&lt;/id&gt;</span>

  <span class="nt">&lt;entry&gt;</span>
    <span class="nt">&lt;title&gt;</span>Atom-Powered Robots Run Amok<span class="nt">&lt;/title&gt;</span>
    <span class="nt">&lt;link</span> <span class="na">rel=</span><span class="s">"alternate"</span> <span class="na">href=</span><span class="s">"http://example.org/2003/12/13/atom03"</span><span class="nt">/&gt;</span>
    <span class="nt">&lt;id&gt;</span>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a<span class="nt">&lt;/id&gt;</span>
    <span class="nt">&lt;updated&gt;</span>2003-12-13T18:30:02Z<span class="nt">&lt;/updated&gt;</span>
    <span class="nt">&lt;summary&gt;</span>Some text.<span class="nt">&lt;/summary&gt;</span>
  <span class="nt">&lt;/entry&gt;</span>

<span class="nt">&lt;/feed&gt;</span>
</code></pre></div></div>

<p>Which is parsed into this s-expression.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">((</span><span class="nv">feed</span> <span class="p">((</span><span class="nv">xmlns</span> <span class="o">.</span> <span class="s">"http://www.w3.org/2005/Atom"</span><span class="p">))</span>
       <span class="p">(</span><span class="nv">title</span> <span class="p">()</span> <span class="s">"Example Feed"</span><span class="p">)</span>
       <span class="p">(</span><span class="nv">link</span> <span class="p">((</span><span class="nv">href</span> <span class="o">.</span> <span class="s">"http://example.org/"</span><span class="p">)))</span>
       <span class="p">(</span><span class="nv">updated</span> <span class="p">()</span> <span class="s">"2003-12-13T18:30:02Z"</span><span class="p">)</span>
       <span class="p">(</span><span class="nv">author</span> <span class="p">()</span> <span class="p">(</span><span class="nv">name</span> <span class="p">()</span> <span class="s">"John Doe"</span><span class="p">))</span>
       <span class="p">(</span><span class="nv">id</span> <span class="p">()</span> <span class="s">"urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6"</span><span class="p">)</span>
       <span class="p">(</span><span class="nv">entry</span> <span class="p">()</span>
              <span class="p">(</span><span class="nv">title</span> <span class="p">()</span> <span class="s">"Atom-Powered Robots Run Amok"</span><span class="p">)</span>
              <span class="p">(</span><span class="nv">link</span> <span class="p">((</span><span class="nv">rel</span> <span class="o">.</span> <span class="s">"alternate"</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">href</span> <span class="o">.</span> <span class="s">"http://example.org/2003/12/13/atom03"</span><span class="p">)))</span>
              <span class="p">(</span><span class="nv">id</span> <span class="p">()</span> <span class="s">"urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a"</span><span class="p">)</span>
              <span class="p">(</span><span class="nv">updated</span> <span class="p">()</span> <span class="s">"2003-12-13T18:30:02Z"</span><span class="p">)</span>
              <span class="p">(</span><span class="nv">summary</span> <span class="p">()</span> <span class="s">"Some text."</span><span class="p">))))</span>
</code></pre></div></div>

<p>Each XML element is converted to a list. The first item is a symbol
that is the element’s name. The second item is an alist of attributes
— cons pairs of symbols and strings. And the rest are its children,
both string nodes and other elements. I’ve trimmed the extraneous
string nodes from the sample s-expression.</p>

<p>A subtle detail is that <code class="language-plaintext highlighter-rouge">xml-parse-region</code> doesn’t just return the
root element. It returns a <em>list of elements</em>, which always happens to
be a single element list, which is the root element. I don’t know why
this is, but I’ve built everything to assume this structure as input.</p>
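
<p>To make that layout concrete, here is a small sketch that parses a
one-element document and pulls the three parts apart using the accessors that
ship with Emacs’ <code class="language-plaintext highlighter-rouge">xml</code> library. The helper name <code class="language-plaintext highlighter-rouge">my/root-element</code> and the sample
input are made up for the example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'xml)

;; Hypothetical helper, for illustration only.
(defun my/root-element (text)
  "Parse TEXT with `xml-parse-region' and return its root element."
  (with-temp-buffer
    (insert text)
    ;; The parser returns a list of elements; the root is its first item.
    (car (xml-parse-region (point-min) (point-max)))))

(let ((root (my/root-element
             "&lt;link rel=\"alternate\" href=\"http://example.org/\"&gt;text&lt;/link&gt;")))
  (list (xml-node-name root)         ; first item: the element name, a symbol
        (xml-node-attributes root)   ; second item: alist of attributes
        (xml-node-children root)))   ; the rest: strings and child elements
</code></pre></div></div>

<p>The accessors are thin wrappers over <code class="language-plaintext highlighter-rouge">car</code>, <code class="language-plaintext highlighter-rouge">cadr</code>, and <code class="language-plaintext highlighter-rouge">cddr</code>, so plain list
functions work just as well on this structure.</p>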

<p>Elfeed strips all namespaces from both elements and
attributes to make parsing simpler. As I said, it’s heuristic rather
than strict, so namespaces are treated as noise.</p>

<h3 id="a-domain-specific-language">A domain-specific language</h3>

<p>Coding up Elfeed’s s-expression searches in straight Emacs Lisp would
be tedious, error-prone, and difficult to understand. It’s a lot of
loops, <code class="language-plaintext highlighter-rouge">assoc</code>, etc. So instead I invented a jQuery-like, CSS
selector-like, domain-specific language (DSL) to express these
searches concisely and clearly.</p>

<p>For example, all of the entry links are “selected” using this
expression:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">link</span> <span class="nv">[rel</span> <span class="s">"alternate"</span><span class="nv">]</span> <span class="ss">:href</span><span class="p">)</span>
</code></pre></div></div>

<p>Reading right-to-left, this matches every <code class="language-plaintext highlighter-rouge">href</code> attribute under every
<code class="language-plaintext highlighter-rouge">link</code> element with the <code class="language-plaintext highlighter-rouge">rel="alternate"</code> attribute, under every
<code class="language-plaintext highlighter-rouge">entry</code> element, under the <code class="language-plaintext highlighter-rouge">feed</code> root element. Symbols match element
names, two-element vectors match elements with a particular attribute
pair, and keywords (which must come last) narrow the selection to a
specific attribute value.</p>

<p>Imagine hand-writing the code to navigate all these conditions for
each piece of information that Elfeed requires. The RSS parser makes
up to 16 such queries, and the Atom parser makes as many as 24. That
would add up to a lot of tedious code.</p>

<p>The package (included with Elfeed) that executes this query is called
“xml-query.” It comes in two flavors: <code class="language-plaintext highlighter-rouge">xml-query</code> and <code class="language-plaintext highlighter-rouge">xml-query-all</code>.
The former returns just the first match, and the latter returns all
matches. The naming parallels the <code class="language-plaintext highlighter-rouge">querySelector()</code> and
<code class="language-plaintext highlighter-rouge">querySelectorAll()</code> DOM methods in JavaScript.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">xml</span> <span class="p">(</span><span class="nv">elfeed-xml-parse-region</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">xml-query-all</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">link</span> <span class="nv">[rel</span> <span class="s">"alternate"</span><span class="nv">]</span> <span class="ss">:href</span><span class="p">)</span> <span class="nv">xml</span><span class="p">))</span>

<span class="c1">;; =&gt; ("http://example.org/2003/12/13/atom03")</span>
</code></pre></div></div>

<p>That date search I mentioned before looks roughly like this. The <code class="language-plaintext highlighter-rouge">*</code>
matches text nodes within the selected element. It must come last just
like the keyword matcher.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">or</span> <span class="p">(</span><span class="nv">xml-query</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">published</span> <span class="nb">*</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">xml-query</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">updated</span> <span class="nb">*</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">xml-query</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">date</span> <span class="nb">*</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">xml-query</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">modified</span> <span class="nb">*</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">xml-query</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">issued</span> <span class="nb">*</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">current-time</span><span class="p">))</span>
</code></pre></div></div>

<p>Over the past three years, Elfeed has gained more and more of these
selectors as it collects more and more information from feeds. Most
recently, Elfeed collects author and category information provided by
feeds. Each new query slows feed parsing a little bit, and it’s a
perfect example of a program slowing down as it gains more features
and capabilities.</p>

<p>But I don’t want Elfeed to slow down. I want it to get <em>faster</em>!</p>

<h3 id="optimizing-the-domain-specific-language">Optimizing the domain-specific language</h3>

<p>Just like the primary jQuery function (<code class="language-plaintext highlighter-rouge">$</code>), both <code class="language-plaintext highlighter-rouge">xml-query</code> and
<code class="language-plaintext highlighter-rouge">xml-query-all</code> are functions. The xml-query engine processes the
selector from scratch on each invocation. It examines the first
element, dispatches on its type/value to apply it to the input, and
then recurses on the rest of the selector with the narrowed input,
stopping when it hits the end of the list. That’s the way it’s worked
from the start.</p>
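
<p>As a rough illustration of that interpreter loop, here’s a minimal sketch. It is not the actual xml-query code: the name <code class="language-plaintext highlighter-rouge">my-path-query-all</code> is hypothetical, it only understands plain tag symbols (no attribute, keyword, or <code class="language-plaintext highlighter-rouge">*</code> matchers), and it assumes the <code class="language-plaintext highlighter-rouge">(tag attributes . children)</code> node layout. The point is that the selector is examined again on every single call.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

;; Hypothetical toy interpreter: SELECTOR is a list of tag symbols only.
(defun my-path-query-all (selector nodes)
  "Return the nodes reached by following the tags in SELECTOR through NODES."
  ;; Dispatch on the first selector element: keep nodes with that tag.
  (let ((matches (cl-remove-if-not
                  (lambda (node)
                    (and (consp node) (eq (car node) (car selector))))
                  nodes)))
    (if (null (cdr selector))
        matches
      ;; Recurse on the rest of the selector with the narrowed input,
      ;; which is the children of every node that matched.
      (my-path-query-all (cdr selector)
                         (apply #'append (mapcar #'cddr matches))))))

(defvar my-sample-xml
  '((feed nil
          (title nil "Feed")
          (entry nil (title nil "A"))
          (entry nil (title nil "B")))))

(my-path-query-all '(feed entry title) my-sample-xml)
;; =&gt; ((title nil "A") (title nil "B"))
</code></pre></div></div>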

<p>However, every selector argument in Elfeed is a static, quoted list.
<a href="/blog/2016/12/11/">Unlike user-supplied filters</a>, I know exactly what I want to
execute ahead of time. It would be much better if the engine didn’t
have to waste time reparsing the DSL for each query.</p>

<p>This is the classic split between interpreters and compilers. An
interpreter reads input and immediately executes it, doing what the
input tells it to do. A compiler reads input and, rather than execute
it, produces output, usually in a simpler language, that, when
evaluated, has the same effect as executing the input.</p>

<p>Rather than interpret the selector, it would be better to compile it
into Elisp code, compile that <a href="/blog/2014/01/04/">into byte-code</a>, and then have the
Emacs byte-code virtual machine (VM) execute the query each time it’s
needed. The extra work of parsing the DSL is performed ahead of time,
the dispatch is entirely static, and the selector ultimately executes
on a much faster engine (byte-code VM). This should be a lot faster!</p>

<p>So I wrote a function that accepts a selector expression and emits
Elisp source that implements that selector: a compiler for my DSL.
Having a readily-available syntax tree is one of the <a href="https://en.wikipedia.org/wiki/Homoiconicity">big advantages
of homoiconicity</a>, and this sort of function makes perfect sense
in a lisp. For the external interface, this compiler function is
called by a new pair of macros, <code class="language-plaintext highlighter-rouge">xml-query*</code> and <code class="language-plaintext highlighter-rouge">xml-query-all*</code>.
These macros consume a static selector and expand into the compiled
Elisp form of the selector.</p>

<p>To demonstrate, remember that link query from before? Here’s the macro
version of that selection, but only returning the first match. Notice
the selector is no longer quoted. This is because it’s consumed by the
macro, not evaluated.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">xml-query*</span> <span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">link</span> <span class="nv">[rel</span> <span class="s">"alternate"</span><span class="nv">]</span> <span class="ss">:href</span><span class="p">)</span> <span class="nv">xml</span><span class="p">)</span>
</code></pre></div></div>

<p>This will expand into the following code.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">catch</span> <span class="ss">'done</span>
  <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">v</span> <span class="nv">xml</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">consp</span> <span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">v</span><span class="p">)</span> <span class="ss">'feed</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">v</span> <span class="p">(</span><span class="nb">cddr</span> <span class="nv">v</span><span class="p">))</span>
        <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">consp</span> <span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">v</span><span class="p">)</span> <span class="ss">'entry</span><span class="p">))</span>
          <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">v</span> <span class="p">(</span><span class="nb">cddr</span> <span class="nv">v</span><span class="p">))</span>
            <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">consp</span> <span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">v</span><span class="p">)</span> <span class="ss">'link</span><span class="p">))</span>
              <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">value</span> <span class="p">(</span><span class="nb">cdr</span> <span class="p">(</span><span class="nv">assq</span> <span class="ss">'rel</span> <span class="p">(</span><span class="nb">cadr</span> <span class="nv">v</span><span class="p">)))))</span>
                <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">equal</span> <span class="nv">value</span> <span class="s">"alternate"</span><span class="p">)</span>
                  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">v</span> <span class="p">(</span><span class="nb">cdr</span> <span class="p">(</span><span class="nv">assq</span> <span class="ss">'href</span> <span class="p">(</span><span class="nb">cadr</span> <span class="nv">v</span><span class="p">)))))</span>
                    <span class="p">(</span><span class="nb">when</span> <span class="nv">v</span>
                      <span class="p">(</span><span class="k">throw</span> <span class="ss">'done</span> <span class="nv">v</span><span class="p">))))))))))))</span>
</code></pre></div></div>

<p>As soon as it finds a match, it’s thrown to the top level and
returned. Without the DSL, the expansion is essentially what would
have to be written by hand. <strong>This is <em>exactly</em> the sort of leverage
you should be getting from a compiler.</strong> It compiles to around 130
byte-code instructions.</p>
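
<p>To give a feel for how little machinery a compiler like this needs, here’s a minimal sketch under simplifying assumptions. It is not Elfeed’s actual compiler: <code class="language-plaintext highlighter-rouge">my-path--compile</code>, <code class="language-plaintext highlighter-rouge">my-path-first</code>, and the <code class="language-plaintext highlighter-rouge">my-path-done</code> catch tag are hypothetical names, and only plain tag symbols are supported. The helper walks the selector once, at macro expansion time, and emits one nested loop per element.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical toy compiler: handles plain tag symbols only.
;; `eval-and-compile' makes the helper available to the macro
;; when this code is itself being byte-compiled.
(eval-and-compile
  (defun my-path--compile (selector input)
    "Return a form that throws the first node in INPUT matching SELECTOR."
    `(dolist (v ,input)
       (when (and (consp v) (eq (car v) ',(car selector)))
         ,(if (cdr selector)
              ;; More selector left: emit a nested loop over the children.
              (my-path--compile (cdr selector) '(cddr v))
            ;; End of the selector: emit the epilogue.
            '(throw 'my-path-done v))))))

(defmacro my-path-first (selector input)
  "Expand into code returning the first node in INPUT matching SELECTOR."
  ;; The prologue: a catch wrapped around the generated loops.
  `(catch 'my-path-done
     ,(my-path--compile selector input)))

(let ((xml '((feed nil
                   (entry nil (title nil "A"))
                   (entry nil (title nil "B"))))))
  (my-path-first (feed entry title) xml))
;; =&gt; (title nil "A")
</code></pre></div></div>

<p>Passing the macro call to <code class="language-plaintext highlighter-rouge">macroexpand</code> shows the generated loops, which is a handy way to check a compiler like this against the code you would have written by hand.</p>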

<p>The <code class="language-plaintext highlighter-rouge">xml-query-all*</code> form is nearly the same, but instead of a
<code class="language-plaintext highlighter-rouge">throw</code>, it pushes the result into the return list. Only the prologue
(the outermost part) and the epilogue (the innermost part) are
different.</p>
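
<p>For comparison, here’s roughly what a collecting version looks like when written by hand. This is my assumption about the general shape, not the literal <code class="language-plaintext highlighter-rouge">xml-query-all*</code> expansion, and <code class="language-plaintext highlighter-rouge">my-entry-titles</code> is a hypothetical name. The loops in the middle are the same as before; only the outer wrapper and the innermost form change.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical hand-written equivalent of a "collect all" query.
(defun my-entry-titles (xml)
  "Collect every feed/entry/title node in XML, in document order."
  (let ((result '()))                      ; prologue: set up the accumulator
    (dolist (v xml)
      (when (and (consp v) (eq (car v) 'feed))
        (dolist (v (cddr v))
          (when (and (consp v) (eq (car v) 'entry))
            (dolist (v (cddr v))
              (when (and (consp v) (eq (car v) 'title))
                (push v result)))))))      ; epilogue: one cons per match
    (nreverse result)))

(my-entry-titles '((feed nil
                         (entry nil (title nil "A"))
                         (entry nil (title nil "B")))))
;; =&gt; ((title nil "A") (title nil "B"))
</code></pre></div></div>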

<p>Parsing feeds is a hot spot for Elfeed, so I wanted the compiler’s
output to be as efficient as possible. I had three goals for this:</p>

<ul>
  <li>
    <p><strong>No extraneous code.</strong> It’s easy for the compiler to emit
unnecessary code. The byte-code compiler might be able to eliminate
some of it, but I don’t want to rely on that. Except for the
identifiers, it should basically look like a human wrote it.</p>
  </li>
  <li>
    <p><strong>Avoid function calls.</strong> I don’t want to pay function call
overhead, and, with some care, it’s easy to avoid. In the
<code class="language-plaintext highlighter-rouge">xml-query*</code> expansion, the only function call is <code class="language-plaintext highlighter-rouge">throw</code>, which is
unavoidable. The <code class="language-plaintext highlighter-rouge">xml-query-all*</code> version makes no function calls
whatsoever. Notice that I used <code class="language-plaintext highlighter-rouge">assq</code> rather than <code class="language-plaintext highlighter-rouge">assoc</code>. First, it
only needs to match symbols, so it should be faster. Second, <code class="language-plaintext highlighter-rouge">assq</code>
has its own byte-code instruction (158) and <code class="language-plaintext highlighter-rouge">assoc</code> does not.</p>
  </li>
  <li>
    <p><strong>No unnecessary memory allocations</strong>. The <code class="language-plaintext highlighter-rouge">xml-query*</code> expansion
makes <em>no</em> allocations. The <code class="language-plaintext highlighter-rouge">xml-query-all*</code> version only conses
once per output, which is the minimum possible.</p>
  </li>
</ul>

<p>The end result is at least as optimal as hand-written code, but
without the chance of human error (typos, fat fingering) and sourced
from an easy-to-read DSL.</p>

<h3 id="performance">Performance</h3>

<p>In my tests, the <strong>xml-query macros are a full order of magnitude
faster than the functions</strong>. Yes, ten times faster! It’s an even
bigger gain than I expected.</p>

<p>In the full picture, xml-query is only one part of parsing a feed.
Measuring the time starting from raw XML text (as <a href="/blog/2016/06/16/">delivered by
cURL</a>) to a list of database entry objects, I’m seeing an
<strong>overall 25% speedup</strong> with the macros. The remaining time is
dominated by <code class="language-plaintext highlighter-rouge">xml-parse-region</code>, which is mostly out of my control.</p>

<p>With xml-query so computationally cheap, I don’t need to worry about
using it more often. Compared to parsing XML text, it’s virtually
free.</p>

<p>When it came time to validate my DSL compiler, I was <em>really</em> happy
that Elfeed had a test suite. I essentially rewrote a core component
from scratch, and passing all of the unit tests was a strong sign that
it was correct. Many times that test suite has provided confidence in
changes made both by me and by others.</p>

<p>I’ll end by describing another possible application: Apply this
technique to regular expressions, such that static strings containing
regular expressions are compiled into Elisp/byte-code via macro
expansion. I wonder if situationally this would be faster than Emacs’
own regular expression engine.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>Faster Elfeed Search Through JIT Byte-code Compilation</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/12/11/"/>
    <id>urn:uuid:47002cc3-816a-3cb8-b462-327364e3f943</id>
    <updated>2016-12-11T23:16:42Z</updated>
    <category term="emacs"/><category term="elfeed"/><category term="optimization"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Today I pushed an update for <a href="https://github.com/skeeto/elfeed">Elfeed</a> that doubles the speed
of the search filter in the worst case. This is the user-entered
expression that dynamically narrows the entry listing to a subset that
meets certain criteria: published after a particular date,
with/without particular tags, and matching/non-matching zero or more
regular expressions. The filter is live, applied to the database as
the expression is edited, so it’s important for usability that this
search completes under a threshold that the user might notice.</p>

<p><img src="/img/elfeed/filter.gif" alt="" /></p>

<p>The typical workaround for these kinds of interfaces is to make
filtering/searching asynchronous. It’s possible to do this well, but
it’s usually a terrible, broken design. If the user acts upon the
asynchronous results — say, by typing the query and hitting enter to
choose the current or expected top result — then the final behavior is
non-deterministic, a race between the user’s typing speed and the
asynchronous search. Elfeed will keep its synchronous live search.</p>

<p>For anyone not familiar with Elfeed, here’s a filter that finds all
entries from within the past year tagged “youtube” (<code class="language-plaintext highlighter-rouge">+youtube</code>) that
mention Linux or Linus (<code class="language-plaintext highlighter-rouge">linu[xs]</code>), but aren’t tagged “bsd” (<code class="language-plaintext highlighter-rouge">-bsd</code>),
limited to the most recent 15 entries (<code class="language-plaintext highlighter-rouge">#15</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@1-year-old +youtube linu[xs] -bsd #15
</code></pre></div></div>

<p>The database is primarily indexed over publication date, so filters on
publication dates are the most efficient filters. Entries are visited
in order starting with the most recently published, and the search can
bail out early once it crosses the filter threshold. Time-oriented
filters have been encouraged as the solution to keep the live search
feeling lively.</p>

<h3 id="filtering-overview">Filtering Overview</h3>

<p>The first step in filtering is parsing the filter text entered by the
user. This string is broken into its components using the
<code class="language-plaintext highlighter-rouge">elfeed-search-parse-filter</code> function. Date filter components are
converted into a unix epoch interval, tags are interned into symbols,
regular expressions are gathered up as strings, and the entry limit is
parsed into a plain integer. Absence of a filter component is
indicated by nil.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">elfeed-search-parse-filter</span> <span class="s">"@1-year-old +youtube linu[xs] -bsd #15"</span><span class="p">)</span>
<span class="c1">;; =&gt; (31557600.0 (youtube) (bsd) ("linu[xs]") nil 15)</span>
</code></pre></div></div>
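
<p>To make the parsing step concrete, here’s a much-simplified sketch. It is not <code class="language-plaintext highlighter-rouge">elfeed-search-parse-filter</code>: the name <code class="language-plaintext highlighter-rouge">my-parse-filter</code> and its four-element return value are hypothetical, and date components are left out entirely. It just dispatches on the first character of each whitespace-separated token.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical toy parser: no date (@) components, no negated regexps.
(defun my-parse-filter (filter)
  "Parse FILTER into a list (MUST-HAVE MUST-NOT-HAVE REGEXPS LIMIT)."
  (let (must-have must-not-have regexps limit)
    (dolist (token (split-string filter))
      (pcase (aref token 0)
        ;; Tags are interned into symbols.
        (?+ (push (intern (substring token 1)) must-have))
        (?- (push (intern (substring token 1)) must-not-have))
        ;; The entry limit becomes a plain integer.
        (?\# (setq limit (string-to-number (substring token 1))))
        ;; Anything else is kept as a regular expression string.
        (_ (push token regexps))))
    (list (nreverse must-have)
          (nreverse must-not-have)
          (nreverse regexps)
          limit)))

(my-parse-filter "+youtube linu[xs] -bsd #15")
;; =&gt; ((youtube) (bsd) ("linu[xs]") 15)
</code></pre></div></div>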

<p>Previously, the next step was to apply the <code class="language-plaintext highlighter-rouge">elfeed-search-filter</code>
function with this structured filter representation to the database.
Except for special early-bailout situations, it works left-to-right
across the filter, checking each condition against each entry. This is
analogous to an interpreter, with the filter being a program.</p>

<p>Thinking about it that way, what if the filter was instead compiled
into an Emacs byte-code function and executed directly by the Emacs
virtual machine? That’s what this latest update does.</p>
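
<p>In miniature, the idea looks something like the following sketch. It is not Elfeed’s real filter compiler: <code class="language-plaintext highlighter-rouge">my-compile-tag-filter</code> is a hypothetical name, and it assumes an entry is nothing more than a list of tag symbols. It builds a <code class="language-plaintext highlighter-rouge">lambda</code> form with one <code class="language-plaintext highlighter-rouge">memq</code> test per tag and hands it to <code class="language-plaintext highlighter-rouge">byte-compile</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical helper: turns a list of tag symbols into a compiled predicate.
(defun my-compile-tag-filter (tags)
  "Return a byte-compiled function testing an entry for any of TAGS.
An entry is assumed to be a plain list of tag symbols."
  (byte-compile
   `(lambda (entry)
      ;; One unrolled test per tag, so there is no loop over TAGS
      ;; left at run time.
      (or ,@(mapcar (lambda (tag) (list 'memq (list 'quote tag) 'entry))
                    tags)))))

(let ((f (my-compile-tag-filter '(A B C))))
  (list (funcall f '(X B))
        (funcall f '(X Y))))
;; =&gt; ((B) nil)
</code></pre></div></div>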

<h3 id="benchmarks">Benchmarks</h3>

<p>With six different filter components, the actual filtering routine is
a bit too complicated for an article, so I’ll set up a simpler, but
roughly equivalent, scenario. With a reasonable cut-off date, the
filter was already sufficiently fast, so for benchmarking I’ll focus
on the worst case: no early bailout opportunities. An entry will be
just a list of tags (symbols), and the filter will have to test every
entry.</p>

<p>My <a href="/blog/2016/08/12/">real-world Elfeed database</a> currently has 46,772 entries with
36 distinct tags. For my benchmark I’ll round this up to a nice
100,000 entries, and use 26 distinct tags (A–Z), which has the nice
alphabet property and more closely reflects the number of tags I still
care about.</p>

<p>First, here’s <code class="language-plaintext highlighter-rouge">make-random-entry</code> to generate a random list of 1–5
tags (i.e. an entry). The <code class="language-plaintext highlighter-rouge">state</code> parameter is the random state,
allowing for deterministic benchmarks on a randomly-generated
database.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defun</span> <span class="nv">make-random-entry</span> <span class="p">(</span><span class="k">&amp;key</span> <span class="nv">state</span> <span class="p">(</span><span class="nb">min</span> <span class="mi">1</span><span class="p">)</span> <span class="p">(</span><span class="nb">max</span> <span class="mi">5</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">repeat</span> <span class="p">(</span><span class="nb">+</span> <span class="nb">min</span> <span class="p">(</span><span class="nv">cl-random</span> <span class="p">(</span><span class="nb">1+</span> <span class="p">(</span><span class="nb">-</span> <span class="nb">max</span> <span class="nb">min</span><span class="p">))</span> <span class="nv">state</span><span class="p">))</span>
           <span class="nv">for</span> <span class="nv">letter</span> <span class="nb">=</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">?A</span> <span class="p">(</span><span class="nv">cl-random</span> <span class="mi">26</span> <span class="nv">state</span><span class="p">))</span>
           <span class="nv">collect</span> <span class="p">(</span><span class="nb">intern</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"%c"</span> <span class="nv">letter</span><span class="p">))))</span>
</code></pre></div></div>

<p>The database is just a big list of entries. In Elfeed this is actually
an AVL tree. Without dates, the order doesn’t matter.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defun</span> <span class="nv">make-random-database</span> <span class="p">(</span><span class="k">&amp;key</span> <span class="nv">state</span> <span class="p">(</span><span class="nb">count</span> <span class="mi">100000</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">repeat</span> <span class="nb">count</span> <span class="nv">collect</span> <span class="p">(</span><span class="nv">make-random-entry</span> <span class="ss">:state</span> <span class="nv">state</span><span class="p">)))</span>
</code></pre></div></div>

<p>Here’s <a href="/blog/2009/05/28/">my old time macro</a>. An important change I’ve made since
years ago is to call <code class="language-plaintext highlighter-rouge">garbage-collect</code> before starting the clock,
eliminating bad samples from unlucky garbage collection events.
Depending on what you want to measure, it may even be worth disabling
garbage collection during the measurement by setting
<code class="language-plaintext highlighter-rouge">gc-cons-threshold</code> to a high value.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">measure-time</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">body</span><span class="p">)</span>
  <span class="p">(</span><span class="k">declare</span> <span class="p">(</span><span class="nv">indent</span> <span class="nb">defun</span><span class="p">))</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">start</span> <span class="p">(</span><span class="nb">make-symbol</span> <span class="s">"start"</span><span class="p">)))</span>
    <span class="o">`</span><span class="p">(</span><span class="k">progn</span>
       <span class="c1">;; Collect garbage at run time, just before the clock starts</span>
       <span class="p">(</span><span class="nv">garbage-collect</span><span class="p">)</span>
       <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="o">,</span><span class="nv">start</span> <span class="p">(</span><span class="nv">float-time</span><span class="p">)))</span>
         <span class="o">,@</span><span class="nv">body</span>
         <span class="p">(</span><span class="nb">-</span> <span class="p">(</span><span class="nv">float-time</span><span class="p">)</span> <span class="o">,</span><span class="nv">start</span><span class="p">)))))</span>
</code></pre></div></div>
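
<p>As for disabling garbage collection during a measurement, a minimal sketch might look like this. The function name <code class="language-plaintext highlighter-rouge">my-time-without-gc</code> is hypothetical and it isn’t used for the numbers in this article. Binding <code class="language-plaintext highlighter-rouge">gc-cons-threshold</code> with <code class="language-plaintext highlighter-rouge">let</code> means the old value comes back automatically when the measurement ends.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical helper, not part of the benchmark in this article.
(defun my-time-without-gc (thunk)
  "Call THUNK with garbage collection effectively disabled.
Return the elapsed wall-clock time in seconds."
  ;; Start from a clean heap so earlier garbage isn't charged to THUNK.
  (garbage-collect)
  (let ((gc-cons-threshold most-positive-fixnum)
        (start (float-time)))
    (funcall thunk)
    (- (float-time) start)))

(my-time-without-gc (lambda () (make-list 100000 nil)))
</code></pre></div></div>

<p>The trade-off is memory: anything allocated inside the thunk stays around until the next collection, so this is only suitable for short measurements.</p>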

<p>Finally, the benchmark harness. It uses a hard-coded seed to generate
the same pseudo-random database. The test is run against a filter
function, <code class="language-plaintext highlighter-rouge">f</code>, 100 times in search for the same 6 tags, and the timing
results are averaged.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defun</span> <span class="nv">benchmark</span> <span class="p">(</span><span class="nv">f</span> <span class="k">&amp;optional</span> <span class="p">(</span><span class="nv">n</span> <span class="mi">100</span><span class="p">)</span> <span class="p">(</span><span class="nv">tags</span> <span class="o">'</span><span class="p">(</span><span class="nv">A</span> <span class="nv">B</span> <span class="nv">C</span> <span class="nv">D</span> <span class="nv">E</span> <span class="nv">F</span><span class="p">)))</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">state</span> <span class="p">(</span><span class="nv">copy-sequence</span> <span class="nv">[cl-random-state-tag</span> <span class="mi">-1</span> <span class="mi">30</span> <span class="nv">267466518]</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">db</span> <span class="p">(</span><span class="nv">make-random-database</span> <span class="ss">:state</span> <span class="nv">state</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">repeat</span> <span class="nv">n</span>
             <span class="nv">sum</span> <span class="p">(</span><span class="nv">measure-time</span>
                   <span class="p">(</span><span class="nb">funcall</span> <span class="nv">f</span> <span class="nv">db</span> <span class="nv">tags</span><span class="p">))</span>
             <span class="nv">into</span> <span class="nv">total</span>
             <span class="nv">finally</span> <span class="nb">return</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">total</span> <span class="p">(</span><span class="nb">float</span> <span class="nv">n</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The baseline will be <code class="language-plaintext highlighter-rouge">memq</code> (test for membership using identity,
<code class="language-plaintext highlighter-rouge">eq</code>). There are two lists of tags to compare: the list that is the
entry, and the list from the filter. This requires a nested loop for
each entry, one explicit (<code class="language-plaintext highlighter-rouge">cl-loop</code>) and one implicit (<code class="language-plaintext highlighter-rouge">memq</code>), both
with early bailout.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">memq-count</span> <span class="p">(</span><span class="nv">db</span> <span class="nv">tags</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span> <span class="nb">count</span>
           <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">tag</span> <span class="nv">in</span> <span class="nv">tags</span>
                    <span class="nb">when</span> <span class="p">(</span><span class="nv">memq</span> <span class="nv">tag</span> <span class="nv">entry</span><span class="p">)</span>
                    <span class="nb">return</span> <span class="no">t</span><span class="p">)))</span>
</code></pre></div></div>

<p>Byte-code compiling everything and running the benchmark on my laptop
I get:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">memq-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.041 seconds</span>
</code></pre></div></div>

<p>That’s actually not too bad. One of the advantages of this definition
is that there are no function calls. The <code class="language-plaintext highlighter-rouge">memq</code> built-in function has
its own opcode (62), and the rest of the definition is special forms
and macros expanding to special forms (<code class="language-plaintext highlighter-rouge">cl-loop</code>). It’s exactly the
thing I need to exploit to make filters faster.</p>

<p>As a sanity check, what would happen if I used <code class="language-plaintext highlighter-rouge">member</code> instead of
<code class="language-plaintext highlighter-rouge">memq</code>? In theory it should be slower because it uses <code class="language-plaintext highlighter-rouge">equal</code> for
tests instead of <code class="language-plaintext highlighter-rouge">eq</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">member-count</span> <span class="p">(</span><span class="nv">db</span> <span class="nv">tags</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span> <span class="nb">count</span>
           <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">tag</span> <span class="nv">in</span> <span class="nv">tags</span>
                    <span class="nb">when</span> <span class="p">(</span><span class="nb">member</span> <span class="nv">tag</span> <span class="nv">entry</span><span class="p">)</span>
                    <span class="nb">return</span> <span class="no">t</span><span class="p">)))</span>
</code></pre></div></div>

<p>It’s only slightly slower because <code class="language-plaintext highlighter-rouge">member</code>, <a href="/blog/2013/01/22/">like many other
built-ins</a>, also has an opcode (157). It’s just a tiny bit
more overhead.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">member-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.047 seconds</span>
</code></pre></div></div>

<p>To test function call overhead while still using the built-in (i.e.
written in C) <code class="language-plaintext highlighter-rouge">memq</code>, I’ll alias it so that the byte-code compiler is
forced to emit a function call.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defalias</span> <span class="ss">'memq-alias</span> <span class="ss">'memq</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">memq-alias-count</span> <span class="p">(</span><span class="nv">db</span> <span class="nv">tags</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span> <span class="nb">count</span>
           <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">tag</span> <span class="nv">in</span> <span class="nv">tags</span>
                    <span class="nb">when</span> <span class="p">(</span><span class="nv">memq-alias</span> <span class="nv">tag</span> <span class="nv">entry</span><span class="p">)</span>
                    <span class="nb">return</span> <span class="no">t</span><span class="p">)))</span>
</code></pre></div></div>

<p>To verify that this is doing what I expect, I <code class="language-plaintext highlighter-rouge">M-x disassemble</code> the
function and inspect the byte-code disassembly. Here’s a simple
example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">disassemble</span>
 <span class="p">(</span><span class="nv">byte-compile</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nb">list</span><span class="p">)</span> <span class="p">(</span><span class="nv">memq</span> <span class="ss">:foo</span> <span class="nb">list</span><span class="p">))))</span>
</code></pre></div></div>

<p>When compiled under lexical scope (<code class="language-plaintext highlighter-rouge">lexical-binding</code> is true), here’s
the disassembly. To understand what this means, see <a href="/blog/2014/01/04/"><em>Emacs Byte-code
Internals</em></a>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  :foo
1       stack-ref 1
2       memq
3       return
</code></pre></div></div>

<p>Notice the <code class="language-plaintext highlighter-rouge">memq</code> instruction. Try using <code class="language-plaintext highlighter-rouge">memq-alias</code> instead:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">disassemble</span>
 <span class="p">(</span><span class="nv">byte-compile</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nb">list</span><span class="p">)</span> <span class="p">(</span><span class="nv">memq-alias</span> <span class="ss">:foo</span> <span class="nb">list</span><span class="p">))))</span>
</code></pre></div></div>

<p>Resulting in a function call:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  memq-alias
1       constant  :foo
2       stack-ref 2
3       call      2
4       return
</code></pre></div></div>

<p>And the benchmark:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">memq-alias-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.052 seconds</span>
</code></pre></div></div>

<p>So the function call adds about 27% overhead. This means it would be a
good idea to <strong>avoid calling functions in the filter</strong> if I can help
it. I should rely on these special opcodes.</p>

<p>Suppose <code class="language-plaintext highlighter-rouge">memq</code> was written in Emacs Lisp rather than C. How much would
that hurt performance? My version of <code class="language-plaintext highlighter-rouge">my-memq</code> below isn’t quite the
same since it returns t rather than the sublist, but it’s good enough
for this purpose. (I’m using <code class="language-plaintext highlighter-rouge">cl-loop</code> because writing early bailout
in plain Elisp without recursion is, in my opinion, ugly.)</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">my-memq</span> <span class="p">(</span><span class="nv">needle</span> <span class="nv">haystack</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">element</span> <span class="nv">in</span> <span class="nv">haystack</span>
           <span class="nb">when</span> <span class="p">(</span><span class="nb">eq</span> <span class="nv">needle</span> <span class="nv">element</span><span class="p">)</span>
           <span class="nb">return</span> <span class="no">t</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">my-memq-count</span> <span class="p">(</span><span class="nv">db</span> <span class="nv">tags</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span> <span class="nb">count</span>
           <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">tag</span> <span class="nv">in</span> <span class="nv">tags</span>
                    <span class="nb">when</span> <span class="p">(</span><span class="nv">my-memq</span> <span class="nv">tag</span> <span class="nv">entry</span><span class="p">)</span>
                    <span class="nb">return</span> <span class="no">t</span><span class="p">)))</span>
</code></pre></div></div>

<p>And the benchmark:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">my-memq-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.137 seconds</span>
</code></pre></div></div>

<p>Oof! It’s more than 3 times slower than the opcode. This means <strong>I
should use built-ins as much as possible</strong> in the filter.</p>

<h3 id="dynamic-vs-lexical-scope">Dynamic vs. lexical scope</h3>

<p>There’s one last thing to watch out for. Everything so far has been
compiled with lexical scope. You should really turn this on by default
for all new code that you write. It has three important advantages:</p>

<ol>
  <li>It allows the compiler to catch more mistakes.</li>
  <li>It eliminates a class of bugs related to dynamic scope: Local
variables are exposed to manipulation by callees.</li>
  <li><a href="/blog/2016/12/22/">Lexical scope has better performance</a>.</li>
</ol>

<p>Here are all the benchmarks with the default dynamic scope:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">memq-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.065 seconds</span>

<span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">member-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.070 seconds</span>

<span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">memq-alias-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.074 seconds</span>

<span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">my-memq-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.256 seconds</span>
</code></pre></div></div>

<p>It slows these benchmarks by roughly 40% to 90%, and for no benefit. Under
dynamic scope, local variables use the <code class="language-plaintext highlighter-rouge">varref</code> opcode (a global
variable lookup) instead of the <code class="language-plaintext highlighter-rouge">stack-ref</code> opcode (a simple array
index). Take this small function, for example:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">norm</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">*</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
</code></pre></div></div>

<p>Under dynamic scope, this compiles to:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       varref    a
1       varref    b
2       diff
3       varref    a
4       varref    b
5       diff
6       mult
7       return
</code></pre></div></div>

<p>And under lexical scope (notice the variable names disappear):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       stack-ref 1
1       stack-ref 1
2       diff
3       stack-ref 2
4       stack-ref 2
5       diff
6       mult
7       return
</code></pre></div></div>

<h3 id="jit-compiled-filters">JIT-compiled filters</h3>

<p>So far I’ve been moving in the wrong direction, making things slower
rather than faster. How can I make it faster than the straight <code class="language-plaintext highlighter-rouge">memq</code>
version? By compiling the filter into byte-code.</p>

<p>I won’t write the byte-code directly, but instead generate Elisp code
and use the byte-code compiler on it. This is safer, will work
correctly in future versions of Emacs, and leverages the optimizations
performed by the byte-compiler. This sort of thing recently <a href="http://emacshorrors.com/posts/when-data-becomes-code.html">got a bad
rap on Emacs Horrors</a>, but I was happy to see that this
technique is already established.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">jit-count</span> <span class="p">(</span><span class="nv">db</span> <span class="nv">tags</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">memq-list</span> <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">tag</span> <span class="nv">in</span> <span class="nv">tags</span>
                             <span class="nv">collect</span> <span class="o">`</span><span class="p">(</span><span class="nv">memq</span> <span class="ss">',tag</span> <span class="nv">entry</span><span class="p">)))</span>
         <span class="p">(</span><span class="k">function</span> <span class="o">`</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">db</span><span class="p">)</span>
                      <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span>
                               <span class="nb">count</span> <span class="p">(</span><span class="nb">or</span> <span class="o">,@</span><span class="nv">memq-list</span><span class="p">))))</span>
         <span class="p">(</span><span class="nv">compiled</span> <span class="p">(</span><span class="nv">byte-compile</span> <span class="k">function</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">funcall</span> <span class="nv">compiled</span> <span class="nv">db</span><span class="p">)))</span>
</code></pre></div></div>

<p>It dynamically builds the code as an s-expression, runs that through
the byte-code compiler, executes it, and throws it away. It’s
“just-in-time,” though compiling to byte-code and not <a href="/blog/2015/03/19/">native
code</a>. For the benchmark tags of <code class="language-plaintext highlighter-rouge">(A B C D E F)</code>, this builds
the following:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">db</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span>
           <span class="nb">count</span> <span class="p">(</span><span class="nb">or</span> <span class="p">(</span><span class="nv">memq</span> <span class="ss">'A</span> <span class="nv">entry</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">memq</span> <span class="ss">'B</span> <span class="nv">entry</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">memq</span> <span class="ss">'C</span> <span class="nv">entry</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">memq</span> <span class="ss">'D</span> <span class="nv">entry</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">memq</span> <span class="ss">'E</span> <span class="nv">entry</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">memq</span> <span class="ss">'F</span> <span class="nv">entry</span><span class="p">))))</span>
</code></pre></div></div>

<p>Due to its short-circuiting behavior, <code class="language-plaintext highlighter-rouge">or</code> is a special form, so this
function is just special forms and <code class="language-plaintext highlighter-rouge">memq</code> in its opcode form. It’s as
fast as Elisp can get.</p>

<p>Having s-expressions is a real strength for Lisp, since the
alternative (in, say, JavaScript) would be to assemble the function by
concatenating code strings. By contrast, this looks a lot like a
regular Lisp macro. Invoking the byte-code compiler does add some
overhead compared to the interpreted filter, but it’s insignificant.</p>

<p>How much faster is this?</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">jit-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.017 seconds</span>
</code></pre></div></div>

<p><strong>It’s more than twice as fast!</strong> The big gain here is through <em>loop
unrolling</em>. The loop over the tags has been unrolled into the <code class="language-plaintext highlighter-rouge">or</code> expression.
That section of byte-code looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  A
1       stack-ref 1
2       memq
3       goto-if-not-nil-else-pop 1
6       constant  B
7       stack-ref 1
8       memq
9       goto-if-not-nil-else-pop 1
12      constant  C
13      stack-ref 1
14      memq
15      goto-if-not-nil-else-pop 1
18      constant  D
19      stack-ref 1
20      memq
21      goto-if-not-nil-else-pop 1
24      constant  E
25      stack-ref 1
26      memq
27      goto-if-not-nil-else-pop 1
30      constant  F
31      stack-ref 1
32      memq
33:1    return
</code></pre></div></div>

<p>In Elfeed, the compiled filter not only unrolls these loops, it also
completely eliminates the overhead for unused filter components. Compared
to this benchmark, I’m seeing roughly matching gains in Elfeed’s worst
case. In Elfeed, I also bind <code class="language-plaintext highlighter-rouge">lexical-binding</code> around the
<code class="language-plaintext highlighter-rouge">byte-compile</code> call to force lexical scope, since otherwise it just
uses the buffer-local value (usually nil).</p>

<p>Filter compilation can be toggled on and off by setting
<code class="language-plaintext highlighter-rouge">elfeed-search-compile-filter</code>. If you’re up to date, try out live
filters with it both enabled and disabled. See if you can notice the
difference.</p>

<h3 id="result-summary">Result summary</h3>

<p>Here are the results in a table, all run with Emacs 24.4 on x86-64.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(ms)      memq      member    memq-alias my-memq   jit
lexical   41        47        52         137       17
dynamic   65        70        74         256       21
</code></pre></div></div>

<p>And the same benchmarks on AArch64 (Emacs 24.5, ARM Cortex-A53), where
I also occasionally use Elfeed, and where I have been very interested
in improving performance.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(ms)      memq      member    memq-alias my-memq   jit
lexical   170       235       242        614       79
dynamic   274       340       345        1130      92
</code></pre></div></div>
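
<p>The relative costs quoted throughout (about 27% call overhead, more than
3 times slower, more than twice as fast) can be recomputed from the two
tables above. This is only a small Python sketch of that arithmetic, not
part of the benchmark itself; the dictionary layout and function names are
my own, and the numbers are copied from the tables.</p>

```python
# Timings in milliseconds, copied from the two result tables.
TIMINGS = {
    "x86-64": {
        "lexical": {"memq": 41, "member": 47, "memq-alias": 52, "my-memq": 137, "jit": 17},
        "dynamic": {"memq": 65, "member": 70, "memq-alias": 74, "my-memq": 256, "jit": 21},
    },
    "AArch64": {
        "lexical": {"memq": 170, "member": 235, "memq-alias": 242, "my-memq": 614, "jit": 79},
        "dynamic": {"memq": 274, "member": 340, "memq-alias": 345, "my-memq": 1130, "jit": 92},
    },
}


def relative_to_memq(platform, scope="lexical"):
    """Return each benchmark's time as a multiple of the memq time."""
    row = TIMINGS[platform][scope]
    return {name: ms / row["memq"] for name, ms in row.items()}


def dynamic_penalty(platform):
    """Return dynamic-scope time divided by lexical-scope time, per benchmark."""
    lexical = TIMINGS[platform]["lexical"]
    dynamic = TIMINGS[platform]["dynamic"]
    return {name: dynamic[name] / lexical[name] for name in lexical}


if __name__ == "__main__":
    for platform in TIMINGS:
        print(platform)
        for name, ratio in relative_to_memq(platform).items():
            print(f"  {name:11} {ratio:5.2f}x the memq time")
        for name, ratio in dynamic_penalty(platform).items():
            print(f"  {name:11} {ratio:5.2f}x slower under dynamic scope")
```

<p>A ratio below 1.0 in the first listing means faster than plain
<code class="language-plaintext highlighter-rouge">memq</code>, which only the JIT column achieves.</p>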

<p>And here’s how you can run the benchmarks for yourself, perhaps with
different parameters:</p>

<ul>
  <li><a href="/download/jit-bench.el">jit-bench.el</a></li>
</ul>

<p>The header explains how to run the benchmark in batch mode:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ emacs -Q -batch -f batch-byte-compile jit-bench.el
$ emacs -Q -batch -l jit-bench.elc -f benchmark-batch
</code></pre></div></div>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>An Elfeed Database Analysis</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/08/12/"/>
    <id>urn:uuid:7f407aaa-229a-388c-ab7a-73e8ed24c04a</id>
    <updated>2016-08-12T03:20:16Z</updated>
    <category term="emacs"/><category term="elfeed"/>
    <content type="html">
      <![CDATA[<p>The end of the month marks <a href="/blog/2013/09/04/">Elfeed’s third birthday</a>. Surprising
to nobody, it’s also been three years of heavy, daily use by me. While
I’ve used Elfeed concurrently on a number of different machines over
this period, I’ve managed to keep an Elfeed <a href="/blog/2013/09/09/">database index</a>
with a lineage going all the way back to the initial development
stages, before the announcement. It’s a large, organically-grown
database that serves as a daily performance stress test. Hopefully
this means I’m one of the first people to have trouble if an invisible
threshold is ever exceeded.</p>

<p>I’m also the sort of person who gets excited when I come across an
interesting dataset, and I have this gem sitting right in front of me.
So a couple of days ago I pushed a new Elfeed function,
<code class="language-plaintext highlighter-rouge">elfeed-csv-export</code>, which exports a database index into three CSV
files. These are intended to serve as three tables in a SQL database,
exposing the database to interesting relational queries and joins.
Entry content (HTML, etc.) has always been considered volatile, so
this is not exported. The export function isn’t interactive (yet?), so
if you want to generate your own you’ll need to <code class="language-plaintext highlighter-rouge">(require
'elfeed-csv)</code> and evaluate it yourself.</p>

<p>All the source code for performing the analysis below on your own
database can be found here:</p>

<ul>
  <li><a href="https://github.com/skeeto/elfeed-analysis">https://github.com/skeeto/elfeed-analysis</a></li>
</ul>

<p>The three exported tables are <em>feeds</em>, <em>entries</em>, and <em>tags</em>. Here are
the corresponding columns (optional CSV header) for each:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>url, title, canonical-url, author
id, feed, title, link, date
entry, feed, tag
</code></pre></div></div>

<p>And here’s the SQLite schema I’m using for these tables:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">feeds</span> <span class="p">(</span>
    <span class="n">url</span> <span class="nb">TEXT</span> <span class="k">PRIMARY</span> <span class="k">KEY</span><span class="p">,</span>
    <span class="n">title</span> <span class="nb">TEXT</span><span class="p">,</span>
    <span class="n">canonical_url</span> <span class="nb">TEXT</span><span class="p">,</span>
    <span class="n">author</span> <span class="nb">TEXT</span>
<span class="p">);</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">entries</span> <span class="p">(</span>
    <span class="n">id</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="n">feed</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">REFERENCES</span> <span class="n">feeds</span> <span class="p">(</span><span class="n">url</span><span class="p">),</span>
    <span class="n">title</span> <span class="nb">TEXT</span><span class="p">,</span>
    <span class="n">link</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="nb">date</span> <span class="nb">REAL</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">feed</span><span class="p">)</span>
<span class="p">);</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">tags</span> <span class="p">(</span>
    <span class="n">entry</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="n">feed</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="n">tag</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="k">FOREIGN</span> <span class="k">KEY</span> <span class="p">(</span><span class="n">entry</span><span class="p">,</span> <span class="n">feed</span><span class="p">)</span> <span class="k">REFERENCES</span> <span class="n">entries</span> <span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">feed</span><span class="p">)</span>
<span class="p">);</span>
</code></pre></div></div>
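
<p>For anyone who wants to load an export without the SQLite shell, here is
a minimal Python sketch using only the standard library. It assumes the
three files were exported <em>without</em> the optional header row and with the
columns in the order listed above; the function name
<code class="language-plaintext highlighter-rouge">load_tables</code> and the sample rows are made up for illustration and are not
part of Elfeed or the analysis repository.</p>

```python
import csv
import io
import sqlite3

# Same schema as above, as one script.
SCHEMA = """
CREATE TABLE feeds (
    url TEXT PRIMARY KEY,
    title TEXT,
    canonical_url TEXT,
    author TEXT
);
CREATE TABLE entries (
    id TEXT NOT NULL,
    feed TEXT NOT NULL REFERENCES feeds (url),
    title TEXT,
    link TEXT NOT NULL,
    date REAL NOT NULL,
    PRIMARY KEY (id, feed)
);
CREATE TABLE tags (
    entry TEXT NOT NULL,
    feed TEXT NOT NULL,
    tag TEXT NOT NULL,
    FOREIGN KEY (entry, feed) REFERENCES entries (id, feed)
);
"""


def load_tables(conn, feeds_csv, entries_csv, tags_csv):
    """Create the schema and insert every row from three open CSV files.

    Assumption: no header rows, columns in the documented order.
    Empty CSV fields are stored as empty strings, not NULL.
    """
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO feeds VALUES (?, ?, ?, ?)", csv.reader(feeds_csv))
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?, ?)", csv.reader(entries_csv))
    conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", csv.reader(tags_csv))
    conn.commit()


if __name__ == "__main__":
    # Made-up sample rows standing in for the real export files.
    feeds = io.StringIO("http://example.com/feed,Example,http://example.com/,Alice\n")
    entries = io.StringIO("e1,http://example.com/feed,Hello,http://example.com/1,1470972016.0\n")
    tags = io.StringIO("e1,http://example.com/feed,blog\n")
    conn = sqlite3.connect(":memory:")
    load_tables(conn, feeds, entries, tags)
    print(conn.execute("SELECT count(*) FROM tags").fetchone()[0])
```

<p>With real files, open each one with <code class="language-plaintext highlighter-rouge">newline=""</code> as the
<code class="language-plaintext highlighter-rouge">csv</code> module recommends, and pass a file path instead of
<code class="language-plaintext highlighter-rouge">":memory:"</code> to keep the database around.</p>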

<p>Web authors are notoriously awful at picking actually-unique entry
IDs, even when <a href="/blog/2013/09/23/">using the smarter option</a>, Atom. I still simply
don’t trust that entry IDs are unique, so, as usual, I’ve qualified
them by their source feed URL, hence the primary key on both columns
in <code class="language-plaintext highlighter-rouge">entries</code>.</p>

<p>At this point I wish I had collected a lot more information. If I were
to start fresh today, Elfeed’s database schema would not only fully
match Atom’s schema, but also exceed it with additional logging:</p>

<ul>
  <li>When was each entry actually fetched?</li>
  <li>How did each entry change since the last fetch?</li>
  <li>When and for what reason did a feed fetch fail?</li>
  <li>When did an entry stop appearing in a feed?</li>
  <li>How long did fetching take?</li>
  <li>How long did parsing take?</li>
  <li>Which computer (hostname) performed the fetch?</li>
  <li>What interesting HTTP headers were included?</li>
  <li>Even if not kept for archival, how large was the content?</li>
</ul>

<p>I may start tracking some of these. If I don’t, I’ll be kicking myself
three years from now when I look at this again.</p>

<h3 id="a-look-at-my-index">A look at my index</h3>

<p>So just how big is my index? It’s <strong>25MB uncompressed</strong>, 2.5MB
compressed. I currently follow 117 feeds, but my index includes
<strong>43,821 entries</strong> from <strong>309 feeds</strong>. These entries are marked with
<strong>53,360 tags</strong> from a set of 35 unique tags. Some of these data points
are the result of temporarily debugging Elfeed issues and don’t
represent content that I actually follow. I’m more careful these days
to test in a temporary database so as to avoid contamination. Some are
duplicates due to feeds changing URLs over the years. Some are
artifacts from old bugs. This all represents a bit of noise, but
should be negligible. During my analysis I noticed some of these
anomalies and took a moment to clean up obviously bogus data (weird
dates, etc.), all by adjusting tags.</p>

<p>The first thing I wanted to know is the weekday frequency. A number of
times I’ve blown entire Sundays working on Elfeed, and, as if to
frustrate my testing, it’s not unusual for several hours to pass
between new entries on Sundays. Is this just my perception or are
Sundays really that slow?</p>

<p>Here’s my query. I’m using SQLite’s <a href="https://www.sqlite.org/lang_datefunc.html">strftime</a> to shift the
result into my local time zone, Eastern Time. This time zone is the
source, or close to the source, of a large share of the content. This
also automatically accounts for daylight saving time, which can’t be
done with a simple divide and subtract.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">tag</span><span class="p">,</span>
       <span class="k">cast</span><span class="p">(</span><span class="n">strftime</span><span class="p">(</span><span class="s1">'%w'</span><span class="p">,</span> <span class="nb">date</span><span class="p">,</span> <span class="s1">'unixepoch'</span><span class="p">,</span> <span class="s1">'localtime'</span><span class="p">)</span> <span class="k">AS</span> <span class="nb">INT</span><span class="p">)</span> <span class="k">AS</span> <span class="k">day</span><span class="p">,</span>
       <span class="k">count</span><span class="p">(</span><span class="n">id</span><span class="p">)</span> <span class="k">AS</span> <span class="k">count</span>
<span class="k">FROM</span> <span class="n">entries</span>
<span class="k">JOIN</span> <span class="n">tags</span> <span class="k">ON</span> <span class="n">tags</span><span class="p">.</span><span class="n">entry</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">id</span> <span class="k">AND</span> <span class="n">tags</span><span class="p">.</span><span class="n">feed</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">feed</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">tag</span><span class="p">,</span> <span class="k">day</span><span class="p">;</span>
</code></pre></div></div>
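
<p>To see why the time zone shift matters for weekday binning, here is a
small Python sketch that asks SQLite for the weekday of one timestamp twice:
once in UTC and once shifted by a fixed offset. The
<code class="language-plaintext highlighter-rouge">'-4 hours'</code> modifier is my own stand-in so the result is the same on any
machine; the query above uses <code class="language-plaintext highlighter-rouge">'localtime'</code>, which follows the
machine’s zone and its daylight saving rules instead. The function names
and the sample timestamp are illustrative.</p>

```python
import sqlite3
from datetime import datetime, timezone


def weekday_utc(conn, timestamp):
    """Return SQLite's %w weekday (0 = Sunday) for a Unix timestamp, in UTC."""
    sql = "SELECT cast(strftime('%w', ?, 'unixepoch') AS INT)"
    return conn.execute(sql, (timestamp,)).fetchone()[0]


def weekday_shifted(conn, timestamp, modifier):
    """Same, after applying one extra date modifier such as '-4 hours'."""
    sql = "SELECT cast(strftime('%w', ?, 'unixepoch', ?) AS INT)"
    return conn.execute(sql, (timestamp, modifier)).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # An entry published shortly after 03:00 UTC on a Friday.
    ts = int(datetime(2016, 8, 12, 3, 20, 16, tzinfo=timezone.utc).timestamp())
    print("UTC weekday:    ", weekday_utc(conn, ts))
    print("shifted weekday:", weekday_shifted(conn, ts, "-4 hours"))
```

<p>The two calls disagree for this timestamp: it is Friday in UTC but still
Thursday evening four hours to the west, so an unshifted query would count
the entry under the wrong day.</p>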

<p>The most frequent tag (13,666 appearances) is “youtube”, which marks
every YouTube video, and I’ll use gnuplot to visualize it. The input
“file” is actually a command since gnuplot is poor at filtering data
itself, especially for histograms.</p>

<pre><code class="language-gnuplot">plot '&lt; grep ^youtube, weekdays.csv' using 2:3 with boxes
</code></pre>

<p><a href="/img/elfeed-graphs/weekdays-youtube.png"><img src="/img/elfeed-graphs/weekdays-youtube-thumb.png" alt="" /></a></p>

<p>Wow, things <em>do</em> quiet down dramatically on weekends! From the
glass-half-full perspective, this gives me a chance to catch up when I
inevitably fall behind on these videos during the week.</p>

<p>The same is basically true for other types of content, including
“comic” (12,465 entries) and “blog” (7,505 entries).</p>

<p><a href="/img/elfeed-graphs/weekdays-comic.png"><img src="/img/elfeed-graphs/weekdays-comic-thumb.png" alt="" /></a></p>

<p><a href="/img/elfeed-graphs/weekdays-blog.png"><img src="/img/elfeed-graphs/weekdays-blog-thumb.png" alt="" /></a></p>

<p>However, “emacs” (2,404 entries) is a different story. It doesn’t slow
down on the weekend, but Emacs users sure love to talk about Emacs on
Mondays. In my own index, this spike largely comes from <a href="http://planet.emacsen.org/">Planet
Emacsen</a>. Initially I thought maybe this was an artifact of
Planet Emacsen’s date handling — i.e. perhaps it does a big fetch on
Mondays and groups up the dates — but I double checked: they pass the
date directly through from the original articles.</p>

<p>Conclusion: Emacs users love Mondays. Or maybe they hate Mondays and
talk about Emacs as an escape.</p>

<p><a href="/img/elfeed-graphs/weekdays-emacs.png"><img src="/img/elfeed-graphs/weekdays-emacs-thumb.png" alt="" /></a></p>

<p>I can reuse the same query to look at different time scales. When
during the day do entries appear? Adjusting the time zone here becomes
a lot more important.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">tag</span><span class="p">,</span>
       <span class="k">cast</span><span class="p">(</span><span class="n">strftime</span><span class="p">(</span><span class="s1">'%H'</span><span class="p">,</span> <span class="nb">date</span><span class="p">,</span> <span class="s1">'unixepoch'</span><span class="p">,</span> <span class="s1">'localtime'</span><span class="p">)</span> <span class="k">AS</span> <span class="nb">INT</span><span class="p">)</span> <span class="k">AS</span> <span class="n">hour</span><span class="p">,</span>
       <span class="k">count</span><span class="p">(</span><span class="n">id</span><span class="p">)</span> <span class="k">AS</span> <span class="k">count</span>
<span class="k">FROM</span> <span class="n">entries</span>
<span class="k">JOIN</span> <span class="n">tags</span> <span class="k">ON</span> <span class="n">tags</span><span class="p">.</span><span class="n">entry</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">id</span> <span class="k">AND</span> <span class="n">tags</span><span class="p">.</span><span class="n">feed</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">feed</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">tag</span><span class="p">,</span> <span class="n">hour</span><span class="p">;</span>
</code></pre></div></div>

<p>Emacs bloggers tend to follow a nice Eastern Time sleeping schedule.
(I wonder how Vim bloggers compare, since, as an Emacs user, I
naturally assume Vim users’ schedules are as undisciplined as their
bathing habits.) However, this also might be <a href="http://irreal.org/blog/">the prolific
Irreal</a> breaking the curve.</p>

<p><a href="/img/elfeed-graphs/hours-emacs.png"><img src="/img/elfeed-graphs/hours-emacs-thumb.png" alt="" /></a></p>

<p>The YouTube channels I follow are a bit more erratic, but there’s
still a big drop in the early morning and a spike in the early
afternoon. It’s unclear if the timestamp published in the feed is the
upload time or the publication time. This would make a difference in
the result (e.g. overnight video uploads).</p>

<p><a href="/img/elfeed-graphs/hours-youtube.png"><img src="/img/elfeed-graphs/hours-youtube-thumb.png" alt="" /></a></p>

<p>Do you suppose there’s a slow <em>month</em>?</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">tag</span><span class="p">,</span>
       <span class="k">cast</span><span class="p">(</span><span class="n">strftime</span><span class="p">(</span><span class="s1">'%m'</span><span class="p">,</span> <span class="nb">date</span><span class="p">,</span> <span class="s1">'unixepoch'</span><span class="p">,</span> <span class="s1">'localtime'</span><span class="p">)</span> <span class="k">AS</span> <span class="nb">INT</span><span class="p">)</span> <span class="k">AS</span> <span class="k">month</span><span class="p">,</span>
       <span class="k">count</span><span class="p">(</span><span class="n">id</span><span class="p">)</span> <span class="k">AS</span> <span class="k">count</span>
<span class="k">FROM</span> <span class="n">entries</span>
<span class="k">JOIN</span> <span class="n">tags</span> <span class="k">ON</span> <span class="n">tags</span><span class="p">.</span><span class="n">entry</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">id</span> <span class="k">AND</span> <span class="n">tags</span><span class="p">.</span><span class="n">feed</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">feed</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">tag</span><span class="p">,</span> <span class="k">month</span><span class="p">;</span>
</code></pre></div></div>

<p>December shows a big drop across all tags, probably for the holidays.
Both “comic” and “blog” also have an interesting drop in August. For
brevity, I’ll only show one. The August drop might be partially due to my
not waiting until the end of this month for this analysis, since there are
only 2.5 Augusts in my 3-year dataset.</p>

<p><a href="/img/elfeed-graphs/months-comic.png"><img src="/img/elfeed-graphs/months-comic-thumb.png" alt="" /></a></p>

<p>Unfortunately the timestamp is the only direct <em>numerical</em> quantity in
the data. So far I’ve been binning data points and counting to get a
second numerical quantity. Everything else is text, so I’ll need to
get more creative to find other interesting relationships.</p>

<p>So let’s have a look at the lengths of entry titles.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">tag</span><span class="p">,</span>
       <span class="k">length</span><span class="p">(</span><span class="n">title</span><span class="p">)</span> <span class="k">AS</span> <span class="k">length</span><span class="p">,</span>
       <span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">AS</span> <span class="k">count</span>
<span class="k">FROM</span> <span class="n">entries</span>
<span class="k">JOIN</span> <span class="n">tags</span> <span class="k">ON</span> <span class="n">tags</span><span class="p">.</span><span class="n">entry</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">id</span> <span class="k">AND</span> <span class="n">tags</span><span class="p">.</span><span class="n">feed</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">feed</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">tag</span><span class="p">,</span> <span class="k">length</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="k">length</span><span class="p">;</span>
</code></pre></div></div>

<p>The shortest are the webcomics. I’ve <a href="/blog/2015/09/26/">complained about poor webcomic
titles before</a>, so this isn’t surprising. The spikes are from
comics that follow a strict (uncreative) title format.</p>

<p><a href="/img/elfeed-graphs/lengths-comic.png"><img src="/img/elfeed-graphs/lengths-comic-thumb.png" alt="" /></a></p>

<p>Emacs article titles follow a nice distribution. You can tell these
are programmers because so many titles are exactly 32 characters long.
Picking this number is such a natural instinct that we aren’t even
aware of it. Or maybe all their database schemas have <code class="language-plaintext highlighter-rouge">VARCHAR(32)</code>
title columns?</p>

<p><a href="/img/elfeed-graphs/lengths-emacs.png"><img src="/img/elfeed-graphs/lengths-emacs-thumb.png" alt="" /></a></p>

<p>Blogs in general follow a nice distribution. The big spike is from the
<a href="http://www.bay12games.com/dwarves/index.html">Dwarf Fortress development blog</a>, which follows a strict date
format.</p>

<p><a href="/img/elfeed-graphs/lengths-blog.png"><img src="/img/elfeed-graphs/lengths-blog-thumb.png" alt="" /></a></p>

<p>The longest on average are YouTube videos. This is largely due to the
kinds of videos I watch (“Let’s Play” videos), which tend to have
long, predictable names.</p>

<p><a href="/img/elfeed-graphs/lengths-youtube.png"><img src="/img/elfeed-graphs/lengths-youtube-thumb.png" alt="" /></a></p>

<p>And finally, here’s the most interesting-looking graph of them all.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="p">((</span><span class="nb">date</span> <span class="o">-</span> <span class="mi">4</span><span class="o">*</span><span class="mi">60</span><span class="o">*</span><span class="mi">60</span><span class="p">)</span> <span class="o">%</span> <span class="p">(</span><span class="mi">24</span><span class="o">*</span><span class="mi">60</span><span class="o">*</span><span class="mi">60</span><span class="p">))</span> <span class="o">/</span> <span class="p">(</span><span class="mi">60</span><span class="o">*</span><span class="mi">60</span><span class="p">)</span> <span class="k">AS</span> <span class="n">day_time</span><span class="p">,</span>
       <span class="k">length</span><span class="p">(</span><span class="n">title</span><span class="p">)</span> <span class="k">AS</span> <span class="k">length</span>
<span class="k">FROM</span> <span class="n">entries</span>
<span class="k">JOIN</span> <span class="n">tags</span> <span class="k">ON</span> <span class="n">tags</span><span class="p">.</span><span class="n">entry</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">id</span> <span class="k">AND</span> <span class="n">tags</span><span class="p">.</span><span class="n">feed</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">feed</span><span class="p">;</span>
</code></pre></div></div>

<p>This is the title length versus time of day (not binned). Each point
is one of the 53,360 posts.</p>

<pre><code class="language-gnuplot">set style fill transparent solid 0.25 noborder
set style circle radius 0.04
plot 'length-vs-daytime.csv' using 1:2 with circles
</code></pre>

<p>(This is a good one to follow through to the full size image.)</p>

<p><a href="/img/elfeed-graphs/length-vs-daytime.png"><img src="/img/elfeed-graphs/length-vs-daytime-thumb.png" alt="" /></a></p>

<p>Again, all Eastern Time since I’m self-centered like that. Vertical
lines are authors rounding their post dates to the hour. Horizontal
lines are the length spikes from above, such as the line of entries at
title length 10 in the evening (Dwarf Fortress blog). There’s a
mid-day cloud of entries of various title lengths, with the shortest
title cloud around mid-morning. That’s probably when many of the
webcomics come up.</p>

<p>Additional analysis could look further at textual content, beyond
simply length, in some quantitative way (n-grams? soundex?). But
mostly I really need to keep track of more data!</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Elfeed, cURL, and You</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/06/16/"/>
    <id>urn:uuid:76942398-f693-3127-fd45-19d508b5c044</id>
    <updated>2016-06-16T18:22:16Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="elfeed"/>
    <content type="html">
      <![CDATA[<p>This morning I pushed out an important update to <a href="https://github.com/skeeto/elfeed">Elfeed</a>, my
web feed reader for Emacs. The update should be available in MELPA by
the time you read this. Elfeed now has support for fetching feeds
using <a href="https://curl.haxx.se/">cURL</a> through a <code class="language-plaintext highlighter-rouge">curl</code> inferior process. You’ll need
the program in your PATH or configured through
<code class="language-plaintext highlighter-rouge">elfeed-curl-program-name</code>.</p>

<p>I’ve been using it for a couple of days now, but, while I work out the
remaining kinks, it’s disabled by default. So in addition to having
cURL installed, you’ll need to set <code class="language-plaintext highlighter-rouge">elfeed-use-curl</code> to non-nil.
Sometime soon it will be enabled by default whenever cURL is
available. The original <code class="language-plaintext highlighter-rouge">url-retrieve</code> fetcher will remain in place
for the time being. However, cURL <em>may</em> become a requirement someday.</p>
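
<p>As a minimal configuration sketch, opting in from an init file might
look like the following. The explicit path is only a placeholder, and
it’s needed only when <code class="language-plaintext highlighter-rouge">curl</code> isn’t in your PATH.</p>

<pre><code class="language-cl">;; Opt in to the cURL fetcher (disabled by default for now).
(setq elfeed-use-curl t)

;; Only needed when curl is not in PATH. This path is a placeholder.
(setq elfeed-curl-program-name "/usr/local/bin/curl")
</code></pre>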

<p>Fetching with a <code class="language-plaintext highlighter-rouge">curl</code> inferior process has some huge advantages.</p>

<h3 id="its-much-faster">It’s much faster</h3>

<p>The most obvious change is that you should experience a huge speedup
on updates and better responsiveness during updates after the first
cURL run. There are two important reasons:</p>

<p><strong>Asynchronous DNS and TCP</strong>: Emacs 24 and earlier performs DNS
queries synchronously even for asynchronous network processes. This is
being fixed on some platforms (including Linux) in Emacs 25, but now
we don’t have to wait.</p>

<p>On Windows it’s even worse: the TCP connection is also established
synchronously. This is especially bad when fetching relatively small
items such as feeds, because the DNS look-up and TCP handshake dominate
the overall fetch time. It essentially makes the whole process
synchronous.</p>

<p><strong>Conditional GET</strong>: HTTP has two mechanisms to avoid transmitting
information that a client has previously fetched. One is the
Last-Modified header delivered by the server with the content. When
querying again later, the client echoes the date back <a href="https://utcc.utoronto.ca/~cks/space/blog/web/IfModifiedSinceHowNot">like a
token</a> in the If-Modified-Since header.</p>

<p>The second is the “entity tag,” an arbitrary server-selected token
associated with each version of the content. The server delivers it
along with the content in the ETag header, and the client hands it
back later in the If-None-Match header, sort of like a cookie.</p>

<p>This is highly valuable for feeds because, unless the feed is
particularly active, most of the time the feed hasn’t been updated
since the last query. This avoids sending anything other than a
handful of headers each way. In Elfeed’s case, it means <strong>it doesn’t
have to parse the same XML over and over again</strong>.</p>

<p>Both of these being outside of cURL’s scope, Elfeed has to manage
conditional GET itself. I had no control over the HTTP headers until
now, so I couldn’t take advantage of it. Emacs’ <code class="language-plaintext highlighter-rouge">url-retrieve</code>
function allows for sending custom headers through dynamically binding
<code class="language-plaintext highlighter-rouge">url-request-extra-headers</code>, but this isn’t available when calling
<code class="language-plaintext highlighter-rouge">url-queue-retrieve</code> since the request itself is created
asynchronously.</p>
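
<p>For reference, the dynamic binding approach looks roughly like this
with plain <code class="language-plaintext highlighter-rouge">url-retrieve</code>. The URL and the ETag value are made up
for illustration.</p>

<pre><code class="language-cl">(require 'url)

;; The binding is in effect while url-retrieve builds the request,
;; so the extra header goes out with it. The URL and ETag value are
;; placeholders.
(let ((url-request-extra-headers
       '(("If-None-Match" . "\"abc123\""))))
  (url-retrieve "http://127.0.0.1:8080/feed.xml"
                (lambda (status) (message "status: %S" status))))
</code></pre>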

<p>Both the ETag and Last-Modified values are stored in the database and
persist across sessions. This is the reason the full speedup isn’t
realized until the second fetch. The initial cURL fetch doesn’t have
these values.</p>
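
<p>To make the idea concrete, here’s a rough sketch of how stored values
could be turned into request headers for <code class="language-plaintext highlighter-rouge">curl</code> using its <code class="language-plaintext highlighter-rouge">-H</code>
option. The helper name <code class="language-plaintext highlighter-rouge">my-conditional-get-args</code> is hypothetical and
this is not necessarily how Elfeed does it internally. A server that
honors either header answers 304 Not Modified with an empty body when
nothing has changed.</p>

<pre><code class="language-cl">(defun my-conditional-get-args (etag last-modified)
  "Return a list of curl arguments for a conditional GET.
ETAG and LAST-MODIFIED are strings saved from a previous response,
or nil when there is no previous response (hypothetical helper)."
  (append
   (when etag
     (list "-H" (format "If-None-Match: %s" etag)))
   (when last-modified
     (list "-H" (format "If-Modified-Since: %s" last-modified)))))

;; First fetch: nothing stored yet, so no extra arguments.
(my-conditional-get-args nil nil)

;; Later fetch: hand both tokens back to the server.
(my-conditional-get-args "\"abc123\"" "Thu, 16 Jun 2016 18:22:16 GMT")
</code></pre>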

<h3 id="fewer-bugs">Fewer bugs</h3>

<p>As mentioned previously, Emacs has a built-in URL retrieval library
called <code class="language-plaintext highlighter-rouge">url</code>. The central function is <code class="language-plaintext highlighter-rouge">url-retrieve</code> which
asynchronously fetches the content at an arbitrary URL (usually HTTP)
and delivers the buffer and status to a callback when it’s ready.
There’s also a queue front-end for it, <code class="language-plaintext highlighter-rouge">url-queue-retrieve</code> which
limits the number of parallel connections. Elfeed hands this function
a pile of feed URLs all at once and it fetches them N at a time.</p>

<p>Unfortunately both these functions are <em>incredibly</em> buggy. It’s been a
thorn in my side for years.</p>

<p>Here’s what the interface looks like for both:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">url-retrieve</span> <span class="nv">URL</span> <span class="nv">CALLBACK</span> <span class="k">&amp;optional</span> <span class="nv">CBARGS</span> <span class="nv">SILENT</span> <span class="nv">INHIBIT-COOKIES</span><span class="p">)</span>
</code></pre></div></div>

<p>It takes a URL and a callback. Seeing this, the sane, unsurprising
expectation is the callback will be invoked <em>exactly once</em> for each time
<code class="language-plaintext highlighter-rouge">url-retrieve</code> was called. In any case where the request fails, it
should report it through the callback. <a href="http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20159">This is not the case</a>.
The callback may be invoked any number of times, <em>including zero</em>.</p>

<p>In this example, suppose you have a webserver that will return an HTTP
404 for a requested URL. Below, I fire off 10 asynchronous requests in a
row.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">results</span> <span class="p">())</span>
<span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="mi">10</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">url-retrieve</span> <span class="s">"http://127.0.0.1:8080/404"</span>
                <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">status</span><span class="p">)</span> <span class="p">(</span><span class="nb">push</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">i</span> <span class="nv">status</span><span class="p">)</span> <span class="nv">results</span><span class="p">))))</span>
</code></pre></div></div>

<p>What would you guess is the length of <code class="language-plaintext highlighter-rouge">results</code>? It’s initially 0
before any requests complete and over time (a very short time) I would
expect this to top out at 10. On Emacs 24, here’s the real answer:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">length</span> <span class="nv">results</span><span class="p">)</span>
<span class="c1">;; =&gt; 46</span>
</code></pre></div></div>

<p>The same error is reported multiple times to the callback. At least
the pattern is obvious.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-count</span> <span class="mi">0</span> <span class="nv">results</span> <span class="ss">:key</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">)</span>
<span class="c1">;; =&gt; 9</span>
<span class="p">(</span><span class="nv">cl-count</span> <span class="mi">1</span> <span class="nv">results</span> <span class="ss">:key</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">)</span>
<span class="c1">;; =&gt; 8</span>
<span class="p">(</span><span class="nv">cl-count</span> <span class="mi">2</span> <span class="nv">results</span> <span class="ss">:key</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">)</span>
<span class="c1">;; =&gt; 7</span>

<span class="p">(</span><span class="nv">cl-count</span> <span class="mi">9</span> <span class="nv">results</span> <span class="ss">:key</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">)</span>
<span class="c1">;; =&gt; 1</span>
</code></pre></div></div>

<p>Here’s another one, this time to the non-existent foo.example. The DNS
query should never resolve.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">results</span> <span class="p">())</span>
<span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="mi">10</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">url-retrieve</span> <span class="s">"http://foo.example/"</span>
                <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">status</span><span class="p">)</span> <span class="p">(</span><span class="nb">push</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">i</span> <span class="nv">status</span><span class="p">)</span> <span class="nv">results</span><span class="p">))))</span>
</code></pre></div></div>

<p>What’s the length of <code class="language-plaintext highlighter-rouge">results</code>? This time it’s zero. Remember how DNS
is synchronous? Because of this, DNS failures are reported
synchronously as a signaled error. This gets a lot worse with
<code class="language-plaintext highlighter-rouge">url-queue-retrieve</code>. Since the request is put off until later, DNS
doesn’t fail until later, and you get neither a callback nor an error
signal. This also puts the queue in a bad state and necessitated
<code class="language-plaintext highlighter-rouge">elfeed-unjam</code> to manually clear it. This one should get fixed in
Emacs 25 when DNS is asynchronous.</p>

<p>This last one assumes you don’t have anything listening on port 57432
(pulled out of nowhere) so that the connection fails.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">results</span> <span class="p">())</span>
<span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="mi">10</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">url-retrieve</span> <span class="s">"http://127.0.0.1:57432/"</span>
                <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">status</span><span class="p">)</span> <span class="p">(</span><span class="nb">push</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">i</span> <span class="nv">status</span><span class="p">)</span> <span class="nv">results</span><span class="p">))))</span>
</code></pre></div></div>

<p>On Linux, we finally get the sane result of 10. However, on Windows,
it’s zero. The synchronous TCP connection will fail, signaling an
error just like DNS failures. Not only is it broken, it’s broken in
different ways on different platforms.</p>

<p>There are many more cases of callback weirdness which depend on the
connection and HTTP session being in various states when things go
awry. These were just the easiest to demonstrate. By using cURL, I get
to bypass this mess.</p>
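
<p>For anyone stuck with <code class="language-plaintext highlighter-rouge">url-retrieve</code>, a defensive wrapper can paper
over two of the cases above: repeated callbacks and synchronously
signaled errors. This is only a sketch. The names <code class="language-plaintext highlighter-rouge">my-call-once</code> and
<code class="language-plaintext highlighter-rouge">my-url-retrieve</code> are hypothetical, it requires lexical binding, and it
does nothing for the case where the callback is never invoked at all.</p>

<pre><code class="language-cl">;;; -*- lexical-binding: t; -*-
(require 'url)

(defun my-call-once (callback)
  "Return a function that forwards only its first call to CALLBACK."
  (let ((called nil))
    (lambda (&amp;rest args)
      (unless called
        (setq called t)
        (apply callback args)))))

(defun my-url-retrieve (url callback)
  "Fetch URL, invoking CALLBACK with a status plist at most once.
An error signaled synchronously (e.g. a DNS failure) is reported
through CALLBACK as (:error ERR) instead of propagating."
  (let ((once (my-call-once callback)))
    (condition-case err
        (url-retrieve url once)
      (error (funcall once (list :error err))))))
</code></pre>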

<h3 id="no-more-gnutls-issues">No more GnuTLS issues</h3>

<p>At compile time, Emacs can optionally be linked against GnuTLS, giving
it robust TLS support so long as the shared library is available.
<code class="language-plaintext highlighter-rouge">url-retrieve</code> uses this for fetching HTTPS content. Unfortunately,
this library is noisy and will occasionally echo non-informational
messages in the minibuffer and in <code class="language-plaintext highlighter-rouge">*Messages*</code> that cannot be
suppressed.</p>

<p>When not linked against GnuTLS, Emacs will instead run the GnuTLS
command line program as an inferior process, just like Elfeed now does
with cURL. Unfortunately this interface is very slow and frequently
fails, basically preventing Elfeed from fetching HTTPS feeds. I
suspect it’s in part due to an improper <code class="language-plaintext highlighter-rouge">coding-system-for-read</code>.</p>

<p>cURL handles all the TLS negotiation itself, so both these problems
disappear. The compile-time configuration doesn’t matter.</p>

<h3 id="windows-is-now-supported">Windows is now supported</h3>

<p>Emacs’ Windows networking code is so unstable, even in Emacs 25, that
I couldn’t make any practical use of Elfeed on that platform. Even the
Cygwin emacs-w32 version couldn’t cut it. It hard crashes Emacs every
time I’ve tried to fetch feeds. Fortunately the inferior process code
is a whole lot more stable, meaning fetching with cURL works great. As
of today, you can now use Elfeed on Windows. The biggest obstacle is
getting cURL installed and configured.</p>

<h3 id="interface-changes">Interface changes</h3>

<p>With cURL, obviously the values of <code class="language-plaintext highlighter-rouge">url-queue-timeout</code> and
<code class="language-plaintext highlighter-rouge">url-queue-parallel-processes</code> no longer have any meaning to Elfeed.
If you set these for yourself, you should instead call the functions
<code class="language-plaintext highlighter-rouge">elfeed-set-timeout</code> and <code class="language-plaintext highlighter-rouge">elfeed-set-max-connections</code>, which will do
the appropriate thing depending on the value of <code class="language-plaintext highlighter-rouge">elfeed-use-curl</code>.
Each also comes with a getter so you can query the current value.</p>
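
<p>In an init file that might look like the following sketch. It assumes
each setter takes a single number (seconds and a connection count,
respectively), and the values are arbitrary.</p>

<pre><code class="language-cl">(with-eval-after-load 'elfeed
  ;; Assumed argument: timeout in seconds.
  (elfeed-set-timeout 30)
  ;; Assumed argument: maximum number of simultaneous fetches.
  (elfeed-set-max-connections 8))
</code></pre>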

<p>The deprecated <code class="language-plaintext highlighter-rouge">elfeed-max-connections</code> has been removed.</p>

<p>Feed objects now have meta tags <code class="language-plaintext highlighter-rouge">:etag</code>, <code class="language-plaintext highlighter-rouge">:last-modified</code>, and
<code class="language-plaintext highlighter-rouge">:canonical-url</code>. The latter can identify feeds that have been moved,
though it needs a real UI.</p>

<h3 id="see-any-bugs">See any bugs?</h3>

<p>If you use Elfeed, grab the current update and give the cURL fetcher a
shot. Please open a ticket if you find problems. Be sure to report
your Emacs version, operating system, and cURL version.</p>

<p>As of this writing there’s just one thing missing compared to
url-queue: connection reuse. cURL supports it, so I just need to code
it up.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>9 Elfeed Features You Might Not Know</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2015/12/03/"/>
    <id>urn:uuid:26807fd8-4b69-3caa-552a-90308cc0b24f</id>
    <updated>2015-12-03T22:33:17Z</updated>
    <category term="emacs"/><category term="elfeed"/>
    <content type="html">
      <![CDATA[<p>It’s been two years since <a href="/blog/2013/11/26/">I last wrote about Elfeed</a>, my
<a href="https://github.com/skeeto/elfeed">Atom/RSS feed reader for Emacs</a>. I’ve used it every single
day since, and I continue to maintain it with help from the community.
So far 18 people besides me have contributed commits. Over the last
couple of years it’s accumulated some new features, some more obvious
than others.</p>

<p>Every time I mark a new release, I update the ChangeLog at the top of
elfeed.el which lists what’s new. Since it’s easy to overlook many of
the newer useful features, I thought I’d list the more important ones
here.</p>

<h4 id="custom-entry-colors">Custom Entry Colors</h4>

<p>You can now customize entry faces through <code class="language-plaintext highlighter-rouge">elfeed-search-face-alist</code>.
This variable maps tags to faces. An entry inherits the face of any
tag it carries. Previously “unread” was a special tag that got a bold
face, but this is now implemented as nothing more than an initial
entry in the alist.</p>

<p><a href="/img/elfeed/colors.png"><img src="/img/elfeed/colors-thumb.png" alt="" /></a></p>

<p>I’ve been using it to mark different kinds of content (videos,
podcasts, comics) with different colors.</p>
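
<p>A small sketch of what that can look like, assuming each alist
element is a tag followed by one or more faces. The face name
<code class="language-plaintext highlighter-rouge">my-elfeed-video</code> and its color are made up.</p>

<pre><code class="language-cl">;; A hypothetical face for entries tagged `video'.
(defface my-elfeed-video
  '((t :foreground "#f9f"))
  "Face for Elfeed entries tagged `video'.")

;; Map the tag to the face once the search buffer code is loaded.
(with-eval-after-load 'elfeed-search
  (push '(video my-elfeed-video) elfeed-search-face-alist))
</code></pre>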

<h4 id="autotagging">Autotagging</h4>

<p>You can specify the starting tags for entries from particular feeds
directly in the feed listing. This has been a feature for a while now,
but it’s not something you’d want to miss. It started out as a feature
in my personal configuration that eventually migrated into Elfeed
proper.</p>

<p>For example, your <code class="language-plaintext highlighter-rouge">elfeed-feeds</code> may initially look like this,
especially if you imported from OPML.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="s">"https://nullprogram.com/feed/"</span>
 <span class="s">"http://nedroid.com/feed/"</span>
 <span class="s">"https://www.youtube.com/feeds/videos.xml?user=quill18"</span><span class="p">)</span>
</code></pre></div></div>

<p>If you wanted certain tags applied to entries from each, you would
need to putz around with <code class="language-plaintext highlighter-rouge">elfeed-make-tagger</code>. For the most common
case — apply certain tags to all entries from a URL — it’s much
simpler to specify the information as part of the listing itself,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">((</span><span class="s">"https://nullprogram.com/feed/"</span> <span class="nv">blog</span> <span class="nv">emacs</span><span class="p">)</span>
 <span class="p">(</span><span class="s">"http://nedroid.com/feed/"</span> <span class="nv">webcomic</span><span class="p">)</span>
 <span class="p">(</span><span class="s">"https://www.youtube.com/feeds/videos.xml?user=quill18"</span> <span class="nv">youtube</span><span class="p">))</span>
</code></pre></div></div>

<p>Today I only use custom tagger functions in my own configuration to
filter within a couple of particularly noisy feeds.</p>
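
<p>Such a tagger might look roughly like this. The feed URL pattern and
title pattern are placeholders, and I’m assuming the tagger is installed
through <code class="language-plaintext highlighter-rouge">elfeed-new-entry-hook</code>.</p>

<pre><code class="language-cl">(require 'elfeed)

;; Mark matching entries from one noisy feed as already read.
;; Both regular expressions are placeholders.
(add-hook 'elfeed-new-entry-hook
          (elfeed-make-tagger :feed-url "example\\.com"
                              :entry-title "sponsored"
                              :remove 'unread))
</code></pre>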

<h4 id="arbitrary-metadata">Arbitrary Metadata</h4>

<p>Metadata is more for Elfeed extensions (e.g. <a href="https://github.com/remyhonig/elfeed-org">elfeed-org</a>)
than regular users. You can attach arbitrary, <a href="/blog/2013/12/30/">readable</a>
metadata to any Elfeed object (entry, feed). This metadata is
automatically stored in the database. It’s a plist.</p>

<p>Metadata is accessed entirely through one setf-able function:
<code class="language-plaintext highlighter-rouge">elfeed-meta</code>. For example, you might want to track <em>when</em> you’ve read
something, not just that you’ve read it. You could use this to
selectively update certain feeds or just to evaluate your own habits.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">my-elfeed-mark-read</span> <span class="p">(</span><span class="nv">entry</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">elfeed-untag</span> <span class="nv">entry</span> <span class="ss">'unread</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">date</span> <span class="p">(</span><span class="nv">format-time-string</span> <span class="s">"%FT%T%z"</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-meta</span> <span class="nv">entry</span> <span class="ss">:read-date</span><span class="p">)</span> <span class="nv">date</span><span class="p">)))</span>
</code></pre></div></div>

<p>Two things motivated this feature. First, without a plist, if I added
more properties in the future, I would need to change the database
format to support them. I modified the database format to add
metadata, requiring an upgrade function to quietly upgrade older
databases as they were loaded. I’d really like to avoid this in the
future.</p>

<p>Second, I wanted to make it easy for extension authors to store their
own data. I still imagine an extension someday to update feeds
intelligently based on their history. For example, the database
doesn’t track when the feed was last fetched, just the date of the
most recent entry (if any). A smart-update extension could use
metadata to tag feeds with this information.</p>
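
<p>As a sketch of the idea, such an extension could stamp a feed each
time it fetches it. The function name and the <code class="language-plaintext highlighter-rouge">:my-last-fetched</code> key
are hypothetical; Elfeed itself does not define them.</p>

<pre><code class="language-cl">(require 'elfeed)

(defun my-elfeed-note-fetch (url)
  "Record the current time as the last fetch time of the feed at URL."
  (let ((feed (elfeed-db-get-feed url)))
    ;; :my-last-fetched is a made-up key, stored as seconds since epoch.
    (setf (elfeed-meta feed :my-last-fetched) (float-time))))
</code></pre>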

<p>Elfeed itself already uses two metadata keys: <code class="language-plaintext highlighter-rouge">:failures</code> on feeds and
<code class="language-plaintext highlighter-rouge">:title</code> on both. <code class="language-plaintext highlighter-rouge">:failures</code> counts the total number of times
fetching that feed resulted in an error. You could use this to get a
listing of troublesome feeds like so,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">url</span> <span class="nv">in</span> <span class="p">(</span><span class="nv">elfeed-feed-list</span><span class="p">)</span>
         <span class="nv">for</span> <span class="nv">feed</span> <span class="nb">=</span> <span class="p">(</span><span class="nv">elfeed-db-get-feed</span> <span class="nv">url</span><span class="p">)</span>
         <span class="nv">for</span> <span class="nv">failures</span> <span class="nb">=</span> <span class="p">(</span><span class="nv">elfeed-meta</span> <span class="nv">feed</span> <span class="ss">:failures</span><span class="p">)</span>
         <span class="nb">when</span> <span class="nv">failures</span>
         <span class="nv">collect</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">url</span> <span class="nv">failures</span><span class="p">))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">:title</code> property allows for a custom title for both feeds and
entries in the search buffer listing, assuming you’re using the
default function (see below). It overrides the title provided by the
feed itself. This is different than <code class="language-plaintext highlighter-rouge">elfeed-entry-title</code> and
<code class="language-plaintext highlighter-rouge">elfeed-feed-title</code>, which are kept in sync with feed content. Metadata
is not kept in sync with the feed itself.</p>
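
<p>For instance, overriding a feed’s displayed title could be done like
so. The replacement title is arbitrary.</p>

<pre><code class="language-cl">(require 'elfeed)

;; Override the title shown in the search buffer for one feed.
(let ((feed (elfeed-db-get-feed "https://nullprogram.com/feed/")))
  (setf (elfeed-meta feed :title) "null program (my title)"))
</code></pre>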

<h4 id="filter-inversion">Filter Inversion</h4>

<p>You can invert filter components by prefixing them with <code class="language-plaintext highlighter-rouge">!</code>. For
example, say you’re looking at all my posts from the past 6 months:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@6-months nullprogram.com
</code></pre></div></div>

<p>But say you’re tired of me and decide you want to see every entry from
the past 6 months <em>excluding</em> my posts.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@6-months !nullprogram.com
</code></pre></div></div>

<h4 id="filter-limiter">Filter Limiter</h4>

<p>Normally you limit the number of results by date, but you can now
limit the result by count using <code class="language-plaintext highlighter-rouge">#n</code>. For example, to see my most
recent 12 posts regardless of date,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nullprogram.com #12
</code></pre></div></div>

<p>This is used internally in the live filter to limit the number of
results to the height of the screen. If you noticed that live
filtering has been much more responsive in the last few months, this is
probably why.</p>
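
<p>These components combine. As a sketch, assuming the default filter is
held in <code class="language-plaintext highlighter-rouge">elfeed-search-filter</code>, the following would start each session
with at most 12 entries from the past 6 months, excluding my posts.</p>

<pre><code class="language-cl">;; Date limit, inverted match, and count limit in one filter.
(setq-default elfeed-search-filter "@6-months !nullprogram.com #12")
</code></pre>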

<h4 id="bookmark-support">Bookmark Support</h4>

<p>Elfeed properly integrates with Emacs’ bookmarks (<a href="https://github.com/skeeto/elfeed/issues/110">thanks to
groks</a>). You can bookmark the current filter with <code class="language-plaintext highlighter-rouge">M-x
bookmark-set</code> (<code class="language-plaintext highlighter-rouge">C-x r m</code>). By default, Emacs will persist bookmarks
between sessions. To revisit a filter in the future, <code class="language-plaintext highlighter-rouge">M-x
bookmark-jump</code> (<code class="language-plaintext highlighter-rouge">C-x r b</code>).</p>

<p>Since this requires no configuration, this may serve as an easy
replacement for manually building “view” toggles — filters bound to
certain keys — which I know many users have done, including me.</p>
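
<p>For comparison, a hand-built view toggle looks something like this
sketch. The command name and the choice of key are arbitrary.</p>

<pre><code class="language-cl">(defun my-elfeed-view-nullprogram ()
  "Switch the search buffer to a fixed filter."
  (interactive)
  (elfeed-search-set-filter "@6-months nullprogram.com"))

;; The key is an arbitrary choice.
(with-eval-after-load 'elfeed-search
  (define-key elfeed-search-mode-map (kbd "N")
    #'my-elfeed-view-nullprogram))
</code></pre>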

<h4 id="new-header">New Header</h4>

<p>If you’ve updated very recently, you probably noticed Elfeed got a
brand new header. Previously it faked a header by writing to the first
line of the buffer. This is because somehow I had no idea Emacs had
official support for buffer headers (despite notmuch using them all
this time).</p>

<p>The new header includes additional information, such as the current
filter, the number of unread entries, the total number of entries, and
the number of unique feeds currently in view. You’ll see this as
<code class="language-plaintext highlighter-rouge">&lt;unread&gt;/&lt;total&gt;:&lt;feeds&gt;</code> in the middle of the header.</p>

<p>As of this writing, the new header has not been made part of a formal
release. So if you’re only tracking stable releases, you won’t see
this for a while longer.</p>

<p>You can supply your own header via <code class="language-plaintext highlighter-rouge">elfeed-search-header-function</code>
(<a href="https://github.com/skeeto/elfeed/issues/111">thanks to Gergely Nagy</a>).</p>

<h4 id="scoped-updates">Scoped Updates</h4>

<p>As you already know, in the search buffer listing you can press <code class="language-plaintext highlighter-rouge">G</code> to
update your feeds. But did you know it takes a prefix argument?
Run as <code class="language-plaintext highlighter-rouge">C-u G</code>, it only updates feeds with entries currently listed in
the buffer.</p>

<p>As of this writing, this is another feature not yet in a formal
release. I’d been wanting something like this for a while but couldn’t
think of a reasonable interface. Directly prompting the user for feeds
is neither elegant nor composable. However, groks <a href="https://github.com/skeeto/elfeed/issues/109">suggested the
prefix argument</a>, which composes perfectly with Elfeed’s
existing idioms.</p>

<h4 id="listing-customizations">Listing Customizations</h4>

<p>In addition to custom faces, there are a number of ways to customize
the listing.</p>

<ul>
  <li>Choose the sort order with <code class="language-plaintext highlighter-rouge">elfeed-sort-order</code>.</li>
  <li>Set a custom date format with <code class="language-plaintext highlighter-rouge">elfeed-search-date-format</code>.</li>
  <li>Adjust field widths with <code class="language-plaintext highlighter-rouge">elfeed-search-*-width</code>.</li>
  <li>Or override everything with <code class="language-plaintext highlighter-rouge">elfeed-search-print-entry-function</code>.</li>
</ul>
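
<p>A sketch of a few of these together. The value shapes are my
assumptions about the current variables, so check each docstring
(<code class="language-plaintext highlighter-rouge">C-h v</code>) before copying, and treat the numbers as arbitrary.</p>

<pre><code class="language-cl">;; Oldest entries first (assumed values: ascending or descending).
(setq elfeed-sort-order 'ascending)

;; Assumed shape: (FORMAT-STRING WIDTH ALIGNMENT). The format below
;; produces 16 characters, e.g. "2015-12-03 22:33".
(setq elfeed-search-date-format '("%Y-%m-%d %H:%M" 16 :left))

;; One of the elfeed-search-*-width variables (assumed name).
(setq elfeed-search-title-max-width 100)
</code></pre>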

<p>Gergely Nagy has been throwing lots of commits at me over the last
couple of weeks to open up lots of Elfeed’s behavior to customization,
so there are more to come.</p>

<h3 id="thank-you-emacs-community">Thank You, Emacs Community</h3>

<p>Apologies about any features I missed or anyone I forgot to mention
who’s made contributions. The above comes from my ChangeLogs, the
commit log, the GitHub issue listing, and my own memory, so I’m likely
to have forgotten some things. A couple of these features I had
forgotten about myself!</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Elfeed Tips and Tricks</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/11/26/"/>
    <id>urn:uuid:45fbc221-dbea-302c-22c0-ec0527421ed8</id>
    <updated>2013-11-26T00:38:20Z</updated>
    <category term="elfeed"/><category term="elisp"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>This past weekend I had some questions from next-user-here (NUH) on my
<a href="/blog/2013/09/04/">original Elfeed post</a> about changing some of Elfeed’s
behavior. NUH is an Elisp novice so accomplishing some of the
requested modifications wasn’t obvious. A novice is mostly limited to
setting variables, not defining advice or using hooks. I’ve also been
using Elfeed daily for about three months now as my sole web feed
reader and along the way I’ve developed some best practices. In
addition to responding to some of NUH’s questions here, I’d like to
share some tips and tricks.</p>

<h3 id="custom-entry-launchers">Custom Entry Launchers</h3>

<p>Currently you can press “b” to launch one or more entries in your
browser. You can use “y” to copy a single entry to the clipboard.
What if you want to add another action?</p>

<p>In my configuration I have a fancy binding that sends the entry URLs
in the selected region to <a href="http://rg3.github.io/youtube-dl/">youtube-dl</a> for downloading the
videos. It’s too large to share as a snippet so here’s a small example
of something similar using a program called <code class="language-plaintext highlighter-rouge">xcowsay</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">xcowsay</span> <span class="p">(</span><span class="nv">message</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">call-process</span> <span class="s">"xcowsay"</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="nv">message</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">elfeed-xcowsay</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">entry</span> <span class="p">(</span><span class="nv">elfeed-search-selected</span> <span class="ss">:single</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">xcowsay</span> <span class="p">(</span><span class="nv">elfeed-entry-title</span> <span class="nv">entry</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">define-key</span> <span class="nv">elfeed-search-mode-map</span> <span class="s">"x"</span> <span class="nf">#'</span><span class="nv">elfeed-xcowsay</span><span class="p">)</span>
</code></pre></div></div>

<p>Now when I hit “x” over an entry in Elfeed I’m greeted by a cow
announcing the title.</p>

<p><img src="/img/screenshot/xcowsay-small.png" alt="" /></p>
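
<p>The launcher above handles a single entry. Here is a similar sketch for everything in the selected region, assuming that <code class="language-plaintext highlighter-rouge">elfeed-search-selected</code> returns a list of entries when called without the <code class="language-plaintext highlighter-rouge">:single</code> argument. The command name and the key binding are made up for illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical command: the name and the "L" binding are placeholders.
(defun my/elfeed-echo-links ()
  "Show the links of all selected entries in the echo area."
  (interactive)
  (let ((entries (elfeed-search-selected)))   ; assumed to be a list of entries
    (message "%s" (mapconcat #'elfeed-entry-link entries "\n"))))

(define-key elfeed-search-mode-map "L" #'my/elfeed-echo-links)
</code></pre></div></div>

<p>Swapping <code class="language-plaintext highlighter-rouge">message</code> for a call to an external program gives the general shape of a multi-entry launcher.</p>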

<h3 id="entry-listing-customization">Entry Listing Customization</h3>

<p>The <em>search</em> buffer you see when starting Elfeed, where entries are
listed, can be customized a few different ways. First, this buffer
<em>does</em> grow dynamically. After re-sizing the window/frame horizontally
you just have to refresh the view by pressing <code class="language-plaintext highlighter-rouge">g</code> (an Emacs
convention). How it fills out depends on the settings of these
variables,</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-title-max-width</code></li>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-title-min-width</code></li>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-trailing-width</code></li>
</ul>

<p>They control how wide the different columns should be as the window
size changes. An important caveat to this is that the cache stored in
<code class="language-plaintext highlighter-rouge">elfeed-search-cache</code> <em>must</em> be cleared before the changes will be
reflected in the display. This cache exists because building the
display, assembling all the special faces, is actually quite
CPU-intensive. It was an optimization I established early on.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">clrhash</span> <span class="nv">elfeed-search-cache</span><span class="p">)</span>
</code></pre></div></div>

<p>If you set these variables in your start-up configuration you don’t
need to worry about clearing the cache because it will already be
empty. It’s only a concern when playing with the settings.</p>
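
<p>When experimenting, the steps above can be bundled into one command. This is only a sketch: the command name is hypothetical, and it assumes that passing a non-nil argument to <code class="language-plaintext highlighter-rouge">elfeed-search-update</code> forces a redraw. If that doesn’t hold in your version, press <code class="language-plaintext highlighter-rouge">g</code> afterwards instead.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical helper for trying out different title widths interactively.
(defun my/elfeed-set-title-width (max-width)
  "Set the maximum title column width to MAX-WIDTH and redraw the listing."
  (interactive "nMax title width: ")
  (setq elfeed-search-title-max-width max-width)
  ;; Drop cached lines so they get rebuilt with the new width.
  (when (boundp 'elfeed-search-cache)
    (clrhash elfeed-search-cache))
  ;; Assumption: a non-nil argument forces the update.
  (elfeed-search-update :force))
</code></pre></div></div>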

<h4 id="date-display">Date Display</h4>

<p>Another question was about adding time to the entry listing. Elfeed
only displays the entry’s date. Dates are formatted by the function
<code class="language-plaintext highlighter-rouge">elfeed-search-format-date</code>. This can be redefined to display dates
differently.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">elfeed-search-format-date</span> <span class="p">(</span><span class="nv">date</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">format-time-string</span> <span class="s">"%Y-%m-%d %H:%M"</span> <span class="p">(</span><span class="nv">seconds-to-time</span> <span class="nv">date</span><span class="p">)))</span>
</code></pre></div></div>

<p>It’s given epoch seconds as a float and it returns a string to display
as a date.</p>
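
<p>To preview a format string before redefining anything, the same expression can be evaluated directly on an arbitrary timestamp. The third argument asks for UTC so the result doesn’t depend on the local time zone; the redefinition above leaves it out and so displays local time.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; 1385426300.0 is an arbitrary example value in epoch seconds.
(format-time-string "%Y-%m-%d %H:%M" (seconds-to-time 1385426300.0) t)
;; =&gt; "2013-11-26 00:38"
</code></pre></div></div>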

<h4 id="faces-and-colors">Faces and Colors</h4>

<p>All of the faces used in the display are declared for customization,
so these can be changed to whatever you like.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-date-face</code></li>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-title-face</code></li>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-feed-face</code></li>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-tag-face</code></li>
</ul>

<p>Say you suffered a head injury and decided you want your Elfeed dates
to be bold, purple, and underlined,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">custom-set-faces</span>
 <span class="o">'</span><span class="p">(</span><span class="nv">elfeed-search-date-face</span>
   <span class="p">((</span><span class="no">t</span> <span class="ss">:foreground</span> <span class="s">"#f0f"</span>
       <span class="ss">:weight</span> <span class="nv">extra-bold</span>
       <span class="ss">:underline</span> <span class="no">t</span><span class="p">))))</span>
</code></pre></div></div>

<h3 id="database-manipulation">Database Manipulation</h3>

<p>Feeds and entries in the database can be manipulated to become
whatever you want them to be. Because Elfeed is regularly modifying
the database, the trick is to perform the manipulation at <em>just</em> the
right time.</p>

<h4 id="feed-title-changes">Feed Title Changes</h4>

<p>Say you want to change a feed title because you don’t like the title
supplied by the feed. For example, the title to my blog’s feed is
“null program” but instead you think it should be “Seriously Handsome
Programmer” (head injury, remember?). The function
<code class="language-plaintext highlighter-rouge">elfeed-db-get-feed</code> can be used to fetch a feed’s data structure from
the database, given its exact URL as listed in your <code class="language-plaintext highlighter-rouge">elfeed-feeds</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">feed</span> <span class="p">(</span><span class="nv">elfeed-db-get-feed</span> <span class="s">"https://nullprogram.com/feed/"</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-feed-title</span> <span class="nv">feed</span><span class="p">)</span> <span class="s">"Seriously Handsome Programmer"</span><span class="p">))</span>
</code></pre></div></div>

<p>Hold it, that didn’t work. First, that display cache is getting in the
way again. Feed titles change very infrequently so they’re cached
aggressively. More importantly, next time you update your feeds Elfeed
will re-synchronize the feed title with the official title. It’s going
to fight against your intervention.</p>

<p>The solution is to do it with a little bit of advice just before the
title is displayed. Advise the function <code class="language-plaintext highlighter-rouge">elfeed-search-update</code> with
some “before” advice.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defadvice</span> <span class="nv">elfeed-search-update</span> <span class="p">(</span><span class="nv">before</span> <span class="nv">nullprogram</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">feed</span> <span class="p">(</span><span class="nv">elfeed-db-get-feed</span> <span class="s">"https://nullprogram.com/feed/"</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-feed-title</span> <span class="nv">feed</span><span class="p">)</span> <span class="s">"Seriously Handsome Programmer"</span><span class="p">)))</span>
</code></pre></div></div>
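
<p>The <code class="language-plaintext highlighter-rouge">defadvice</code> form is the older advice interface. On Emacs 24.4 and later the same idea can be sketched with <code class="language-plaintext highlighter-rouge">advice-add</code>. The function name here is a made-up placeholder, and the sketch assumes the same title behavior described above.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical name; any function that accepts and ignores the arguments works.
(defun my/elfeed-rename-nullprogram (&amp;rest _)
  "Overwrite the stored feed title just before the listing is drawn."
  (let ((feed (elfeed-db-get-feed "https://nullprogram.com/feed/")))
    (setf (elfeed-feed-title feed) "Seriously Handsome Programmer")))

(advice-add 'elfeed-search-update :before #'my/elfeed-rename-nullprogram)

;; To undo it later:
;; (advice-remove 'elfeed-search-update #'my/elfeed-rename-nullprogram)
</code></pre></div></div>

<p>A named function makes the advice easy to remove again, which is handy while experimenting.</p>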

<h4 id="entry-tweaking">Entry Tweaking</h4>

<p>Automatic entry modification should happen immediately upon discovery
so that it looks like the entry arrived that way. This is done through
the <code class="language-plaintext highlighter-rouge">elfeed-new-entry-hook</code>. Generally this would be used for applying
custom tags. These examples are from the documentation:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Mark all YouTube entries</span>
<span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="nv">elfeed-make-tagger</span> <span class="ss">:feed-url</span> <span class="s">"youtube\\.com"</span>
                              <span class="ss">:add</span> <span class="o">'</span><span class="p">(</span><span class="nv">video</span> <span class="nv">youtube</span><span class="p">)))</span>

<span class="c1">;; Entries older than 2 weeks are marked as read</span>
<span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="nv">elfeed-make-tagger</span> <span class="ss">:before</span> <span class="s">"2 weeks ago"</span>
                              <span class="ss">:remove</span> <span class="ss">'unread</span><span class="p">))</span>

<span class="c1">;; Building subset feeds</span>
<span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="nv">elfeed-make-tagger</span> <span class="ss">:feed-url</span> <span class="s">"example\\.com"</span>
                              <span class="ss">:entry-title</span> <span class="o">'</span><span class="p">(</span><span class="nb">not</span> <span class="s">"something interesting"</span><span class="p">)</span>
                              <span class="ss">:add</span> <span class="ss">'junk</span>
                              <span class="ss">:remove</span> <span class="ss">'unread</span><span class="p">))</span>
</code></pre></div></div>

<p>Due to a feature I recently ported from my personal configuration,
this tagger helper function is less necessary. You can put lists in
your <code class="language-plaintext highlighter-rouge">elfeed-feeds</code> list to supply automatic tags.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">setq</span> <span class="nv">elfeed-feeds</span>
      <span class="o">'</span><span class="p">((</span><span class="s">"https://nullprogram.com/feed/"</span> <span class="nv">blog</span> <span class="nv">emacs</span><span class="p">)</span>
        <span class="s">"http://www.50ply.com/atom.xml"</span>  <span class="c1">; no autotagging</span>
        <span class="p">(</span><span class="s">"http://nedroid.com/feed/"</span> <span class="nv">webcomic</span><span class="p">)))</span>
</code></pre></div></div>
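
<p>When neither the tagger helper nor the per-feed tag lists are expressive enough, the hook also accepts an ordinary function of one argument, the entry. This is a sketch under assumptions: the function name and the matching rule are invented for illustration, and it relies on <code class="language-plaintext highlighter-rouge">elfeed-tag</code> to add the tag.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical hook function: tag any new entry whose title mentions Emacs.
(defun my/elfeed-tag-emacs (entry)
  (let ((title (or (elfeed-entry-title entry) ""))
        (case-fold-search t))               ; match "emacs" and "Emacs" alike
    (when (string-match-p "emacs" title)
      (elfeed-tag entry 'emacs))))

(add-hook 'elfeed-new-entry-hook #'my/elfeed-tag-emacs)
</code></pre></div></div>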

<h4 id="content-tweaking">Content Tweaking</h4>

<p>Going beyond tagging you could change the content of the feed. Say you
want to <a href="http://xkcd.com/1031/">make feeds 100 times better</a>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">hundred-times-better</span> <span class="p">(</span><span class="nv">entry</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">original</span> <span class="p">(</span><span class="nv">elfeed-deref</span> <span class="p">(</span><span class="nv">elfeed-entry-content</span> <span class="nv">entry</span><span class="p">)))</span>
         <span class="p">(</span><span class="nb">replace</span> <span class="p">(</span><span class="nv">replace-regexp-in-string</span> <span class="s">"keyboard"</span> <span class="s">"leopard"</span> <span class="nv">original</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-entry-content</span> <span class="nv">entry</span><span class="p">)</span> <span class="p">(</span><span class="nv">elfeed-ref</span> <span class="nb">replace</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span> <span class="nf">#'</span><span class="nv">hundred-times-better</span><span class="p">)</span>
</code></pre></div></div>

<p>The same trick could be used to remove advertising, change the date,
change the title, etc. The <code class="language-plaintext highlighter-rouge">elfeed-deref</code> and <code class="language-plaintext highlighter-rouge">elfeed-ref</code> parts are
needed to fetch and store content in the content database. Only a
reference is stored on the structure. You can actually use these
functions at any time outside of Elfeed, but they’ll eventually get
garbage collected if Elfeed doesn’t know about them.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">ref</span> <span class="p">(</span><span class="nv">elfeed-ref</span> <span class="s">"Hello, World"</span><span class="p">))</span>
<span class="c1">;; =&gt; [cl-struct-elfeed-ref "907d14fb3af2b0d4f18c2d46abe8aedce17367bd"]</span>

<span class="p">(</span><span class="nv">elfeed-deref</span> <span class="nv">ref</span><span class="p">)</span>
<span class="c1">;; =&gt; "Hello, World"</span>
</code></pre></div></div>

<h3 id="deletion">Deletion</h3>

<p>A question that’s been asked a few times is if entries can be <em>deleted</em>.
To start off, the answer to that question is “no.” There is no
function provided to remove entries from the database. If you want to
remove entries you’re probably taking the wrong approach.</p>

<p>The main problem with removal is that Elfeed needs to keep track of
what it’s seen before. If an entry is removed and then rediscovered,
it will reappear as unread. There are better ways to “remove” entries,
such as tagging them specially.</p>

<p>On a moderately-powerful computer Elfeed can easily handle <em>at least</em>
several tens of thousands of database entries. If “too many entries”
ever becomes a performance problem I’d rather solve it by making the
database faster than by removing information from the database. It’s
already very date-oriented so that older entries are infrequently
touched.</p>

<p>If storage is a concern, you shouldn’t get too worked up about that.
As of this post I have about 6,000 entries in my database and the
index file is only 3.5 MB. The content database after garbage
collection, which is the <code class="language-plaintext highlighter-rouge">data/</code> directory under <code class="language-plaintext highlighter-rouge">~/.elfeed/</code>, with
these 6k entries is 17 MB. When I run <code class="language-plaintext highlighter-rouge">M-x elfeed-db-compact</code>,
currently an experimental feature, it drops down to 1.8 MB. That’s less
than 1 kB per entry. It’s also less than my personal Liferea database
of roughly the same amount of content (~15 MB) before I wrote Elfeed.</p>

<p>If even this storage is still too much you can always blow away your
<code class="language-plaintext highlighter-rouge">data/</code> content database directory. This is safe to do even while
Emacs is running. You’ll still see all of the entries listed in the
search buffer but won’t be able to read them within Emacs until after
the next database update (when it re-fetches the most recent entry
content).</p>

<p>You can also clear out the content database from within Elisp by
visiting every entry and clearing its content field.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">with-elfeed-db-visit</span> <span class="p">(</span><span class="nv">entry</span> <span class="nv">_</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-entry-content</span> <span class="nv">entry</span><span class="p">)</span> <span class="no">nil</span><span class="p">))</span>

<span class="p">(</span><span class="nv">elfeed-db-gc</span><span class="p">)</span>  <span class="c1">;; garbage collect everything</span>
</code></pre></div></div>

<p>The same sort of expression can be used to run over all known entries
to perform other changes. If there was a delete function you might use
it here to remove entries older than a certain date, then hope they’re
not rediscovered.</p>
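
<p>For instance, here is a sketch that marks everything older than 30 days as read rather than deleting it. It assumes entry dates are epoch seconds, as described under Date Display, and that <code class="language-plaintext highlighter-rouge">elfeed-untag</code> is available for removing a tag. The 30-day cutoff is arbitrary.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Sketch: "remove" old entries by dropping their unread tag.
(let ((cutoff (- (float-time) (* 30 24 60 60))))   ; 30 days ago, in epoch seconds
  (with-elfeed-db-visit (entry _feed)
    (when (&lt; (elfeed-entry-date entry) cutoff)
      (elfeed-untag entry 'unread))))
</code></pre></div></div>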

<p>If you <em>never</em> want to store entry content (you never read entries
within Emacs), you can use a hook to always drop it on the floor as it
arrives,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">entry</span><span class="p">)</span> <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-entry-content</span> <span class="nv">entry</span><span class="p">)</span> <span class="no">nil</span><span class="p">)))</span>
</code></pre></div></div>

<h3 id="questions">Questions?</h3>

<p>If you have any questions or suggestions about how to make Elfeed do
what you want it to do, feel free to ask. Some things may actually
require that I make changes to Elfeed to support it, though I hope
I’ve anticipated your particular need well enough to avoid that.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Atom vs. RSS</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/09/23/"/>
    <id>urn:uuid:a36dba78-5234-3269-bb3c-dc1e939f12b1</id>
    <updated>2013-09-23T06:23:51Z</updated>
    <category term="web"/><category term="elfeed"/>
    <content type="html">
      <![CDATA[<p>From <a href="/blog/2013/09/04/">working on Elfeed</a>, I’ve recently become
fairly intimate with the Atom and RSS specifications. I needed to
write a parser for each that would properly handle valid feeds but
would also reasonably handle all sorts of broken feeds that it would
come across. At this point I’m quite confident in saying that <strong>Atom
is <em>by far</em> the better specification</strong> and I really wish RSS didn’t
exist. This isn’t surprising: Atom was created specifically in
response to RSS’s flawed and ambiguous specification.</p>

<p>One consequence of this realization is that I’ve added an Atom feed to
this blog and made it the primary feed. Because so many people are
still using the RSS feed, it will continue to be supported even though
there are no longer links to it (Ha, try to find it now!). You may
have noticed that I also started including the full post body in my
feed entries. Now that my feed usage habits have changed, I felt that
truncating content was actually rather rude. There’s still the issue
that it contains relative URLs, but I’m not aware of any way to fix
this with Jekyll. I also got a lot more precise with dates. Until
recently, all posts occurred at midnight PST on the post date.</p>

<p>For reference, here are the specifications. Just these two documents
cover about 99% of the web feeds out there.</p>

<ul>
  <li><a href="http://www.ietf.org/rfc/rfc4287.txt">Atom</a></li>
  <li><a href="http://www.rssboard.org/rss-specification">RSS 2.0</a></li>
</ul>

<p>Not that it matters too much, but it’s unfortunate that RSS has sort
of “won” this format war. Of the feeds that I follow, about 75% are
RSS and 25% are Atom. That’s still a significant number of web feeds
and Atom is well-supported by all the clients that I’m aware of, so
it’s in no danger of falling out of use. The broken (but still valid)
RSS feeds I’ve come across probably wouldn’t be broken if they were
originally created as Atom feeds. Atom is a stricter standard and,
therefore, would have guided these authors to create their feeds
correctly from the start. <strong>RSS encourages authors to do the <em>wrong</em>
thing.</strong></p>

<h3 id="the-flaws-of-rss">The Flaws of RSS</h3>

<p>For reference, here’s a typical, friendly RSS 2.0 feed.</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span>
<span class="nt">&lt;rss</span> <span class="na">version=</span><span class="s">"2.0"</span><span class="nt">&gt;</span>
  <span class="nt">&lt;channel&gt;</span>
    <span class="nt">&lt;title&gt;</span>Example RSS Feed<span class="nt">&lt;/title&gt;</span>
    <span class="nt">&lt;item&gt;</span>
      <span class="nt">&lt;title&gt;</span>Example Item<span class="nt">&lt;/title&gt;</span>
      <span class="nt">&lt;description&gt;</span>A summary.<span class="nt">&lt;/description&gt;</span>
      <span class="nt">&lt;link&gt;</span>http://www.example.com/foo<span class="nt">&lt;/link&gt;</span>
      <span class="nt">&lt;guid&gt;</span>http://www.example.com/foo<span class="nt">&lt;/guid&gt;</span>
      <span class="nt">&lt;pubDate&gt;</span>Mon, 23 Sep 2013 03:00:05 GMT<span class="nt">&lt;/pubDate&gt;</span>
    <span class="nt">&lt;/item&gt;</span>
  <span class="nt">&lt;/channel&gt;</span>
<span class="nt">&lt;/rss&gt;</span>
</code></pre></div></div>

<h4 id="guid-the-misnomer">guid, the misnomer</h4>

<p>Two of the biggest RSS flaws — flaws that forced me to make a major
design compromise when writing Elfeed — have to do with the <code class="language-plaintext highlighter-rouge">guid</code>
tag. That’s GUID, as in Globally Unique Identifier. Not only did it not
appear until RSS 2.0, but <strong>the guid tag is not required</strong>. In
practice an RSS client will be rereading the same feed items over and
over, so it’s critical that it’s able to identify what items it’s seen
before.</p>

<p>Without a guid tag it’s up to the client to guess what items have been
seen already, and there’s no guidance in the specification for doing
so. Without a guid tag, some clients use the contents of the <code class="language-plaintext highlighter-rouge">link</code> tag as
an identifier (Elfeed, The Old Reader). In practice it’s very unlikely
for two unique items to have the same link. Other clients track the
entire contents of the item, so when any part changes, such as the
description, it’s treated as a brand new item (Liferea). Some
guid-less feeds regularly change their <code class="language-plaintext highlighter-rouge">description</code> (advertising,
etc.), so they’re not handled well by the latter clients. It’s a mess.</p>

<p>In contrast, Atom’s <code class="language-plaintext highlighter-rouge">id</code> element is required. If someone doesn’t have
one you can send them angry e-mails for having an invalid feed.</p>

<p>The bigger flaw of the guid tag is that, <strong>by default, guid tag
content is not actually a GUID</strong>! This was a huge oversight by the
specification’s authors. By default, the content of the guid tag
<em>must</em> be a permanent URL. Only if the <code class="language-plaintext highlighter-rouge">isPermaLink</code> attribute is set
to false can it actually be a GUID (but even that’s unlikely). If two
different feeds contain items that link to content with the same
permalink then that “GUID” is obviously no longer unique. Two unique
items have the same “unique” ID. Doh! Even if the guid tag was
required, I still couldn’t rely on it in Elfeed.</p>

<p>In contrast, Atom’s <code class="language-plaintext highlighter-rouge">id</code> element must contain an Internationalized
Resource Identifier (<a href="http://www.ietf.org/rfc/rfc3987.txt">IRI</a>). This is guaranteed to be unique.</p>

<p>Unlike Atom, <strong>RSS feeds themselves also don’t have identifiers</strong>. Due
to RSS guids never actually being GUIDs, in order to uniquely identify
feed entries in Elfeed I have to use a tuple of the feed URL and
whatever identifier I can gather from the entry itself. It’s a lot
messier than it should be.</p>
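
<p>The tuple idea can be illustrated with a few lines of Elisp. This is a hypothetical helper written for this explanation, not Elfeed’s actual code: it pairs the feed URL with the best identifier the entry offers, the guid if present and otherwise the link.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical helper, not Elfeed's implementation.
(defun my/entry-key (feed-url guid link)
  "Return a key for an entry: FEED-URL paired with GUID, or LINK if GUID is nil."
  (cons feed-url (or guid link)))

;; The same permalink arriving through two different feeds yields two keys.
(equal (my/entry-key "http://a.example/rss" nil "http://www.example.com/foo")
       (my/entry-key "http://b.example/rss" nil "http://www.example.com/foo"))
;; =&gt; nil
</code></pre></div></div>

<p>Keys compare with <code class="language-plaintext highlighter-rouge">equal</code>, so they can be used directly in a hash table created with <code class="language-plaintext highlighter-rouge">:test 'equal</code>.</p>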

<p>In a purely Atom world, the GUID alone would be enough to identify an
entry and the feed URL wouldn’t matter for identification: I wouldn’t
care where the feed came from, just what it’s called. If the same feed
was hosted at two different URLs, a user could list both, the second
appearance acting as a backup mirror, and Elfeed would merge them
effortlessly.</p>

<h4 id="pubdate-the-incorrectly-specified">pubDate, the incorrectly specified</h4>

<p>RSS <strong>didn’t have any sort of date tag until version 2.0!</strong> A standard
specifically oriented around syndication sure took a long time to have
date information. Before 2.0 the workaround was to pull in a date tag
from another XML namespace, such as Dublin Core.</p>

<p>In contrast, Atom has always had <code class="language-plaintext highlighter-rouge">published</code> and <code class="language-plaintext highlighter-rouge">updated</code> tags for
communicating date information.</p>

<p>Finally, in RSS 2.0, dates arrived in the form of the <code class="language-plaintext highlighter-rouge">pubDate</code> tag.
For some reason the name “date” wasn’t good enough so they went with
this ugly camel-case name. Despite all the extra time, they <em>still</em>
screwed this part up. The specification says that <strong>dates must conform
to the outdated <a href="http://www.ietf.org/rfc/rfc0822.txt">RFC 822</a>, then provides examples that
<em>aren’t</em> RFC 822 dates</strong>! Doh! This is because RFC 822 only allows for
2-digit years, so no one should be using it anymore. The RSS authors
unwittingly created yet another date specification: a mash-up of
RFC 822 and its successor. In practice everyone just pretends RSS uses
<a href="http://www.ietf.org/rfc/rfc2822.txt">RFC 2822</a>, which superseded RFC 822.</p>

<p>In contrast, Atom consistently uses <a href="http://www.ietf.org/rfc/rfc3339.txt">RFC 3339</a> dates, along
with a couple of additional restrictions. These dates are <em>much</em>
simpler to parse than RFC 2822, which is complex because it attempts
to be backwards compatible with RFC 822.</p>
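
<p>To give a rough sense of how regular these dates are, the common shape of an Atom timestamp fits in a single regular expression. This is only a sketch with a hypothetical function name: it checks the layout, with the uppercase <code class="language-plaintext highlighter-rouge">T</code> and <code class="language-plaintext highlighter-rouge">Z</code> that Atom requires, but it does not validate field ranges such as month 13.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical predicate: layout check only, no range validation.
(defun my/atom-date-p (string)
  "Return t if STRING looks like an RFC 3339 timestamp as used by Atom."
  (let ((case-fold-search nil))             ; require uppercase T and Z
    (and (string-match-p
          (concat "\\`[0-9]\\{4\\}-[0-9]\\{2\\}-[0-9]\\{2\\}"        ; date
                  "T[0-9]\\{2\\}:[0-9]\\{2\\}:[0-9]\\{2\\}"          ; time
                  "\\(?:\\.[0-9]+\\)?"                               ; fraction
                  "\\(?:Z\\|[+-][0-9]\\{2\\}:[0-9]\\{2\\}\\)\\'")    ; offset
          string)
         t)))

(my/atom-date-p "2013-09-23T03:00:05Z")           ; =&gt; t
(my/atom-date-p "Mon, 23 Sep 2013 03:00:05 GMT")  ; =&gt; nil
</code></pre></div></div>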

<h4 id="rss-10-the-problem-child">RSS 1.0, the problem child</h4>

<p>RSS changed <em>a lot</em> between versions. There was the 0.9x series,
several of which were withdrawn. Later on there was version 1.0 (2000)
and 2.0 (2002). The big problem here is that <strong><a href="http://web.resource.org/rss/1.0/spec">RSS 1.0</a> has
very little in common with 0.9x and 2.0</strong>. It’s practically a whole
different format. In order to officially support RSS, a client has to
be able to parse all of these different formats. In fact, in Elfeed I
have an entirely separate parser for RSS 1.0.</p>

<p>What’s so weird about RSS 1.0? If you thought the name “pubDate” was
ugly you might want to skip this part. In practice it’s namespace
hell. For example, look at <a href="http://rss.gmane.org/messages/excerpts/gmane.linux.kernel">this Gmane RSS 1.0 feed</a>. Unlike the
other RSS versions, the top level element is <code class="language-plaintext highlighter-rouge">rdf:RDF</code>. That’s not a
typo.</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span>
<span class="nt">&lt;rdf:RDF</span> <span class="na">xmlns=</span><span class="s">"http://purl.org/rss/1.0/"</span>
         <span class="na">xmlns:rdf=</span><span class="s">"http://www.w3.org/1999/02/22-rdf-syntax-ns#"</span><span class="nt">&gt;</span>
  <span class="nt">&lt;channel&gt;</span>
    <span class="nt">&lt;title&gt;</span>RSS 1.0 Example<span class="nt">&lt;/title&gt;</span>
    <span class="nt">&lt;items&gt;</span>
      <span class="nt">&lt;rdf:Seq&gt;</span>
        <span class="nt">&lt;rdf:li</span> <span class="na">rdf:resource=</span><span class="s">"http://example.com/foo"</span><span class="nt">/&gt;</span>
      <span class="nt">&lt;/rdf:Seq&gt;</span>
    <span class="nt">&lt;/items&gt;</span>
  <span class="nt">&lt;/channel&gt;</span>
  <span class="nt">&lt;item&gt;</span>
    <span class="nt">&lt;title&gt;</span>Example Item<span class="nt">&lt;/title&gt;</span>
    <span class="nt">&lt;description&gt;</span>A summary.<span class="nt">&lt;/description&gt;</span>
    <span class="nt">&lt;link&gt;</span>http://www.example.com/foo<span class="nt">&lt;/link&gt;</span>
  <span class="nt">&lt;/item&gt;</span>
<span class="nt">&lt;/rdf:RDF&gt;</span>
</code></pre></div></div>

<p>Remember, if you want dates you’ll need to import another namespace.</p>

<p>Notice the completely redundant <code class="language-plaintext highlighter-rouge">items</code> tag. It’s not like you’re
going to download a partial feed and use the <code class="language-plaintext highlighter-rouge">items</code> tag to avoid
grabbing full content. It’s just noise.</p>

<p>Even more important: notice that the <strong>items are <em>outside</em> the
<code class="language-plaintext highlighter-rouge">channel</code> tag</strong>! Why would they completely restructure everything in
1.0? It’s madness. Fortunately everything here was dumped in RSS 2.0
and, except for a very small number of feeds, it’s almost just a bad
memory.</p>

<h4 id="channel-the-vestigial-tag">channel, the vestigial tag</h4>

<p>Notice in the example RSS feed it goes <code class="language-plaintext highlighter-rouge">rss</code> -&gt; <code class="language-plaintext highlighter-rouge">channel</code> -&gt; <code class="language-plaintext highlighter-rouge">item*</code>.
Having a <code class="language-plaintext highlighter-rouge">channel</code> tag suggests a single feed can have a number of
different channels. Nope! Only one channel is allowed, meaning <strong>the
channel tag serves absolutely no purpose</strong>. It’s just more noise. Why
was this ever added?</p>

<p>The good news is that RSS has a <code class="language-plaintext highlighter-rouge">category</code> tag which serves this
purpose much better anyway. Tagging is preferable to hierarchies —
e.g. an item could only belong to one channel but it could belong to
multiple categories.</p>

<h3 id="atom">Atom</h3>

<p>Atom is a much cleaner specification, with much clearer intent, and
without all the mistakes and ambiguities. It’s also more general,
designed for the syndication of many types and shapes of content. This
is what made it popular for use with podcasts. Everything I listed
above I discovered myself while writing Elfeed. There are surely many
other problems with RSS I haven’t noticed yet.</p>

<p>If I only had to support Atom, things would have been significantly
simpler. At the moment I have no complaints about Atom. It’s given me
no trouble.</p>

<p>Someday if you’re going to create a new feed for some content, please
do the web a favor and choose Atom! You’re much more likely to get
things right the first time and you’ll make someone else’s job a lot
easier. As the author of a web feed client you can take my word for
it.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>The Elfeed Database</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/09/09/"/>
    <id>urn:uuid:8aba2e49-22a0-330b-e664-54fb50ecdd00</id>
    <updated>2013-09-09T05:53:41Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="elfeed"/>
    <content type="html">
      <![CDATA[<p>The design of <a href="/blog/2013/09/04/">Elfeed’s</a> database took some experimentation
before any part of it was settled. A major design constraint was
Emacs’ very limited file input/output. There’s no random access and,
without the aid of an external program, files must always be read and
written wholesale. That’s not database-friendly at all! In the end I
settled on a design that minimized the size of the frequently
rewritten parts, an index with two different data models, by storing
immutable data in a loose-file, content-addressable database.</p>

<p>At the moment there really aren’t any pure-Elisp database solutions
for Emacs. This is almost certainly due to the aforementioned I/O
limitations. I ran into this same problem last year when I created
<a href="/blog/2012/12/29/">an Emacs pastebin server</a>. I attempted, and failed, to
interface with a SQLite database through its command line program.
Nic Ferrier has published a <a href="https://github.com/nicferrier/emacs-db">generic database interface</a>,
but it lacks concrete implementations.</p>

<p>As a bit of good news, as far as I know Emacs <em>does</em> properly handle
atomic file updates across all platforms, so a pure-Elisp database
developer would never have to worry about only writing half the
database. It’s always a safe operation. Worst case scenario you’re
left with an old version of data rather than no data at all.</p>

<p>A real possibility for a database would be connecting to an
established database server via TCP with an Emacs network process. If
the server has a specified wire protocol Elisp could talk to it
efficiently. In fact, there exists <a href="http://www.online-marketwatch.com/pgel/pg.html">pg.el</a> that does <em>exactly</em>
this for PostgreSQL. Unfortunately I was not able to get this working
with my pastebin, nor is this solution appropriate for Elfeed. It
would be unreasonable to require users to first set up a PostgreSQL
server just to read web feeds!</p>

<p>Ultimately it would seem that any efficient Emacs database requires
the help of an external program. The <a href="http://notmuchmail.org/">notmuch</a> mail client,
which inspired Elfeed, does this. To access the notmuch database a
command line program is run once for each request. A query is passed
as a program argument and the output of the program is parsed into the
result.</p>
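
<p>As a rough illustration of that one-process-per-request pattern, here is a
minimal sketch. The helper name <code class="language-plaintext highlighter-rouge">my-db-query</code> is hypothetical, and it
assumes the external program prints a single s-expression on standard
output, as <code class="language-plaintext highlighter-rouge">notmuch search --format=sexp</code> does. It is not how the
notmuch Emacs interface is actually written.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-db-query (program &amp;rest args)
  "Run PROGRAM with ARGS and parse its output as one Lisp object.
Assumes PROGRAM prints a single s-expression on standard output."
  (with-temp-buffer
    ;; Destination (t nil): stdout goes to this buffer, stderr is discarded.
    (let ((status (apply #'call-process program nil '(t nil) nil args)))
      (unless (eq status 0)
        (error "%s failed with status %s" program status)))
    (goto-char (point-min))
    (read (current-buffer))))

;; Hypothetical usage, assuming notmuch is installed and configured:
;; (my-db-query "notmuch" "search" "--format=sexp" "tag:unread")
</code></pre></div></div>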

<h3 id="the-early-database">The Early Database</h3>

<p>For the first few days of its existence Elfeed only had an in-memory
database. Closing Emacs would lose everything. For my personal usage
patterns, where I read, or at least address, all entries that arrive
— and especially because I use Elfeed on a couple of different
computers — I don’t really <em>need</em> to track things long term. I could
easily mark everything after a certain date as read and forget about
them. However, it would be nice to have and, more importantly, many
people wouldn’t use Elfeed without persistence between Emacs sessions.</p>

<p>So, for the first database I did what I always do: dumped the data
structure to a file using the printer and parsed it back in later
using the reader. This is dead simple in Lisp, it’s very fast, and it
even works for circular data structures. It’s something I missed so
much with the much-less-capable JSON format earlier this year that I
<a href="/blog/2013/03/28/">wrote a JavaScript library to do it</a>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">save-data</span> <span class="p">(</span><span class="nv">file</span> <span class="nv">data</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-file</span> <span class="nv">file</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">standard-output</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">))</span>
          <span class="p">(</span><span class="nv">print-circle</span> <span class="no">t</span><span class="p">))</span>  <span class="c1">; Allow circular data</span>
      <span class="p">(</span><span class="nb">prin1</span> <span class="nv">data</span><span class="p">))))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">load-data</span> <span class="p">(</span><span class="nv">file</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="nv">insert-file-contents</span> <span class="nv">file</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">read</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">save-data</span> <span class="s">"demo.dat"</span> <span class="o">'</span><span class="p">(</span><span class="nv">a</span> <span class="nv">b</span> <span class="nv">c</span> <span class="nv">[</span><span class="s">"1"</span> <span class="mi">2</span> <span class="nv">3]</span><span class="p">))</span>
<span class="p">(</span><span class="nv">load-data</span> <span class="s">"demo.dat"</span><span class="p">)</span>
<span class="c1">;; =&gt; (a b c ["1" 2 3])</span>
</code></pre></div></div>

<p>Anything with a printed representation can be serialized and stored
this way, including symbols, strings, numbers, lists, vectors (structs,
objects), hash tables, and even compiled functions (.elc files).
Basically every Emacs library that stores data on disk uses this
technique.</p>

<p>Unfortunately, this is where I hit another serious database
constraint: <a href="http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-08/msg00860.html"><strong><code class="language-plaintext highlighter-rouge">print-circle</code> is broken in Emacs 24.3</strong></a>,
the current stable release. This means Elfeed cannot take advantage of
this useful feature, at least not for a long time, as I had been
counting on. The final database is slightly slower and larger than
strictly required as a result.</p>
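
<p>To make concrete what <code class="language-plaintext highlighter-rouge">print-circle</code> is supposed to provide, here’s a
small sketch, assuming an Emacs where the feature behaves correctly.
When it’s enabled, the printer labels shared structure with <code class="language-plaintext highlighter-rouge">#1=</code> and
refers back to it with <code class="language-plaintext highlighter-rouge">#1#</code>, so the reader can rebuild the sharing.
When it’s disabled, a shared object comes back as separate copies.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let* ((entry (list 'a 'b))
       (db (list entry entry))      ; the same object referenced twice
       (print-circle t))
  (prin1-to-string db))
;; =&gt; "(#1=(a b) #1#)"

(let* ((entry (list 'a 'b))
       (db (list entry entry))
       (print-circle nil)
       (copy (read (prin1-to-string db))))
  ;; After the round trip the two elements are distinct objects.
  (eq (car copy) (cadr copy)))
;; =&gt; nil
</code></pre></div></div>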

<h3 id="the-content-database">The Content Database</h3>

<p>After breaking the circular references of the in-memory database I
finally had persistence for the first time. With the naive
printer/reader approach it was slow, almost 1 second to write just a
few thousand entries on my 6-year-old laptop (my minimum requirements
target machine). I wanted Elfeed to support hundreds of thousands of
entries, if not millions, so this was much too slow.</p>

<p>The big slowdown was writing out all the entry content each time the
database is saved. These are large strings containing HTML that rarely
change. There’s no reason to write them out every time, nor is there
a reason to even keep them in memory all the time, as they’re rarely
accessed. The solution is a loose-file, content-addressable database,
very similar to an unpacked Git object database.</p>

<p>The content database stores immutable sequences of characters — not
just raw bytes, but rather multibyte strings — using an unspecified
coding system (right now it’s UTF-8 for all platforms). The filename
for the content is the content hashed with SHA-1
(“content-addressable”). To limit the number of files per directory,
these files are stored in subdirectories named by the first
hex-encoded byte of the hash (just like Git). A database of 4 items
might look like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>data/
   18/
      18ff6f11945b1e9f3e3c4cae8b5275d36b9944e1
      184c06a83f0bc73a8345c6d886f9043bcae095f8
   6b/
      6b59ae257f2bea24703d8adf5747049c138dfc82
   cc/
      cc47d53872ae2a9186151ef1a68392a94e1f091f
</code></pre></div></div>
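
<p>The naming scheme can be sketched in a few lines. This is only an
illustration under the assumptions above (UTF-8 encoding, a <code class="language-plaintext highlighter-rouge">data/</code>
root), and <code class="language-plaintext highlighter-rouge">my-content-path</code> is a hypothetical name, not an Elfeed
function.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-content-path (content)
  "Return the relative file path under which CONTENT would be stored.
The file name is the SHA-1 of the UTF-8 encoded text and the
directory is the first two hex digits of that hash."
  (let ((hash (secure-hash 'sha1 (encode-coding-string content 'utf-8))))
    (concat "data/" (substring hash 0 2) "/" hash)))

(my-content-path "Hello, world!")
;; =&gt; "data/94/943a702d06f34599aee1f8da8ef9f7296031d699"
</code></pre></div></div>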

<p>Something really neat about the content database is that it’s
completely agnostic about Elfeed. If it weren’t for Elfeed’s garbage
collector, anyone could use it to store arbitrary content. The
function <code class="language-plaintext highlighter-rouge">elfeed-ref</code> accepts a string and returns a reference into
the database. Because of the hash, providing the same string in the
future will return the same reference without actually performing a
write. References are dereferenced with <code class="language-plaintext highlighter-rouge">elfeed-deref</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">ref</span> <span class="p">(</span><span class="nv">elfeed-ref</span> <span class="s">"Hello, world!"</span><span class="p">))</span>
<span class="c1">;; =&gt; [cl-struct-elfeed-ref "943a702d06f34599aee1f8da8ef9f7296031d699"]</span>

<span class="p">(</span><span class="nv">elfeed-deref</span> <span class="nv">ref</span><span class="p">)</span>
<span class="c1">;; =&gt; "Hello, world!"</span>
</code></pre></div></div>

<p>With content stored elsewhere, entries are a struct containing only
some small metadata: title, link, date, and a content database
reference. Writing out many of them at once is much, much faster.</p>

<p>I don’t expect it happens often, but this also means content is
de-duplicated. If two entries happen to have the same content they’ll
share content database storage. A small savings.</p>
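
<p>The write-once behavior and the de-duplication both fall out of naming
files by their hash. Here’s a toy store that shows the idea. All of the
<code class="language-plaintext highlighter-rouge">my-store-</code> names are hypothetical, the subdirectory level is left out
for brevity, and Elfeed’s real implementation differs in the details.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar my-store-dir (expand-file-name "my-store" temporary-file-directory)
  "Directory for the toy content store (hypothetical).")

(defun my-store-ref (content)
  "Store CONTENT unless it is already stored, then return its hash."
  (let* ((hash (secure-hash 'sha1 (encode-coding-string content 'utf-8)))
         (file (expand-file-name hash my-store-dir)))
    (unless (file-exists-p file)    ; same content, same file: no write
      (make-directory my-store-dir t)
      (let ((coding-system-for-write 'utf-8))
        (with-temp-file file
          (insert content))))
    hash))

(defun my-store-deref (hash)
  "Return the content stored under HASH, or nil if there is none."
  (let ((file (expand-file-name hash my-store-dir)))
    (when (file-exists-p file)
      (let ((coding-system-for-read 'utf-8))
        (with-temp-buffer
          (insert-file-contents file)
          (buffer-string))))))

;; Storing the same string twice yields the same reference.
(equal (my-store-ref "Hello, world!") (my-store-ref "Hello, world!"))
;; =&gt; t
(my-store-deref (my-store-ref "Hello, world!"))
;; =&gt; "Hello, world!"
</code></pre></div></div>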

<p>At this point it’s really tempting to get fancier and really put this
content database to use. The core index itself could be stored as raw
content, and the root to accessing the database would be a single
SHA-1 hash referencing it — again, <em>very</em> similar to Git. If an index
stores a reference to the previously written index, then the
Elfeed database would be an immutable structure tracking its entire
history. Such a change would cost virtually nothing in performance,
just disk space.</p>

<h3 id="multiple-representations">Multiple Representations</h3>

<p>With all the content out of the way, the database is now just a lean
index. At this point it’s a hash table mapping feed IDs to feeds.
Each feed contains a list of its entries. To build the entry listing for
the elfeed-search buffer, Elfeed needs to visit each feed in the hash
table, gather its entries into one giant list, then finally sort that
list by date. At around O(n log n), that sort operation is a real
performance killer. Completely unacceptable. To fix this we need to
think about how the data is updated and used.</p>

<p>First, <strong>entries are <em>always</em> viewed in date order</strong>, no exceptions.
From my experience of using web feeds for the last six years I <em>never</em>
had a reason to list feed entries by any other order. The vast
majority of the time, newer entries are most relevant, and if I need
to look for something specific I can search for it.</p>

<p>We definitely want to store entries in date-order so we can create
entry listings without performing a sort: something around O(n) or so.
Inserting new entries into this structure should also be efficient.</p>

<p>Second, <strong>entries are never <em>removed</em> from the database</strong>. This isn’t
e-mail. Even if a user doesn’t want to see an entry again, we have to
keep track of it. Otherwise it will show up as new if it’s discovered
in a feed again, which is likely. Things are added to the database and
never removed. In Elfeed, I use a <code class="language-plaintext highlighter-rouge">junk</code> tag to completely hide
entries I don’t want to see, and I always have a <code class="language-plaintext highlighter-rouge">-junk</code> element in my
filter.</p>

<p>There’s an important caveat to this one that I had missed until after
the public release: entry dates can change! When a previously
discovered entry is read from a feed, Elfeed updates (read: mutates)
the entry struct to reflect the new state. This includes the date.
It’s very likely that a date-sorted representation won’t tolerate date
changes underneath it since it’s keying off of them. Either we refuse
to update the entry date, or we remove the entry, update the date, and
then re-insert it (how it currently works).</p>

<p>Third, <strong>entries are generally added with a recent date</strong>. After the
database is initially populated, it’s only picking up new items.
Adding recently-dated entries should be faster than adding
older entries. I didn’t get a chance to take advantage of this, but
it’s something to keep in mind.</p>

<p>Fourth, <strong>entries need to be keyed by an ID string</strong>. Each entry has a
unique, unchanging identifier string, either provided by the feed
itself (RSS’s <code class="language-plaintext highlighter-rouge">guid</code> or Atom’s <code class="language-plaintext highlighter-rouge">id</code>) or generated intelligently by
Elfeed. Especially because of the <code class="language-plaintext highlighter-rouge">print-circle</code> bug, we need to be
able to talk about feeds in terms of their ID — an indirect pointer.</p>

<p>(Actually, even when RSS <code class="language-plaintext highlighter-rouge">guid</code> tags are present, they’re permalinks
by default. So, unfortunately, RSS IDs are not at all resistant to
collisions across feeds. To work around this, entry identifiers are a
<em>pair</em> of strings: feed ID and entry ID. Atom doesn’t have this
problem, but we’re stuck with the lowest common denominator.)</p>

<p>A date-oriented representation would be unable to efficiently look up
an entry by its ID, so it needs to be supplemented by an ID-oriented
representation. This means we need two representations in our
database: date-oriented and ID-oriented.</p>

<p>So what do we use? Well, for keeping entries sorted by date we want
some sort of balanced tree. A B-tree is probably a good choice. Rather
than write one I went with an AVL tree since Emacs comes with a
library for it (<code class="language-plaintext highlighter-rouge">avl-tree</code>). It’s already debugged and optimized! The
bad news is that the internal structure is unspecified, so there are
no guarantees that it can be serialized. A future update to the
library may break the Elfeed database. I also had to hack into it to
work around a security issue. The comparison function is embedded in
the tree. After deserializing the database, Elfeed needs to ensure
that no one stuck a malicious function in there.</p>

<p>The choice for an ID database was super-easy: a hash table. Due to the
<code class="language-plaintext highlighter-rouge">print-circle</code> bug, this is actually the main representation. The AVL
tree only stores IDs and it has to reach into the hash table to do any
date comparisons. If <code class="language-plaintext highlighter-rouge">print-circle</code> were working I could store the same
exact entry objects in the AVL tree as the hash table, so mutating
them would update them in all representations. However, with
<code class="language-plaintext highlighter-rouge">print-circle</code> off, on deserialization these would become unique
objects and updates would break.</p>
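
<p>Here’s a minimal sketch of the two representations working together,
using the <code class="language-plaintext highlighter-rouge">avl-tree</code> library that ships with Emacs. The <code class="language-plaintext highlighter-rouge">my-</code> names and
the plist entry layout are assumptions made for the example, not
Elfeed’s actual structures. The tree holds only IDs, its comparison
function looks up dates through the hash table, and a date change is
handled by removing the ID, updating the entry, and re-inserting.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'avl-tree)

(defvar my-entries (make-hash-table :test 'equal)
  "ID-oriented representation: maps an entry ID to a plist.")

(defun my-entry-date (id)
  "Return the date of the entry named by ID."
  (plist-get (gethash id my-entries) :date))

(defvar my-index
  (avl-tree-create
   (lambda (a b)
     ;; Newest first. Ties are broken by ID so that two distinct
     ;; entries never compare as equal.
     (let ((da (my-entry-date a))
           (db (my-entry-date b)))
       (if (= da db) (string&lt; a b) (&gt; da db)))))
  "Date-oriented representation: an AVL tree holding only IDs.")

(defun my-add-entry (id date title)
  "Add or update entry ID, keeping both representations in sync."
  (when (gethash id my-entries)
    ;; The tree is keyed on the date, so remove under the old date first.
    (avl-tree-delete my-index id))
  (puthash id (list :date date :title title) my-entries)
  (avl-tree-enter my-index id))

(my-add-entry "a" 100 "Oldest")
(my-add-entry "b" 300 "Newest")
(my-add-entry "c" 200 "Middle")
(my-add-entry "a" 400 "Oldest, now redated")
(avl-tree-flatten my-index)
;; =&gt; ("a" "b" "c")
</code></pre></div></div>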

<h3 id="the-future">The Future</h3>

<p>That’s where the database is today. I put in a few extra fields that
aren’t actually used yet, so that there’s room to make a few changes
without breaking the database. Perhaps someday I’ll work out a whole
new database structure, or maybe a proper database library will come
into existence, and this post will simply document the old database.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Introducing Elfeed, an Emacs Web Feed Reader</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/09/04/"/>
    <id>urn:uuid:fdfd55d2-65dd-39cc-6695-655c3ea7e8e0</id>
    <updated>2013-09-04T05:33:10Z</updated>
    <category term="emacs"/><category term="web"/><category term="elfeed"/>
    <content type="html">
      <![CDATA[<p>Unsatisfied with the results of my
<a href="/blog/2013/06/13/">recent search for a new web feed reader</a>, I created my own
from scratch, called <a href="https://github.com/skeeto/elfeed">Elfeed</a>. It’s built on top of Emacs and
is available for download through <a href="http://melpa.milkbox.net/">MELPA</a>. I intend it to be
highly extensible, a power user’s web feed reader. It supports both
Atom and RSS.</p>

<ul>
  <li><a href="https://github.com/skeeto/elfeed">https://github.com/skeeto/elfeed</a></li>
</ul>

<p>The design of Elfeed was inspired by <a href="http://notmuchmail.org/">notmuch</a>, which is
<a href="/blog/2013/09/03/">my e-mail client of choice</a>. I’ve enjoyed the notmuch search
interface and the extensibility of the whole system — a side-effect
of being written in Emacs Lisp — so much that I wanted a similar
interface for my web feed reader.</p>

<h3 id="the-search-buffer">The search buffer</h3>

<p>Unlike many other feed readers, Elfeed is oriented around <em>entries</em> —
the Atom term for articles — rather than <em>feeds</em>. It cares less about
where entries came from and more about listing relevant entries for
reading. This listing is the <code class="language-plaintext highlighter-rouge">*elfeed-search*</code> buffer. It looks like
this,</p>

<p><a href="/img/elfeed/search.png"><img src="/img/elfeed/search-thumb.png" alt="" /></a></p>

<p>This buffer is not necessarily about listing unread or recent entries,
it’s a filtered view of all entries in the local Elfeed database.
Hence the “search” buffer. Entries are marked with various <em>tags</em>,
which play a role in view filtering — the notmuch model. By default,
all new entries are tagged <code class="language-plaintext highlighter-rouge">unread</code> (customize with
<code class="language-plaintext highlighter-rouge">elfeed-initial-tags</code>). I’ll cover the filtering syntax shortly.</p>

<p>From the search buffer there are a number of ways to interact with
entries. You can select a single entry with the point, or multiple
entries at once with a region, and interact with them.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">b</code>: visit the selected entries in a browser</li>
  <li><code class="language-plaintext highlighter-rouge">y</code>: copy the selected entry URL to the clipboard</li>
  <li><code class="language-plaintext highlighter-rouge">r</code>: mark selected entries as read</li>
  <li><code class="language-plaintext highlighter-rouge">u</code>: mark selected entries as unread</li>
  <li><code class="language-plaintext highlighter-rouge">+</code>: add a specific tag to selected entries</li>
  <li><code class="language-plaintext highlighter-rouge">-</code>: remove a specific tag from selected entries</li>
  <li><code class="language-plaintext highlighter-rouge">RET</code>: view selected entry in a buffer</li>
</ul>

<p>(This list can be viewed within Emacs with the standard <code class="language-plaintext highlighter-rouge">C-h m</code>.)</p>

<p>The last action uses the Simple HTML Renderer (shr), now part of
Emacs, to render entry content into a buffer for viewing. It will even
fetch and display images in the buffer, assuming your Emacs has been
built for it. (Note: the GNU-provided Windows build of Emacs doesn’t
ship with the necessary libraries.) It looks a lot like reading an
e-mail within Emacs,</p>

<p><a href="/img/elfeed/show.png"><img src="/img/elfeed/show-thumb.png" alt="" /></a></p>

<p>The standard read-only keys are in action. Space and backspace are for
page down/up. The <code class="language-plaintext highlighter-rouge">n</code> and <code class="language-plaintext highlighter-rouge">p</code> keys switch between the next and
previous entries from the search buffer. The idea is that you should
be able to hop into the first entry and work your way along reading
them within Emacs when possible.</p>

<h3 id="configuration">Configuration</h3>

<p>Elfeed maintains a database in <code class="language-plaintext highlighter-rouge">~/.elfeed/</code> (configurable). It will
start out empty because you need to tell it what feeds you’d like to
follow. List your feeds in the <code class="language-plaintext highlighter-rouge">elfeed-feeds</code> variable. You would do this in
your <code class="language-plaintext highlighter-rouge">.emacs</code> or other initialization files.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">setq</span> <span class="nv">elfeed-feeds</span>
      <span class="o">'</span><span class="p">(</span><span class="s">"http://www.50ply.com/atom.xml"</span>
        <span class="s">"http://possiblywrong.wordpress.com/feed/"</span>
        <span class="c1">;; ...</span>
        <span class="s">"http://www.devrand.org/feeds/posts/default"</span><span class="p">))</span>
</code></pre></div></div>

<p>Once set, hitting <code class="language-plaintext highlighter-rouge">G</code> (capitalized) in the search buffer or running
<code class="language-plaintext highlighter-rouge">elfeed-update</code> will tell Elfeed to fetch each of these feeds and load
in their entries. Entries will populate the search buffer as they are
discovered (assuming they pass the current filter), where they can be
immediately acted upon. Pressing <code class="language-plaintext highlighter-rouge">g</code> (lower case) refreshes the search
buffer view without fetching any feeds.</p>

<p>Everything fetched will be added to the database for next time you run
Emacs. It’s not required at all in order to use Elfeed, but I’ll
discuss some of
<a href="/blog/2013/09/09/">the details of the database format in another post</a>.</p>

<h3 id="the-search-filter">The search filter</h3>

<p>Pressing <code class="language-plaintext highlighter-rouge">s</code> in the search buffer will allow you to edit the search
filter in action.</p>

<p>There are three kinds of ways to filter on entries, in order of
efficiency: by age, by tag, and by regular expression. For an entry to
be shown, it must pass each of the space-delimited components of the
filter.</p>

<p>Ages are described by plain language relative time, starting with <code class="language-plaintext highlighter-rouge">@</code>.
This component is ultimately parsed by Emacs’ <code class="language-plaintext highlighter-rouge">timer-duration</code>
function. Here are some examples.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">@1-year-old</code></li>
  <li><code class="language-plaintext highlighter-rouge">@5-days-ago</code></li>
  <li><code class="language-plaintext highlighter-rouge">@2-weeks</code></li>
</ul>

<p>Tag filters start with <code class="language-plaintext highlighter-rouge">+</code> and <code class="language-plaintext highlighter-rouge">-</code>. When <code class="language-plaintext highlighter-rouge">+</code>, entries <em>must</em> be tagged
with that tag. When <code class="language-plaintext highlighter-rouge">-</code>, entries <em>must not</em> be tagged with that tag.
Some examples,</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">+unread</code>: show only unread posts.</li>
  <li><code class="language-plaintext highlighter-rouge">-junk +unread</code>: show only unread entries that aren’t tagged “junk”.</li>
</ul>

<p>Anything else is treated like a regular expression. However, the
regular expression is applied <em>only</em> to titles and URLs for both
entries and feeds. It’s not currently possible to filter on entry
content, and I’ve found that I never want to do this anyway.</p>

<p>Putting it all together, here are some examples.</p>

<ul>
  <li>
    <p><code class="language-plaintext highlighter-rouge">linu[xs] @1-year-old</code>: only show entries about Linux or Linus from
the last year.</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">-unread +youtube</code>: only show previously-read entries tagged
with <code class="language-plaintext highlighter-rouge">youtube</code>.</p>
  </li>
</ul>

<p>Note: the database is date-oriented, so age filtering is by far the
fastest. Including an age limit will greatly increase the performance
of the search buffer, so I recommend adding it to the default filter
(<code class="language-plaintext highlighter-rouge">elfeed-search-filter</code>).</p>
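
<p>The three kinds of filter components can be told apart by their first
character. The following is a sketch of that classification, not
Elfeed’s actual parser. <code class="language-plaintext highlighter-rouge">my-parse-filter</code> and its plist keys are
hypothetical names. Ages are converted to seconds with
<code class="language-plaintext highlighter-rouge">timer-duration</code> after stripping the hyphens and the words “ago” and
“old”.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-parse-filter (filter)
  "Split FILTER into age, required tags, forbidden tags, and regexps."
  (let (age must-have must-not-have matches)
    (dolist (element (split-string filter " " t))
      (let ((type (aref element 0))
            (rest (substring element 1)))
        (cond
         ;; @2-weeks-ago becomes "2weeks", then a number of seconds
         ((eq type ?@)
          (setq age (timer-duration
                     (replace-regexp-in-string "ago\\|old\\|-" "" rest))))
         ((eq type ?+) (push (intern rest) must-have))
         ((eq type ?-) (push (intern rest) must-not-have))
         ;; anything else is kept whole as a regular expression
         (t (push element matches)))))
    (list :age age
          :must-have (nreverse must-have)
          :must-not-have (nreverse must-not-have)
          :matches (nreverse matches))))

(my-parse-filter "linu[xs] -junk +unread")
;; =&gt; (:age nil :must-have (unread) :must-not-have (junk) :matches ("linu[xs]"))
</code></pre></div></div>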

<h3 id="tagging">Tagging</h3>

<p>Generally you don’t want to spend time tagging entries. Fortunately
this step can easily be automated using <code class="language-plaintext highlighter-rouge">elfeed-make-tagger</code>. To tag
all YouTube entries with <code class="language-plaintext highlighter-rouge">youtube</code> and <code class="language-plaintext highlighter-rouge">video</code>,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="nv">elfeed-make-tagger</span> <span class="ss">:feed-url</span> <span class="s">"youtube\\.com"</span>
                              <span class="ss">:add</span> <span class="o">'</span><span class="p">(</span><span class="nv">video</span> <span class="nv">youtube</span><span class="p">)))</span>
</code></pre></div></div>

<p>Any functions added to <code class="language-plaintext highlighter-rouge">elfeed-new-entry-hook</code> are called with the new
entry as their argument. The <code class="language-plaintext highlighter-rouge">elfeed-make-tagger</code> function returns a
function that applies tags to entries matching specific criteria.</p>

<p>This tagger tags old entries as read. It’s handy for initializing an
Elfeed database on a new computer, since I’ve likely already read most
of the entries being discovered.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="nv">elfeed-make-tagger</span> <span class="ss">:before</span> <span class="s">"2 weeks ago"</span>
                              <span class="ss">:remove</span> <span class="ss">'unread</span><span class="p">))</span>
</code></pre></div></div>
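
<p>Since the hook simply calls each function with the new entry, a tagger
can also be written by hand when <code class="language-plaintext highlighter-rouge">elfeed-make-tagger</code> doesn’t cover a
case. A small sketch, where the function name and the <code class="language-plaintext highlighter-rouge">emacs</code> tag are
only examples:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'elfeed)

(defun my-elfeed-tag-emacs (entry)
  "Tag ENTRY with `emacs' when its title mentions Emacs."
  (let ((title (elfeed-entry-title entry))
        (case-fold-search t))       ; match "Emacs" and "emacs" alike
    (when (and title (string-match-p "emacs" title))
      (elfeed-tag entry 'emacs))))

(add-hook 'elfeed-new-entry-hook #'my-elfeed-tag-emacs)
</code></pre></div></div>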

<h3 id="creating-custom-subfeeds">Creating custom subfeeds</h3>

<p>Tagging is also really handy for fixing some kinds of broken feeds or
otherwise filtering out unwanted content. I like to use a <code class="language-plaintext highlighter-rouge">junk</code> tag
to indicate uninteresting entries.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="nv">elfeed-make-tagger</span> <span class="ss">:feed-url</span> <span class="s">"example\\.com"</span>
                              <span class="ss">:entry-title</span> <span class="o">'</span><span class="p">(</span><span class="nb">not</span> <span class="s">"something interesting"</span><span class="p">)</span>
                              <span class="ss">:add</span> <span class="ss">'junk</span>
                              <span class="ss">:remove</span> <span class="ss">'unread</span><span class="p">))</span>
</code></pre></div></div>

<p>There are a few feeds I’d <em>like</em> to follow but do not because the
entries lack dates. This makes them difficult to follow without a
shared, persistent database. I’ve contacted the authors of these feeds
to try to get them fixed but have not gotten any responses. I haven’t
quite figured out how to do it yet, but I will eventually create a
function for <code class="language-plaintext highlighter-rouge">elfeed-new-entry-hook</code> that adds reasonable dates to
these feeds.</p>

<h3 id="custom-actions">Custom actions</h3>

<p>In <a href="https://github.com/skeeto/.emacs.d">my own .emacs.d configuration</a> I’ve added a new entry action
to Elfeed: video downloads with youtube-dl. When I hit <code class="language-plaintext highlighter-rouge">d</code> on a
YouTube entry either in the entry “show” buffer or the search buffer,
Elfeed will download that video onto my local drive. I consume quite a
few YouTube videos on a regular basis (I’m a “cord-never”), so this
has already saved me a lot of time.</p>

<p>Adding custom actions like this to Elfeed is exactly the extensibility
I’m interested in supporting. I want this to be easy. After just a
week of usage I’ve already customized Elfeed a lot for myself — very
specific customizations which are not included with Elfeed.</p>
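
<p>As a sketch of what such an action might look like for the search
buffer (this is not the code from my configuration, and the command
name and download directory are made up for the example), a command
can loop over the selected entries and hand each link to an external
program:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar my-elfeed-download-directory (expand-file-name "~/Videos/")
  "Where downloaded videos are saved (hypothetical setting).")

(defun my-elfeed-search-download ()
  "Download each selected entry's link with youtube-dl."
  (interactive)
  (make-directory my-elfeed-download-directory t)
  (let ((default-directory my-elfeed-download-directory))
    (dolist (entry (elfeed-search-selected))
      ;; Runs asynchronously; output collects in the *youtube-dl* buffer.
      (start-process "youtube-dl" "*youtube-dl*" "youtube-dl"
                     (elfeed-entry-link entry)))))

(with-eval-after-load 'elfeed-search
  (define-key elfeed-search-mode-map "d" #'my-elfeed-search-download))
</code></pre></div></div>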

<h3 id="web-interface">Web interface</h3>

<p>Elfeed also includes a web interface! If you’ve loaded/installed
<code class="language-plaintext highlighter-rouge">elfeed-web</code>, start it with <code class="language-plaintext highlighter-rouge">elfeed-web-start</code> and visit this URL in
your browser (check your <code class="language-plaintext highlighter-rouge">httpd-port</code>).</p>

<ul>
  <li>http://localhost:8080/elfeed/</li>
</ul>

<p><a href="/img/elfeed/web.png"><img src="/img/elfeed/web-thumb.png" alt="" /></a></p>

<p>Elfeed exposes a RESTful JSON API, consumable by any application. The
web interface builds on this using AngularJS, behaving as a
single-page application. It includes a filter search box that filters
out entries as you type. I think it’s pretty slick, though still a bit
rough.</p>

<p>It still needs some work to truly be useful. I’m intending for this to
become the “mobile” interface to Elfeed, for remote access on a phone
or tablet. Patches welcome.</p>

<h3 id="try-it-out">Try it out</h3>

<p>After Google Reader closed I tried The Old Reader for a while. When
that collapsed under its own popularity I decided to go with a local
client reader. Canto was crushed under the weight of all my feeds, so
I ended up using Liferea for a while. Frustrated at Liferea’s lack of
extensibility and text-file configuration, I ended up writing Elfeed.</p>

<p>Elfeed is now serving 100% of my personal web feed reader needs. I think
it’s already far better than any reader I’ve used before. Another case
of “I should have done this years ago,” though I think I lacked the
expertise to pull it off well until fairly recently.</p>

<p>At the moment I believe Elfeed is already the most extensible and
powerful web feed reader in the world.</p>

]]>
    </content>
  </entry>

</feed>
