NASM x86 Assembly Major Mode for Emacs

Last weekend I created a new Emacs mode, nasm-mode, for editing Netwide Assembler (NASM) x86 assembly programs. Over the past week I tweaked it until it felt comfortable enough to share on MELPA. It’s got what you’d expect from a standard Emacs programming language mode: syntax highlighting, automatic indentation, and imenu support. It’s not a full parser, but it knows all of NASM’s instructions and directives.

Until recently I didn’t really have preferences about x86 assemblers (GAS, NASM, YASM, FASM, MASM, etc.) or syntax (Intel, AT&T). I stuck to the GNU Assembler (GAS) since it’s already there with all the other GNU development tools I know and love, and it’s required for inline assembly in GCC. However, nasm-mode now marks my commitment to NASM as my primary x86 assembler.

Why NASM?

I need an assembler that can assemble 16-bit code (8086, 8088, 80186, 80286), because real mode is fun. Despite its .code16gcc directive, GAS is not suitable for this purpose. It’s just enough to get the CPU into protected mode — as needed when writing an operating system with GCC — and that’s it. A different assembler is required for serious 16-bit programming.

GAS syntax has problems. I’m not talking about the argument order (source first or destination first), since there’s no right answer to that one. The linked article covers a number of problems, with these being the big ones for me:

Being a portable assembler, GAS is the jack of all instruction sets, master of none. If I’m going to write a lot of x86 assembly, I want a tool specialized for the job.

YASM

I also looked at YASM, a rewrite of NASM. It supports 16-bit assembly and mostly uses NASM syntax. In my research I found that NASM used to lag behind in features due to slower development, which is what spawned YASM. In recent years this seems to have flipped around, with YASM lagging behind. If you’re using YASM, nasm-mode should work pretty well for you, since it’s still very similar.

YASM optionally supports GAS syntax, but this reintroduces almost all of GAS’s problems. Even YASM’s improvements (i.e. its ORG directive) become broken when switching to GAS syntax.

FASM

FASM is the “flat assembler,” an assembler written in assembly language. This means it’s only available on x86 platforms. While I don’t really plan on developing x86 assembly on a Raspberry Pi, I’d rather not limit my options! I already regard 16-bit DOS programming as a form of embedded programming, and this may very well extend to the rest of x86 someday.

Also, it hasn’t made its way into the various Linux distribution package repositories, including Debian, so it’s already at a disadvantage for me.

MASM

This is Microsoft’s assembler that comes with Visual Studio. Windows only and not open source, this is in no way a serious consideration. But since NASM’s syntax was originally derived from MASM, it’s worth mentioning. NASM takes the good parts of MASM and fixes the mistakes (such as the offset operator). It’s different enough that nasm-mode would not work well with MASM.

NASM

It’s not perfect, but it’s got an excellent manual, it’s a solid program that does exactly what it says it will do, has a powerful macro system, great 16-bit support, highly portable, easy to build, and its semantics and syntax has been carefully considered. It also comes with a simple, pure binary disassembler (ndisasm). In retrospect it seems like an obvious choice!

My one complaint would be that it’s that it’s too flexible about labels. The colon on labels is optional, which can lead to subtle bugs. NASM will warn about this under some conditions (orphan-labels). Combined with the preprocessor, the difference between a macro and a label is ambiguous, short of re-implementing the entire preprocessor in Emacs Lisp.

Why nasm-mode?

Emacs comes with an asm-mode for editing assembly code for various architectures. Unfortunately it’s another jack-of-all-trades that’s not very good. More so, it doesn’t follow Emacs’ normal editing conventions, having unusual automatic indentation and self-insertion behaviors. It’s what prompted me to make nasm-mode.

To be fair, I don’t think it’s possible to write a major mode that covers many different instruction set architectures. Each architecture has its own quirks and oddities that essentially makes gives it a unique language. This is especially true with x86, which, from its 37 year tenure touched by so many different vendors, comes in a number of incompatible flavors. Each assembler/architecture pair needs its own major mode. I hope I just wrote NASM’s.

One area where I’m still stuck is that I can’t find an x86 style guide. It’s easy to find half a dozen style guides of varying authority for any programming language that’s more than 10 years old … except x86. There’s no obvious answer when it comes to automatic indentation. How are comments formatted and indented? How are instructions aligned? Should labels be on the same line as the instruction? Should labels require a colon? (I’ve decided this is “yes.”) What about long label names? How are function prototypes/signatures documented? (The mode could take advantage of such a standard, a la ElDoc.) It seems everyone uses their own style. This is another conundrum for a generic asm-mode.

There are a couple of other nasm-modes floating around with different levels of completeness. Mine should supersede these, and will be much easier to maintain into the future as NASM evolves.

Have a comment on this article? Start a discussion in my public inbox by sending an email to ~skeeto/public-inbox@lists.sr.ht [mailing list etiquette] , or see existing discussions.

null program

Chris Wellons

wellons@nullprogram.com (PGP)
~skeeto/public-inbox@lists.sr.ht (view)