A more robust raw OpenBSD syscall demo

Ted Unangst published “dude, where are your syscalls?” on flak yesterday, with a neat demonstration of OpenBSD’s pinsyscall security feature, whereby only pre-registered addresses are allowed to make system calls. Whether it strengthens or weakens security is up for debate, but regardless it’s an interesting, low-level programming challenge. The original demo is fragile for multiple reasons, and requires manually locating and entering addresses for each build. In this article I show how to fix it. To prove that it’s robust, I ported an entire, real application to use raw system calls on OpenBSD.

Robust Wavefront OBJ model parsing in C

Wavefront OBJ is a line-oriented text format for 3D geometry. It’s widely supported by modeling software, easy to parse, and trivial to emit, much like Netpbm for 2D image data. Poke around hobby 3D graphics projects and you’re likely to find a bespoke OBJ parser. Because these parsers typically load only their own model data, robustness doesn’t much matter, yet they usually have hard limitations and don’t stand up to fuzz testing. This article presents a robust, partial OBJ parser in C with no hard-coded limitations, written from scratch. Like similar articles, it’s not really about OBJ but about demonstrating some techniques you’ve probably never seen before.

Meet the new xxd for w64devkit: rexxd

xxd is a versatile hexdump utility with a “reverse” feature, originally written between 1990 and 1996. The Vim project soon adopted it, and it’s lived there ever since. If you have Vim, you also have xxd. Its primary use cases are (1) the basis for a hex editor due to its -r reverse option that can unhexdump its previous output, and (2) a data embedding tool for C and C++ (-i). The former provides Vim’s rudimentary hex editor functionality. The latter is of special interest to w64devkit: xxd -i appears in many builds that embed arbitrary data. It’s important that w64devkit has a compatible implementation, and a freshly rewritten, improved xxd, rexxd, now replaces the original xxd (as xxd).
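For readers unfamiliar with the data embedding use case, here is a minimal sketch of what an xxd -i style emitter does: it turns a byte buffer into a C array definition plus a length variable. The function name emit_c_array is hypothetical, the layout only approximates xxd's output (twelve bytes per line, lowercase hex), and it is not taken from rexxd or the original xxd.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper, for illustration only: write buf as a C array
// definition in roughly the layout produced by "xxd -i". The caller
// supplies a name that is already a valid C identifier.
static void emit_c_array(FILE *out, const char *name,
                         const unsigned char *buf, size_t len)
{
    fprintf(out, "unsigned char %s[] = {", name);
    for (size_t i = 0; i < len; i++) {
        if (i % 12 == 0) {
            // Start a new row of up to twelve bytes.
            fputs(i ? ",\n  " : "\n  ", out);
        } else {
            fputs(", ", out);
        }
        fprintf(out, "0x%02x", buf[i]);
    }
    fprintf(out, "\n};\nunsigned int %s_len = %zu;\n", name, len);
}
```

A build would typically redirect such output into a generated source file and compile it alongside the rest of the program, which is why a compatible -i matters for existing build scripts.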

Tips for more effective fuzz testing with AFL++

Fuzz testing is incredibly effective for mechanically discovering software defects, yet remains underused and neglected. Pick any program that must gracefully accept complex input, written in any language, which has not yet been fuzzed, and fuzz testing usually reveals at least one bug. At least one program currently installed on your own computer certainly qualifies. Perhaps even most of them. Everything is broken and low-hanging fruit is everywhere. After fuzz testing ~1,000 projects over the past six years, I’ve accumulated tips for picking that fruit. The checklist format has worked well in the past (1, 2), so I’ll use it again. This article discusses AFL++ on source-available C and C++ targets, running on glibc-based Linux distributions, currently the indisputable best fuzzing platform for C and C++.

Examples of quick hash tables and dynamic arrays in C

This article durably captures my reddit comment showing techniques for std::unordered_map and std::vector equivalents in C programs. The core, important features of these data structures require only a dozen or so lines of code apiece. They compile quickly, and tend to run faster in debug builds than release builds of their C++ equivalents. What they lack in genericity they compensate for in simplicity. Nothing here will be new. Everything has been covered in greater detail previously, which I will reference when appropriate.
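To give a sense of scale for the "dozen or so lines" claim, here is a minimal sketch of a std::vector-like push for int, using realloc and capacity doubling. The names i32s and i32s_push are hypothetical, and this is a generic illustration under those assumptions rather than the technique from the article itself, which may manage memory differently.

```c
#include <stddef.h>
#include <stdlib.h>

// Hypothetical names, for illustration only. A zero-initialized
// struct is a valid empty array.
typedef struct {
    int   *data;
    size_t len;
    size_t cap;
} i32s;

// Append v, doubling the capacity when full. Returns 1 on success,
// or 0 on allocation failure or size overflow, in which case the
// array is left unchanged.
static int i32s_push(i32s *a, int v)
{
    if (a->len == a->cap) {
        size_t cap = a->cap ? a->cap * 2 : 16;
        if (cap < a->cap || cap > (size_t)-1 / sizeof(int)) {
            return 0;  // doubling wrapped, or byte count would overflow
        }
        int *data = realloc(a->data, cap * sizeof(int));
        if (!data) {
            return 0;
        }
        a->data = data;
        a->cap  = cap;
    }
    a->data[a->len++] = v;
    return 1;
}
```

Usage is a zero-initialized struct followed by pushes, for example `i32s a = {0}; i32s_push(&a, 42);`, with `free(a.data)` when done. The sketch is tied to a single element type, which is the trade of genericity for simplicity described above.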

Rules to avoid common extended inline assembly mistakes

GCC and Clang inline assembly is an interface between high and low level programming languages. It is subtle and treacherous. Many are ensnared in its traps, usually unknowingly. As such, the asm keyword is essentially the unsafe keyword of C and C++. Nearly every inline assembly tutorial, including the awful ibiblio page at the top of search engines for decades, propagates fundamental, serious mistakes, and most examples are incorrect. The dangerous part is that the examples usually produce the expected results! The situation is dire. This article isn’t a tutorial, but basic rules to avoid the most common mistakes, or to spot them in code review.

Everything I've learned so far about running local LLMs

This article was discussed on Hacker News.

Over the past month I’ve been exploring the rapidly evolving world of Large Language Models (LLM). It’s now accessible enough to run a LLM on a Raspberry Pi smarter than the original ChatGPT (November 2022). A modest desktop or laptop supports even smarter AI. It’s also private, offline, unlimited, and registration-free. The technology is improving at breakneck speed, and information is outdated in a matter of months. This article snapshots my practical, hands-on knowledge and experiences — information I wish I had when starting. Keep in mind that I’m a LLM layman, I have no novel insights to share, and it’s likely I’ve misunderstood certain aspects. In a year this article will mostly be a historical footnote, which is simultaneously exciting and scary.

Windows dynamic linking depends on the active code page

Windows paths have been WTF-16-encoded for decades, but module names in the import tables of Portable Executable are octets. If a name contains values beyond ASCII — technically out of spec — then the dynamic linker must somehow decode those octets into Unicode in order to construct a lookup path. There are multiple ways this could be done, and the most obvious is the process’s active code page (ACP), which is exactly what happens. As a consequence, the specific DLL loaded by the linker may depend on the system code page. In this article I’ll contrive such a situation.

Slim Reader/Writer Locks are neato

I’m 18 years late, but Slim Reader/Writer Locks have a fantastic interface: pointer-sized (“slim”), zero-initialized, and non-allocating. Lacking cleanup, they compose naturally with arena allocation. Sounds like a futex? That’s because they’re built on futexes introduced at the same time. They’re also complemented by condition variables with the same desirable properties. My only quibble is that slim locks could easily have been 32-bit objects, but it hardly matters. This article, while treating Win32 as a foreign interface, discusses a paper-thin C++ wrapper interface around lock and condition variables, in my own style.

Giving C++ std::regex a C makeover

Suppose you’re working in C using one of the major toolchains — that is, it’s mainly a C++ implementation — and you need regular expressions. You could integrate a library, but there’s a regex implementation in the C++ standard library included with your compiler, just within reach. As a resourceful engineer, using an asset already in hand seems prudent. But it’s a C++ interface, and you’re using C instead of C++ for a reason, perhaps to avoid dealing with C++. Have no worries. This article is about wrapping std::regex in a tidy C interface which not only hides all the C++ machinery, but utterly tames it. It’s not so much practical as a potpourri of interesting techniques.

null program

Chris Wellons

wellons@nullprogram.com (PGP)
~skeeto/public-inbox@lists.sr.ht (view)