Three Dimensions of Type Systems

I occasionally come across articles, and even some books, that get terminology mixed up when discussing type systems. The author might say “strong” when what they’re talking about is “lexical.” In this article I’ll define three orthogonal properties of type systems. A new programming language design could select from each category independently.

Static vs Dynamic

Static versus dynamic typing is probably the most obvious type system property when glancing at an example of a language’s source code. This refers to whether or not variables have types.

In a statically typed language, variables have a compile-time determined type. At run-time, a variable will only ever hold a value of this type. Except where type information can be inferred, variable and function declarations are generally accompanied by its type (manifest). Type violations are generally discovered early, at compile-time, at the cost of additional up-front planning.

In a dynamically typed language, only values have types. A variable may be bound to any type of value. Though a smart compiler may reason about a program enough to know that certain variables are only ever bound to a limited set of types. Type violations are generally discovered late, at run-time, but this allows for more ad hoc design.

Statically typed languages: C, C++, Java, C#, Haskell

Dynamically typed languages: Python, Ruby, JavaScript, Lisp, Clojure

To give a quick comparison, here’s the same function definition in C and JavaScript.

double distance(struct point a, struct point b) {
    return sqrt(pow(a.x - b.x, 2) + pow(a.y - b-y, 2));
}
function distance(a, b) {
    return Math.sqrt(Math.pow(a.x - b.x, 2) + Math.pow(a.y - b.y, 2));
}

Dynamic type systems are more apt for duck typing, though there are exceptions such as with C++ templates. In the JavaScript example above, anything that has numeric x and y properties can be passed to distance. The actual type doesn’t matter. In the C version, only that very specific type, struct point, may be passed to distance.

The C example could be made more generic, circumventing its type system through a trick called type punning. This is where a value is accessed by the program as though it was a different type of value. This requires up-front planning and may potentially violate strict aliasing.

Lexical vs. Dynamic Scope

Lexical versus dynamic scope refers not to any values or objects themselves, but rather to how variables are accessed. Virtually all popular programming languages in use today use lexical scope. This is because dynamic scope has serious performance and correctness problems. In fact, it was likely invented entirely by accident.

However, it’s still useful to use dynamic scope on a careful, opt-in process. Perl, Common Lisp, Clojure, and Emacs Lisp all permit selective dynamic scope. It’s a clean method for temporarily masking a global variable, such as the default stream for reading/writing input/output.

Under lexical scope, the scope of a variable is determined statically at compile-time. The compiler knows about all accesses to a particular variable. This is sometimes called static scope but I’m using the word lexical here to help differentiate from static typing (above).

In dynamic scope, all variables are essentially global variables. A new binding masks any existing global binding for any functions called from within that binding’s extent. If a function accesses a free variable, it’s not known until run-time from where that value may come. When the last binding is removed, such as when a local variable’s scope exited, that global variable is then said to be unbound. Dynamic scope is incompatible with closures.

Lexically scoped languages: C, C++, Java, JavaScript, Python, Ruby, (many more)

Dynamically scoped languages: Emacs Lisp, bash

As of Emacs 24, lexical scope can be enabled by default for a file/buffer by setting lexical-binding to true. I imagine this will some day become the default, making Emacs Lisp a lexically scoped language. This is also a perfect example of lexical scope having better performance: turning on lexical scope makes Elisp programs run faster.

Here’s an example of dynamic scope in Emacs Lisp.

(defun reveal ()
  x) ; free variable

(defun foo ()
  (let ((x :foo))
    (reveal)))

(defun bar ()
  (let ((x :bar))
    (reveal)))

(foo)
;; => :foo

(bar)
;; => :bar

The value of x as seen by reveal depends on which function called it, since the binding leaks through. Running the exact same code in Common Lisp, where it’s lexically scoped, would result in a run-time error. It always tries to access x as a global variable. The scope of x is strictly limited to foo or bar.

Strong vs. Weak

Strong versus weak is probably the least understood property. Strong typing is often mixed up with static typing despite being an orthogonal concept. A language can be strongly, dynamically typed (Python) — or weakly, statically typed (C). Strong/weak is also sometimes confused with type safety.

This aspect refers to the tendency of values to be implicitly coerced into another type depending on how they are used. Unlike the previous two type system properties, this one isn’t bimodal. There’s a degree to just how much implicit coercion occurs.

Strongly typed languages: Python, Common Lisp, Java, Ruby

Weakly typed languages: JavaScript, PHP, Perl, C

For example, take the following expression.

In strongly typed languages this will generally be an error: strings have no definition with the subtraction operator. In weakly typed languages, the “8” is likely to be parsed into a number as part of being handled by the operator, with the expression evaluating to 3.

If a language has a triple equality operator (===), that’s a dead giveaway that it’s weakly typed.

In the case of C, its pointers are what make it weakly typed. It’s easy to make a pointer to a value, then dereference it as a different type (usually leading to undefined behavior).

This is another trade-off between safety and convenience. Modern languages tend towards strong typing.

Further Reading

Here are three more type system properties that weren’t discussed.

Now a language can be summed up in six tokens (using the order I presented them here). Unfortunately, applying the terms to languages is somewhat subjective, especially when languages fall somewhere in between, so not everyone will come to the same conclusions as I do here.

Update May 2014: Exactly one day day after I posted this article, Robert Smallshire made a video covering the same topic in the same way: The Unreasonable Effectiveness of Dynamic Typing for Practical Programs.

Have a comment on this article? Start a discussion in my public inbox by sending an email to ~skeeto/public-inbox@lists.sr.ht [mailing list etiquette] , or see existing discussions.

null program

Chris Wellons

wellons@nullprogram.com (PGP)
~skeeto/public-inbox@lists.sr.ht (view)