Where do closures come from?

Common Lisp's function form is usually described as a device for switching between namespaces: it evaluates its argument in the “function” namespace instead of the normal “variable” namespace.

Older sources have a completely different idea: they say function makes closures. The Hyperspec says:

If name is a lambda expression, then a lexical closure is returned.


function creates a closure of the lambda expression

Both of these lines were inherited from CLtL, so this is not a new interpretation, nor one incompatible with the best of knowledge. What's going on?

To begin with, these two interpretations of function aren't observably different in portable Common Lisp. The only portable way to get a closure is by (function (lambda ...)) or by macros like defun that might expand to it. ((lambda ...) expands to (function (lambda ...)), because unlike all other special forms, lambda is in the function namespace, but that's just a historical quirk.) The only way to use lambda without function is ((lambda ...) ...), which has the same semantics regardless of whether it makes a closure. So portable code can't tell the difference.

Implementation-specific extensions can. If compile is extended to non-null lexical environments, it will make closures out of lambda-expressions without any help from function. Or if there's a named-lambda form that makes closures, it's unnecessarily complex to attribute the closure in (function (lambda ...)) to function.

So Common Lisp culture favors the simpler interpretation: lambda makes closures, and function is a mere namespacing operator.

Like so many oddities of CL, the old interpretation comes from Lisp Machine Lisp. The 1984 Lisp Machine Manual introduces function by saying it “has two distinct, though related, meanings.” The first is to get a symbol's function definition, and the second is to make a closure:

(let (a)
  (mapcar (function (lambda (x) (push x a))) l))
passes mapcar a specially designed closure made from the function represented by (lambda (x) (push x a)). When mapcar calls this closure, the lexical environment of the function form is put again into effect, and the a in (push x a) refers properly to the binding made by this let.

These two meanings were reflected in implementations. Guy Steele's reference interpreter (in the CL mailing list archive) doesn't bother to make a closure for ((lambda ...) ...), only for (function (lambda ...)). But when optimizing compilers became the norm, it no longer seemed silly (or inefficient) for lambda to always make a closure, so reinterpreting function as a namespacing operator made sense.

Surprisingly, this is not the first time function has been reinterpreted. The Pitmanual says Maclisp's function didn't make closures — it took a different form, *function, to even partially do that. function was equivalent to quote, except that in compiled code it would make a compiled function instead of just a lambda-expression — it permitted compilation but didn't change scoping. When Lisp Machine Lisp changed it to make closures, that was largely backward compatible, since most lambdas were intended to use lexical scope anyway. (I'm not sure when compilers started to use lexical scope — was that in Maclisp?)

I don't think any other language construct has had so many unrelated meanings over the years, let alone done so while preserving the meaning of existing code. function was originally a hint to the compiler, then a way to make closures, and then a namespacing operator. Its history probably ends there, since most new lisps eschew multiple namespaces and omit function rather than repurpose it, but three unrelated meanings is impressive.

Trivial program checkers

Typecheckers get (and deserve) a lot of attention for their ability to find bugs, but their success leads people to think typechecking is the only way to check programs. It's not. There are useful program checkers much simpler than any typechecker. Here's an example:

grep scanf

This finds real bugs in real programs — and not just ordinary bugs, but security holes due to %s overflowing buffers.

Here's another checker:

grep 'printf[^"]*$'

This finds printfs that don't have a literal string on the same line, which usually means someone forgot the format string and did this:

fprintf(file, somestr);

...instead of this:

fprintf(file, "%s", somestr);

It's a stupid bug, yes, but not a rare one. I once ran this checker on a large application and found dozens of instances of this bug. I also found dozens of false positives, from things like these:

snprintf(somewhere->buffer, MAX_BUFFER,
         "format string", args);
fprintf(file, message_format_strings[status], description);

But they were obvious false positives, so it was easy to ignore them.

Here's an even less selective checker:

grep '(\w\+ \?\*)'  #beware different versions of grep

This finds pointer typecasts, which (in C++, more than in C) are often misguided — they might indicate unsafe downcasts, or non-type-safe containers, or casting away constness, or simple unnecessary casting. It also finds a great many false positives, of course — mostly function prototypes and innocent casts.

These checkers don't prove the absence of the errors they look for. A program that doesn't contain the string scanf might still call it via a library or by dlsym. The printf checker can be defeated by something as simple as a printf-like function whose name doesn't contain printf — hardly a rare occurrence! The cast checker misses mundane things like (char**) and (IntPtr). They only find bugs; they don't guarantee their absence.

They're also not very powerful. They find only certain specific errors, not a wide variety. A real lint program can do much better.

But when you don't have a real lint handy, or when your lint doesn't find the problem you're worried about, simple textual checkers can be valuable.

“They only find bugs”. “Only certain specific errors”. Faint criticism.

In addition to being useful, these checkers are a reminder that there are many ways to check programs. None of them are typecheckers in either sense — not in the common sense, because they don't check datatypes, and not in the type-theory sense, because they don't classify expressions. They aren't even aware of the existence of expressions — they see code only as text. This is not a very powerful approach, but it's enough to find a lot of bugs.

Not all checkers are typecheckers.

Atomic file replacement and unpredictable primitives

Many programs need to update files atomically, so they don't corrupt them if they crash while writing. The usual primitive for this is an atomic replacement operation like POSIX rename, which allows programs to implement atomic updates by writing to a temporary file and then replacing the real file with it. Typical use is as in this C macro:

#define ATOMIC_WRITE(filevar, path, mode, body)         \
  do {                                                  \
    const char *realpath = path;                        \
    char temppath[PATH_MAX];                            \
    if (snprintf(temppath, PATH_MAX, "%s.temp", realpath) >= PATH_MAX) \
      die("path too long: %s", realpath);               \
    FILE *filevar = fopen(temppath, mode);              \
    if (!filevar)                                       \
      die("unable to write file: %s", temppath);        \
    body                                                \
      fclose(filevar);                                  \
    if (rename(temppath, realpath)) {                   \
      remove(temppath);                                 \
      die("unable to replace file: %s", realpath);      \
    }                                                   \
  } while (0)

...but it's not usually written as a macro, because of a common problem of C: there's no good way for the macro to communicate errors to its caller, or to clean up when the caller has an error. It can be written as three functions — one to generate the temporary name and open the file, and two for successful and unsuccessful close, but this is complex enough that we seldom think of it. Instead we just write the same code over and over with different error handling, and different bugs, each time.

This makes it a good candidate for standard libraries, at least in languages that don't suffer C's error-handling deficiencies. It could be conveniently provided as an open mode (or a separate operation, if your language don't have modes) that writes to a temporary and atomically replaces the file when it's closed.

Common Lisp's :if-exists :supersede option to open sounds like it does this...

The existing file is superseded; that is, a new file with the same name as the old one is created. If possible, the implementation should not destroy the old file until the new stream is closed.

...but the replace-on-close behavior is optional, and not necessarily atomic. :supersede is also the only portable way to request that the file be truncated when opened, so AFAIK no implementation actually gives it a meaning beyond that.

Why is this so hard in Common Lisp?

I initially gave the example in Common Lisp instead of C, so it could handle errors properly. That part is easy, but it's much more complicated for other reasons:

(defun make-temp-pathname (path)
  "Append .temp to the name of a file, before the extension (if any).
Unlike /temp, this keeps it on the same filesystem, so renames will be cheap."
  ;;Simply appending .temp to the namestring doesn't work, because
  ;;operations like rename-file “helpfully” misinterpret it as a file
  ;;type and use it for defaulting, so e.g. (rename-file "a.temp" "b")
  ;;renames a.temp to b.temp.
  (make-pathname :name (format nil "~A.temp" (pathname-name path))
                 :defaults path))

(defmacro with-atomic-output-file ((streamvar pathname) &body body)
  "Execute BODY with STREAMVAR bound to an output stream, like WITH-OPEN-FILE,
but update the file atomically, and only if BODY returns normally."
  (alexandria:with-gensyms (ok? tempfile realfile)
    `(let* ((,ok? nil)
            (,realfile ,pathname)
            (,tempfile (make-temp-pathname ,realfile)))
        (with-open-file (,streamvar ,tempfile :direction :output :if-exists :supersede)
          (setf ,ok? t))
        (if ,ok?
          (rename-file ,tempfile ,realfile #+clisp :if-exists #+clisp :overwrite)
          #-sbcl (delete-file ,tempfile)))))) ;SBCL deletes it automatically and complains that it doesn't exist

It also isn't portable, because Common Lisp doesn't specify that rename-file will replace an existing file. SBCL does, but Clisp doesn't (even on Unix, surprisingly — it goes out of its way to break this) unless it's reassured with :if-exists :overwrite. Also, with-open-file might automatically delete the temporary on abnormal exit, and delete-file might complain if it doesn't exist. These unreliable semantics, together with the perverse conveniences of pathnames, make it harder to write atomic replace portably in CL than in C.

So when you provide access to system primitives like rename, don't change their semantics. Users will not be surprised by the system's native behaviour, and sometimes they need it.

Why concatenative programming matters

Jon Purdy's account of why concatenative programming matters focuses on static types, which is an odd choice; it seems to me irrelevant to why these languages are interesting. (I suspect it's just mistitled; it could more accurately be called “Some interesting things about concatenative languages”.) If stack-based (“concatenative”) languages are interesting, it's not because they're especially amenable to static analysis or because their data flow incarnates a certain type system, but because of the expressive possibilities they demonstrate. In particular:

  1. Points-free style matters, because it makes code shorter. Many variables have uninformative names like x, and it loses nothing to leave them out. Even those with informative names are usually repeated more often than justified by their value as comments.
  2. ...but writing only in points-free style is a pain (even for Chuck Moore). So binding variables shouldn't be considered shameful, as it often is in Forth culture.
  3. ...but having lots of combinators available makes it much easier. Factor is less puzzle-like than Forth, partly because it has lambda (in the form of quotations) and plenty of combinators.
  4. Stackwise concatenation is not the only reasonable default composition operator. It has a wonderfully simple implementation and operational semantics, but it's hard to use in large expressions or with nonlinear dataflow. Lambda-calculus-based composition combinators like o* and h may be easier to use.
  5. Code need not have tree structure. The great success of expression languages has accustomed us to thinking that programs must be trees, but those in stack languages are (mostly) sequences. There is another way! (So what about dag and digraph structures?)
  6. Macros and dynamism work well in low-level languages. These two features are most common in high-level languages, but this is largely a historical accident. Forth happily allows redefining anything at runtime, and uses macros (in the form of compile-time words) for its control structures. Its users find both hugely convenient, and neither is a common source of problems. (Many assemblers also get a lot of power from macros, which is one of the reasons their users were loath to abandon them, but this lesson has been forgotten with their decline.) (This has nothing to do with concatenative languages — just Forth — but it's important enough to mention anyway.)

I suspect stack-based languages per se don't matter that much any more, but they illuminate dimensions of the language design space we wouldn't otherwise notice.

These are a few of my favourite macros

Much of this post seems familiar to me, as if I've seen it somewhere else, perhaps on LL1-discuss or comp.lang.*. But I can't find the post I remember, so maybe I'm imagining someone else saying what I'm thinking.

Macros are flexible, and unfamiliar to most programmers, so they inspire a lot of confusion (more, in my opinion, than they deserve, but that's a topic for another day). Sometimes people try to make sense of this confusion by classifying them into a few categories. These classifications typically include:

  1. Macros that evaluate some arguments lazily, like if and and, or repeatedly, like while.
  2. Macros that pass some arguments by reference rather than by value, like the setf family.
  3. Binding macros that simply save a lambda: with-open-file. In languages with very terse lambda (like Smalltalk) these are not very useful, but in languages that require something like (lambda (x) ...), they're useful and common.
  4. Macros that quote some arguments (i.e. treat them as data, not expressions).
  5. Defining macros like defstruct.
  6. Unhygienic binding macros: op, aif.

The reasons for the classifications vary. Sometimes the point is that all of the categories are either trivial or controversial. (The people making this argument usually say the trivial ones should be expressed functionally, and the controversial ones should not be expressed at all.) Sometimes, as in this case, the point is that some of the categories are hard to express in any other way. Sometimes the point is that some categories are common enough that they should be built in to the language (e.g. laziness) or supported in some other way (e.g. terse lambda) rather than requiring macros.

These classifications aren't wrong, but they are misleading, because the most valuable macros don't fit any of these categories. Instead they do what any good abstraction does: they hide irrelevant details. Here are some of my favourites.

Lazy cons

If you want to use lazy streams in an eager language, you can build them out of delay and eager lists. But this is easy to get wrong. Do you cons an item onto a stream with (delay (cons a b))? (cons (delay a) (delay b))? (delay (cons (delay a) b)? Something else?

This is hard enough that there's a paper about which one is best and why. Even if you know (and regardless of whether you disagree with that paper), it's easy to make mistakes when writing the delays by hand. But the exact place where laziness is introduced is an implementation detail; code producing streams doesn't usually care about it. A lazy-cons macro can hide that detail, so you can use lazy streams without worrying about how they work. That's what any good abstraction should do.

Sequencing actions

Haskell's do is not, officially, a macro, but this is only because standard Haskell doesn't have macros; in any case do is defined and implemented by macroexpansion. Its purpose is to allow stateful code to be written sequentially, in imperative style. Its expansion is a hideous chain of nested >>= and lambdas, which no one wants to write by hand (or read). Without this macro, IO actions would be much more awkward to use. Some of this awkwardness could be recovered through functions like sequence, but the use of actions to write in imperative style would be impractical. do hides the irrelevant functional plumbing and relieves the pain of something necessary but very un-Haskell-like. Really, would you want to use Haskell without it?

List comprehensions

Haskell's list comprehensions, like its do, express something that could be done with functions, but less readably. List comprehensions combine the functionality of map, mapcat, and filter in a binding construct that looks a lot like set comprehensions. They save having to mention those list functions or write any lambdas.

I sometimes wish there was a way to get a fold in there too, but it's a good macro as it is.

Haskell list comprehensions wear a pretty syntactic skin over their macro structure, but this is not essential. Clojure's for demonstrates that a bare macro works as well.

Partial application

Goo's op (and its descendants like Arc's [... _ ...] and Clojure's #(... % ...)) is an unhygienic binding macro that abbreviates partial application and other simple lambdas by making the argument list implicit. It hides the irrelevant detail of naming arguments, which makes it much terser than lambda, and makes high-order functions easier to use.

Language embedding

There is a class of macros that embed other languages, with semantics different from the host. The composition macro from my earlier posts is one such. A lazily macro that embeds a language with implicit laziness is another. The embedded languages can be very different from the host: macros for defining parsers, for example, often look nothing like the host language. Instead of function call, their important forms are concatenation, alternatives, and repetition. Macros for embedding Prolog look like the host language, but have very different semantics, which would be awkward to express otherwise.

Like do, these macros replace ugly, repetitive code (typically with a lot of explicit lambdas) with something simpler and much closer to pseudocode.

The usual tricks

Most macros do fall into the simple categories: binding, laziness and other calling conventions, quotation, defining, etc. It's easy to think, of each of these uses, that it ought to be built into the language so you don't have to “fake” it using macros.

Fake? There's nothing wrong with using a language's expressive power to supply features it doesn't have! That's what abstraction is for!

The C preprocessor is a very useful thing, but of course it has given macros a bad name. I suspect this colors the thinking even of people who do know real (i.e. tree) macros, leading them to prefer a “proper” built-in feature to its macro implementation.

From my point of view, a macro is much better than a built-in feature. A language feature complicates the language's kernel, making it harder to implement, and in particular harder to analyze. Macros cover all of them, plus others the designers haven't thought of, in a single feature — and they don't even complicate analysis, because they disappear when expanded, so the analysis phase never sees them.

(To be fair, macros do require the language's runtime to be present at compile-time, and create the possibility of phasing bugs. But either interactive compilation or self-hosting requires the former anyway, and the latter only interferes with macros, so at worst it's equivalent to not having them. Neither is remotely as bad as being unable to express things the language designer didn't think of.)

So I see macros not as a weird, overpowered feature but as an abstractive tool nearly as important as functions and classes. Every language that aims for expressive power should have them.

Taming unspecified behavior

When a language spec leaves the behavior of some operation unspecified, there are several things an implementation can do:

  • Signal an error in the usual way (whatever that is).
  • Extend the language by defining a useful meaning.
  • Crash, i.e. report an unrecoverable error.
  • Return an arbitrary value.
  • Break safety by e.g. corrupting memory.
  • Choose behavior unpredictably. Some C compilers now do this, to the horror of their users.

Traditionally, when a spec leaves some behavior unspecified, it's completely unspecified, with no constraints at all on what implementations can do. This maximizes implementor freedom, but minimizes the amount of behaviour users can rely on. This sometimes forces them into contortions to stay within the specified language, or leads them to write nonportable code without realizing it. Even worse, implementors sometimes take lack of specification as a license for arbitrarily perverse behaviour.

A spec can reduce these problems by leaving behavior only partially unspecified. Here are some options, in roughly increasing order of unspecifiedness:

Signals an error
The meaning of this operation is undefined — so undefined that implementations must detect it and report it. This provides maximum safety for users, but no freedom for implementors. (This isn't actually unspecified behaviour, but it's pragmatically similar.)
Signals an error unless extended
Implementations must detect the undefined behavior, but they have the option of giving it some useful definition instead of signaling an error. For example, in a language without complex numbers, (sqrt -2) might be specified to signal an error, but an implementation that does have complex numbers could make it return one. In Scheme, (map - (vector 1 2 3)) might be specified to signal an error (because the vector is not a list) unless map is extended to work on other sequence types. This lets implementors extend where they want to while preserving safety everywhere else, so it's a good default for languages that aim to be safe.
Unspecified value
The operation will return normally and safely, but the result is unspecified, often with constraints such as a type. For example, C's INT_MAX is an unspecified integer at least 32767. In Scheme, the result of (exact? (/ 1 2)) is unspecified but must be a boolean.
Unspecified but safe
The language's basic safety guarantees continue to apply, but behavior is otherwise unspecified. For example, the result of arithmetic overflow in many languages is unspecified — it might signal an error, it might overflow into bignums or flonums or +Inf, it might be modulo some constant, or it might return nil or nonsense — but it won't corrupt memory or crash.
Unspecified but implementationally unsurprising
The behaviour is not specified, but it should make sense in terms of some underlying model. For example, many languages do not specify what sort of pathnames their file operations accept, except that they should be those of the host system. C does not specify that the result of falling off the end of an array or dereferencing NULL is to blindly attempt to access that address, but that's what users expect.
Unspecified and unsafe
The language's usual safety guarantees no longer apply. Anything might happen, including crashes or corruption. In particular:
Unspecified but consistent
The implementation may choose whatever semantics it likes, but it must preserve those semantics when optimizing. It may not assume the operation won't happen, or choose semantics unpredictably.
Unspecified and unpredictable
Behavior is completely unspecified, and the compiler may do whatever it likes, even if it's inconsistent and doesn't make sense in terms of the underlying implementation. Avoid this! As John Regehr puts it, “A compiler that is very smart at recognizing and silently destroying [code with unspecified behavior] becomes effectively evil, from the developer’s point of view.”

These options are combinations of simpler constraints on behavior: safety; normal return vs. signaling an error; predictability; consistency with the underlying implementation. What other constraints, or combinations thereof, are useful?

Update 15 December: See also John Regehr's When is Undefined Behavior OK?

Unboxed arrays break identity

Common Lisp explicitly allows its implementations to copy numbers whenever they feel like it, so object identity is not reliable. Previously I said this was a relic of Maclisp, but I overlooked a simple, obvious stronger reason: unboxed arrays. Long ago on RRRS-authors, Pavel Curtis gave another example where numbers might be copied:

(let ((v (make-vector 1 3.0)))
      (eq? (vector-ref v 0) (vector-ref v 0)))

This returns true in any ordinary Scheme, because storing a number into a vector does not copy it. However, if v is an unboxed vector of floats, this will probably return false, because the number naturally gets boxed twice. It does in Racket:

> (require racket/flonum)
> (let ((v (make-flvector 1 3.0)))
    (eq? (flvector-ref v 0) (flvector-ref v 0)))


CL-USER> (make-array '() :element-type 'single-float :initial-element 3.0)
CL-USER> (eq (aref *) (aref *))

(That's a zero-dimensional array, with one element.)

Clojure doesn't explicitly allow copying of numbers, but does it anyway, of course:

user> (let [x 1.0 v [x]] (identical? (v 0) (v 0)))
user> (let [x 1.0 a (double-array [x])] (identical? (get a 0) (get a 0)))
user> (let [x 1.0 a (object-array [x])] (identical? (get a 0) (get a 0)))

It doesn't even require an array, since it sometimes unboxes ordinary variables without preventing multiple reboxing:

user> (let [x 1.0] (identical? x x))
user> (let [x (if true 1.0 1)] (identical? x x))

Scala hides the issue by making eq unavailable on potentially unboxed types like Float (and therefore on Any, which might be annoying):

scala> 1.0 eq 1.0
<console>:7: error: value eq is not a member of Double
       1.0 eq 1.0

Any language that boxes floats but wants efficient numerics practically has to support unboxed numeric vectors, and therefore allow implicit copying of numbers, since preventing it requires (undecidable) nonlocal analysis. So its spec must provide some permission to copy numbers — or any boxed type with an unboxed container; it's not specific to numbers. This permission need not be a blanket license to copy, though; it could be restricted to specialized arrays. Or, in order to permit unboxing variables without forcing the compiler to be paranoid about multiple reboxing, it could be permitted for a conservative approximation of "potentially unboxed numbers", e.g. those in local variables statically known to be numbers of a specific type, whose values come from unboxable operations (those that compute new numbers: sin, not car).

Does this make NaNboxing sound more attractive?

A rant on recursion exercises

Introductions to functional programming, and especially Lisp, traditionally have a lot of exercises involving recursion on lists. Students are expected to reimplement standard functions like map and filter and reverse, and sometimes dubious ones like flatten. This makes a certain amount of sense: if you're trying to teach recursion and lists, why not cover both at once?

For one thing, it makes functional programming look bad. How many students have learned from such material that Lisp is about recursion (and strangely named car and cdr operators), without any hint of why it's so highly regarded, or of what else it can do? How many have learned (or had their prejudice reinforced) that functional programming is a silly academic game with no relevance to actual programming?

It also teaches students bad habits. Experienced functional programmers follow the heuristic: if you're recursing on lists, you should probably be using some high-order function instead. Even if it's one of the awkward ones like fold or mappend/mapcat/concatMap. (Awkward in that code using them is often harder to read than that using map or filter, though not as hard as plain recursion.) But beginners are taught to do things the hard way.

Some recursion exercises also operate on multiple levels of list structure, such as flatten or a multilevel reverse. Which is odd, because arbitrarily nested lists are rare in practice (except in code, which has its own complexities). These exercises teach students to recurse on both car and cdr, when they should instead be learning that lists are recursive only in the cdr. (ML and Haskell are spared this problem, as their type systems cannot readily conceive of arbitrarily nested lists. This is an advantage of restrictive static typing I hadn't considered.)

One of the most important lessons to learn about recursion is how widely applicable it is: it's not just for lists, or for recursive data structures; it's for any problem that can be broken down into smaller problems. Teaching recursion only on lists obscures this, leading students to think it's a specialized tool for one data structure, rather than one of the most general tools for creating algorithms.

It would be more natural to teach structural recursion on trees — directory trees, for instance, or at any rate something that naturally has tree structure. (Not search trees, because students who aren't yet comfortable with recursion won't understand why anyone would want such a structure. And not expression trees, because code-as-data is a lot for beginners to swallow, and it teaches them that functional languages are only for writing compilers for functional languages.) Non-structural recursion could be taught with the same sort of not-obviously-recursive problems that are used to teach dynamic programming.

(Prompted by some Clojure exercises which exhibit some of these problems.)

Object capabilities overload references

Security by capabilities is uncontroversial, and even orthodox, judging by its ubiquity in new systems. (Operating systems take an unusually long time to adopt new ideas, due to their strong network effects, so it's easy for an idea to be orthodox without being used in any popular system.) One particularly popular variant – at least in research, if not in deployed systems – is the object-capability system, in which having a capability is identified with having a reference to an object.

It's easy to see why this is popular: it elegantly exploits the existing properties of object reference to provide powerful security guarantees without doing anything to explicitly track capabilities.

Unfortunately, it tends to be incompatible with reflection, and especially with heap-traversal operations like Squeak's nextObject, which allows iterating through all objects in memory. If you can see arbitrary objects, you can see arbitrary capabilities, and the object-capability system is useless.

Elegance often means making one component do several things. Sometimes this works well, but sometimes the component can't support all the loads placed on it. I think object capabilities are a case of this. They make the object-reference graph do the work of tracking who has what capabilities, but this works only if programs can't do much to modify the graph.

E has this problem twice: not only does it use reference for capabilities, it uses ordinary message passing as its mechanism for calling between security domains, relying on lack of reflection to prevent untrusted code from doing anything more than send messages. There's no access control on messages, so modules generally expose their untrusted interfaces through proxy objects which understand only public messages. Reflection would let programs see past the proxies to the objects behind them, allowing them to send messages to internal objects and defeating the domain boundary.

Reusing basic language features for security is seductively simple, but it's dangerous — once language semantics are security-critical, it's hard to extend them safely.

Style identifies authors

Yossi Kreinin points out a rarely-mentioned use of stylistic variation: personal style is a signature that helps identify who wrote a piece of code.

I find it easier to understand programmers' intents through their unique style. When they're all forced to write superficially similarly, I can't tell who wrote what, and what the subtext of the code is.

I'll illustrate the last point with a couple of examples. I knew O.M. before I ever saw him and before I even knew his name. To me, he was the programmer with the two spaces before the trailing const:

inline int x()  const;

I knew him through his code: mathematically elegant, obsessive about fine details of type-based binding and modeling. I could guess what he left out with an intent to maybe add it later. I understood him.

Likewise, I can always spot G.D.'s code by the right-leaning asterisk:

int *p,*q=arr+i;

G.D. certainly couldn't care less about types - similarly to most people with this asterisk alignment. I know his code: terse, efficient, to the point. I know what to expect.

I'd never thought about it, but I do this too. I learn individual styles, and use the identity of the authors to help me understand their code.

One of the main ways I use this is to adjust my credence in mistakes. If I see seemingly unnecessary infrastructure, like an interface with a single implementation, in code written by an overzealous practitioner of OO, then it probably really is unnecessary, so I can safely ignore it; if it was written by a better architect, then I wonder what purpose they had in mind and whether it's still needed. If I see a series of seemingly redundant fcloses, it helps to know that the author was someone whose sloppiness drove them to paranoia, because then I won't waste time looking for a good reason. If I see duplicated code from someone who's averse to creating abstractions, I can assume the duplication is unnecessary instead of poring over it to see what's different. But if I see complex, tangled code from a pedantic minimalist, I know I need to find out why. Knowing the author lets me prune unlikely lines of investigation to focus on the important questions.

(I was going to include concurrency issues as another example, but on reflection I think I don't make much use of authors for this. When dealing with shared state, I'm similarly paranoid regardless of whose code I'm reading, because it's easy for anyone to get wrong. Knowing the author only helps with mistakes some authors wouldn't make.)

It also helps with interpreting comments. Knowing the author's preferred terminology and abbreviations makes telegraphic comments and names less mysterious: is sz short for “size” or Hungarian notation for “string”? Knowing how they think, what they know and what they consider worth mentioning also helps, as it does in interpreting any communication.

Identifying authors is particularly easy in C++ or Perl, because they have a lot of stylistic choices with no obvious right answer, so there's lots of room for individual variation. In languages with fewer choices and strong stylistic traditions, like Java, it's harder. Lack of variation is traditionally supposed to be a good thing, on the grounds that it's noise obscuring the signal of programs. But if it serves to identify authors, maybe it's not noise after all.

Maintaining bad code, as a lesson

Alex and bcs suggest teaching students to maintain code (and, hopefully, showing them why they might want to write maintainable code) by giving them horrible code to modify:

  • a bunch of test cases
  • a pile of #$%#$ library that passes them but will fail on just about anything else
  • incomplete, inconstant and wrong documentation and specifications.
  • several applications that use the library (only some of which they are allowed to alter).
  • more bug reports than the whole class can address in the time allotted.

Grade them on how many bugs they fix. Include performance issues, feature requests, usability issues and even a few can-not-reproduce and works-as-intended issues. Just to be evil, include a bug where the code is clearly wrong but fixing it introduces a bug in one of the apps (one that can’t be altered) that is easy to spot by inspection but not covered by any tests.

Setup time: 15 man-years.

This isn't nearly as hard as it sounds, because bad code is ridiculously easy to write. You can take a piece of good code and turn it into an unmaintainable mess much faster than you could write that unmaintainable mess from scratch. It doesn't take clever obfuscation, just repeated stupidity. Hints:

  • Think of the worst code you've had to deal with, and imitate its mistakes.
  • Use unsuitable data representations. Strings are often a good bad choice. Don't forget to forget to provide escape sequences for any embedded strings.
  • Inline rampantly. This is how you get thousand-line functions nearly duplicated three times.
  • Misname things. There's nothing like a misleading name to delay understanding. A little Hungarian notation can help lengthen names without adding useful information.
  • Abuse state: Factorial fact; fact.compute(3); fact.getResult() ⇒ 6
  • Take commandments literally: if told you should “program to an interface, never an implementation”, add redundant interfaces to every class, even private ones.
  • Flout commandments: if told to “favor composition over inheritance”, replace perfectly good composition with inheritance.
  • Replace uses of the standard library and other convenient language features with unnecessary code.
  • Move variables to larger scopes, or to different classes or globals.
  • Add useless diagnostics, e.g. logging entry and exit to a function but not its arguments.
  • Add redundant, unnecessary safety checks, and omit necessary ones.
  • Introduce abstractions in the wrong places.
  • When you find a bug, add a special case to hide it. Or declare it a feature and add a test for it.
  • Expose the wrong things in interfaces.
  • Ignore your better judgement. Do what an idiot would do.
  • Handle new requirements by adding special cases. Do this last, so the special cases hinder refactoring.
  • Write tests, but don't worry too much about whether they pass. You'll only give the students the ones that pass; the ones that fail will become the bug reports.

For example, here's a simple perverse factorial:

#include <string.h>
#include <stdio.h>

char s[10];

void factorial(int n, char *num) {
  int i = 0, p = 0;
  s[0] = '1';
  for (i = 0; i < 10; ++i) {
    int x = (s[i] - 48) * n + p + 48;
    if (n == 1) {
      for (i = 0; i < 10; ++i)
        num[i] = s[9-i];
    if (s[i] == 0) {
      s[i] = '0';
      x = (s[i] - 48) * n + p + 48;
    p = 0;
    while (x > 57) {
      x -= 10;
    s[i] = x;
  goto add;

void test_one(int n, const char *expected) {
  char result[20] = "           "; //to detect lack of null-termination
  factorial(n, result);
  if (strncmp(result, expected, 10) || result[10] != ' ')
    fprintf(stderr, "factorial(%d) ⇒ %s (expected %s)\n", n, result, expected);

int main(int argc, char **argv) {
  //test_one(0, "0000000001"); //infinite loop
  test_one(3, "0000000006");
  test_one(10, "0003628800");
  test_one(3, "0000000006"); //fails due to leftovers
  test_one(14, "87178291200"); //overflows
  return 0;

That will probably take as long to understand as it did to write, and longer to make all five tests pass (never mind making it correct).

This is easy (and fun) enough that you could get students to do it — maybe have them write a problem to inflict on the next class, as the last homework of the term.

Homework for the more advanced student in computational mischief: automate this breakage.

Homework for the Ph.D. student in software engineering research: set up a puzzle site like 4clojure where users solve maintenance problems. Measure the effects of different stylistic flaws on the time taken and success rate. Find out which ones really matter.

Oversimplifying bottom

There are a few things that bother me about the usual treatment of in Haskell culture:

Multiple failures

I'm uncomfortable with the practice of treating all failures as a single value, since they're actually distinguishable. Haskell denotations (ignoring side effects and implementation-dependent behavior) look like this1:

data Denotation = Return Value | Throw Exception | Nontermination

The traditional combines the Throw and Nontermination constructors. Since catch/handle can distinguish different exceptions, these are actually many different denotations, not one.

This must be a standard objection. The obvious response is that the distinctions between bottoms are usually irrelevant, and ignoring them makes reasoning about programs easier, even if it does occasionally give incorrect results. This isn't consistent with usage in the Haskell community, though, because Haskellers use most when they want rigor. Nor is it consistent with the rest of Haskell culture, which generally frowns on unsafe shortcuts.

Haskellers appear to be aware of this, but they usually ignore it. I suspect this is due partly to a distaste for denotations outside the Return branch — exceptions aren't “proper” values and don't deserve attention.

This does cause confusion. For instance, it leads ezyang, when discussing definedness monotonicity, to identify nonmonotonic functions as uncomputable, without mentioning that this applies only to nontermination, not to other bottoms. When he later discusses distinguishing between bottoms, he calls this “a very operational notion” and ignores the possibility of catching exceptions — perhaps because Haskell discourages this by putting catch and handle in IO, even though there's nothing impure or unsafe about them.

The bottom type and the bottom value

Haskell culture also traditionally conflates the failure with the bottom type, i.e. the values that are members of every type. This is understandable, since they coincide in Haskell — but that's a quirk of Haskell, not a law of denotational semantics.

It happens to be the case in Haskell that failures are the only denotations found in every type. But this is because Haskell types describe only one of the three branches of the denotation. A Haskell type describes what values an expression might return, but says nothing about exceptions or nontermination. If Haskell types could also specify whether an expression might terminate, and what exceptions it might throw (as in Java and C++), then the bottom type would be empty. (And type inference would be undecidable, which is why Haskell doesn't do it.) Failures are in the bottom type only because Haskell's type system can't talk about them.

It's also possible for the bottom type to contain values that aren't failures. For example, if Java didn't have primitive types, its null would be a bottom value, since it's a member of every (static) type. But it's not a failure: there are still useful operations on it, e.g. ==. Nor is it a zero of most operations.

It's probably safe to identify these two bottoms when speaking only of Haskell, but it seems to me Haskellers often do so even when speaking of denotational semantics in general.

Weakness is strength?

I worry that the treatment of stems from a pattern of treating Haskell's weaknesses as strengths, or as inevitable mathematical results, rather than as accidents of one language. For example, some Haskellers consider definedness monotonicity a virtue rather than a weakness. (Inability to handle errors makes it easy to prove that your program doesn't recover from them, but hard to make the program do so!) There might be something similar in attitudes to type system extensions, but I don't understand this area well enough to tell.

What do Haskellers (particularly denotationalists) think about this?

Added 6 Sep: Most of these issues (catch is impure, bottom type = bottom value, monotonicity) involve treating quirks of Haskell as universal, but the conflation of multiple bottoms is the reverse: people ignore part of Haskell's semantics in favour of a simpler approximation.

A long footnote about denoting denotations

1 The usual definition of Haskell denotations is that of Peyton Jones et al. in A semantics for imprecise exceptions: to hide the effect of implementation-dependent evaluation order on exceptions (e.g. what exception does 1/0 + undefined throw?), failure denotations give not a single exception but a set of possible exceptions. In addition, nontermination is treated as an exception, so denotations look like this:

data Denotation = Return Value | Throw [Exception]

Since even return values can be implementation-dependent, I think it's cleaner to make the whole denotation a set of implementation-dependent results. Also, nontermination is quite different from exceptions, in both implementation and semantics: it's a zero of catch, so any attempt to detect it will also give Nontermination. (This is apparently not considered an important distinction in formal semantics, but it's very different operationally.) This gives a slightly more complicated definition:

data Result = Return Value | Throw Exception | Nontermination
type Denotation = [Result]

Actual implementations invariably pick one result, so the complexity of treating a denotation as a set of possible results is usually irrelevant and can be ignored, leaving this definition of a denotation-as-implemented:

data Denotation = Return Value | Throw Exception | Nontermination

If nontermination is treated as an exception, this becomes pleasingly conventional:

type Denotation = Either Exception Value