OCaml: I Swiped Right


Finding a programming language is like finding a partner. You survey the field of candidates, filter with some questionable heuristics (geography, age, appearance), assess each candidateʼs relative merits, then try for a date. A sequence of good dates might lead to a committed relationship.

After a decade-long relationship with Go, I wondered if maybe we should see other people.

Questionable Heuristics

The field is so large that you have to exclude most candidates after cursory examination. I aim for heuristics that correlate with what I actually want (and hope that Iʼm wise enough to know what that is). The map is not the territory, but the mapʼs approximation aids forward progress. I chose the following approximations:

I can build the compiler on OpenBSD

This heuristic hopes to approximate code quality and the language communityʼs attention to detail.

OpenBSD is my main operating system. At a practical level, I need a language that works in that environment. The skillful porters of OpenBSD can build almost anything, so I could just pkg_add a language binary and be done. But if a crippled C hacker like myself can build the language from source, the languageʼs community must have kept things exceptionally clean. Plus, if anything goes wrong, I need to be able to build and test fixes myself. The porters and language designers have more important work than holding my hand.

OpenBSD is intentionally hostile to software. It doesnʼt work around bugs, and makes an active effort to crash programs that misbehave. For me, this isnʼt about security; itʼs about correctness. I want my programs to behave. If a language implementation can survive on OpenBSD, I can adjust my priors in favor of the implementation being sturdy.

Lastly, OpenBSD is a niche. It has relatively few users and acts differently. If the language community paid sufficient attention to this niche, my priors move in favor of attention to other corner cases. The software that I respect most (SQLite, OpenBSD, Go) has a habit of running on strange hardware, weird operating systems, and unusual platforms. I donʼt think thatʼs statistical noise.

Memory safe

I suck at memory management. Memory-related flaws are frequent enough in CVEs to suggest that Iʼm not alone. Iʼve never written a program that needed such precise control of memory that I couldnʼt outsource that work to a compiler or runtime.

This heuristic doesnʼt require a runtime garbage collector, but the language ecosystem needs to prevent me from freeing freed memory, writing past the end of a buffer, or dereferencing an invalid pointer. Maybe itʼs a compile-time GC, or a static analysis tool, or the spoils of a fiddle competition with the devil. I donʼt really care.
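For a concrete taste of what that protection looks like in OCaml, here is a minimal sketch. OCaml checks every Bytes index at run time (unless you compile with -unsafe), so writing past the end of a buffer raises Invalid_argument instead of quietly corrupting memory, and the garbage collector means there is no free to call twice. The helper name try_write is hypothetical.

```ocaml
(* Hypothetical helper: attempt a write and report failure instead of crashing.
   Bytes.set is bounds-checked and raises Invalid_argument on a bad index. *)
let try_write buf i c =
  match Bytes.set buf i c with
  | () -> Ok ()
  | exception Invalid_argument msg -> Error msg

let () =
  let buf = Bytes.make 4 '.' in
  (* Index 3 is the last valid slot, so this succeeds and prints "...x". *)
  (match try_write buf 3 'x' with
   | Ok () -> print_endline (Bytes.to_string buf)
   | Error msg -> print_endline msg);
  (* Index 4 is one past the end: the runtime rejects it. *)
  match try_write buf 4 'y' with
  | Ok () -> print_endline "unexpected write past the end"
  | Error msg -> print_endline ("rejected: " ^ msg)
```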

Less than median complexity

A wise man spends less than he earns, and System 2 is a limited resource. If I spend mental energy on syntactic nuances, remembering abstractions, or understanding tall stacks, I have no juice left for solving worthwhile problems.

Itʼs probably a foolʼs errand to objectively calculate language complexity, but I want something closer to a Lua, Go, or Scheme; maybe a Standard ML or Haskell, but nothing like a Java, Ada, or C++. Those are more than my modest brain can handle. This implies no JVM and no CLR. Theyʼre impressive platforms that do too much.

Donʼt get me wrong. If I could eat steak and ice cream every day I would. Iʼm sure a Four Seasons resort in Bali makes a lovely home. I enjoy syntactic sugar as much as the real stuff, but theyʼre all outside my budget. I loved Ada once, our fling was exciting, and her friends are good people, but sheʼs too much for a simple man.

Basic static analysis

Normals are surprised when I describe how much of my career Iʼve devoted to fixing typos that made it to production, or writing repetitive test cases to avoid those typos. Like memory management, this is a job for a machine. Make sure the iʼs and tʼs have appropriate dots and crosses. If you wonʼt make it to dinner, please let me know before I drive to the restaurant and order appetizers.

And, when I say basic, thatʼs an upper bound, not a lower one. I appreciate a warning if youʼll miss a date, but you donʼt have to ask permission to use the restroom.

Fast start up time

I write a lot of command line tools. Waiting 20 ms for hello world is fine, 200 ms is not. Start up time should also scale well. If hello world is snappy but a linter or formatter is slow, it wonʼt work between us.

Itʼs Me Not You

With those heuristics in hand, I surveyed the field. The following languages looked nice from a distance, but they didnʼt make the cut. These are the first turn-offs I encountered, but many would be excluded on other grounds as well.

Final Four

That left four candidates: Go, Nim, OCaml, and Python. To evaluate them somewhat objectively, I made a wishlist of about 30 entries. I distributed 100 points among those wishes based on how important they were to me. For each wish and each language, I gave a percentage for how well the language fulfilled the wish.
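The scoring works out to a weighted sum: each languageʼs total is the sum, over all wishes, of the wishʼs weight times the fraction of it the language fulfills, so a language that satisfied every wish completely would score 100. The sketch below shows that arithmetic; the three wishes, their weights, and the percentages are hypothetical placeholders, not values from the actual spreadsheet.

```ocaml
(* Hypothetical wishes and weights for illustration only. The real
   spreadsheet has about 30 wishes whose weights sum to 100. *)
type wish = { name : string; weight : float }

let wishes = [
  { name = "sum types"; weight = 40. };
  { name = "interface types"; weight = 35. };
  { name = "fast start up"; weight = 25. };
]

(* [fulfilled] maps a wish name to how well a language meets it (0.0 to 1.0).
   The score is the weighted sum over all wishes. *)
let score fulfilled =
  List.fold_left
    (fun acc w -> acc +. w.weight *. fulfilled w.name)
    0. wishes

let () =
  (* Hypothetical percentages for one imaginary language. *)
  let example = function
    | "sum types" -> 1.0
    | "interface types" -> 0.8
    | _ -> 0.5
  in
  (* 40 * 1.0 + 35 * 0.8 + 25 * 0.5 = 80.5 *)
  Printf.printf "%.1f\n" (score example)
```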

The best languages for me, in order, ended up being: OCaml, Go, Python, then Nim. The spreadsheet is available, but here are some highlights:

Interface types

This is structural typing. I want to define a set of values based on the operations they allow. This paradigm follows my intuition from the real world: if the prong fits in the slot, my device will charge. It also lets me ignore irrelevant details when writing a function. I donʼt care what a value is named, or where it came from, or how it works inside; as long as it does what I need, the function can proceed.

Structural typing, when embraced throughout a language ecosystem, provides excellent orthogonality between components. Go does beautifully here. Its io.Reader and io.Writer interfaces are the canonical example. fmt.Fprintf doesnʼt care if Iʼm writing to a file, a terminal, a network socket, or an encoded-and-compressed buffer in memory. If the thing does Write correctly, fmt.Fprintf can use it.
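For comparison, here is a rough OCaml analogue of the io.Writer idea, sketched with a module type and first-class modules. The names WRITER, Stdout_writer, Buffer_writer, and fprintf are hypothetical, not standard library APIs. The point is that neither writer module mentions WRITER; each one fits the slot simply because it provides a type t and a write function of the right shape.

```ocaml
(* Hypothetical io.Writer-style interface as an OCaml module type. *)
module type WRITER = sig
  type t
  val write : t -> string -> unit
end

(* Neither implementation declares that it is a WRITER: modules match
   signatures structurally, by the types and values they provide. *)
module Stdout_writer = struct
  type t = out_channel
  let write oc s = output_string oc s
end

module Buffer_writer = struct
  type t = Buffer.t
  let write = Buffer.add_string
end

(* Formats its arguments and hands the resulting string to any WRITER. *)
let fprintf (type a) (module W : WRITER with type t = a) (dst : a) fmt =
  Printf.ksprintf (W.write dst) fmt

let () =
  let buf = Buffer.create 16 in
  fprintf (module Buffer_writer) buf "%d + %d = %d\n" 1 2 3;
  (* Copies the buffered text to the terminal: prints "1 + 2 = 3". *)
  fprintf (module Stdout_writer) stdout "%s" (Buffer.contents buf)
```

Passing the module explicitly is a bit noisier than Goʼs implicit interface satisfaction, but the matching itself is just as structural.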

Modern languages usually have good support for interfaces, and all candidates scored well. Python scored the best because of mypyʼs Protocol feature. This makes sense: Pythonistas have been duck-typing since I was writing BASIC.

OCaml has two flavors of structural typing: objects (the interface is defined by method signatures), and modules (the interface is defined by module signatures). Structural typing for objects is especially cool since it infers an interface based on usage. That is, you donʼt have to define an interface explicitly or choose a name for it. Of course, you probably want to be explicit eventually, but itʼs a handy feature when prototyping.
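Here is a small sketch of that inference for objects. The function describe never names an interface; the compiler infers from the method calls that its argument must be an object with at least area and name methods, written < area : float; name : string; .. > where the .. means "and possibly more". The shapes are made-up examples.

```ocaml
(* Inferred type:
   < area : float; name : string; .. > -> string
   No interface was declared or named. *)
let describe shape =
  Printf.sprintf "%s with area %.2f" shape#name shape#area

(* Two unrelated immediate objects. The square has an extra method,
   which the open row type (..) happily ignores. *)
let square side = object
  method name = "square"
  method area = side *. side
  method perimeter = 4. *. side
end

let circle r = object
  method name = "circle"
  method area = Float.pi *. r *. r
end

let () =
  print_endline (describe (square 2.));  (* square with area 4.00 *)
  print_endline (describe (circle 1.))   (* circle with area 3.14 *)
```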

I donʼt know why the OCaml ecosystem uses structural typing of objects so rarely. Module types seem more common, but even those donʼt seem widely used. It seems like the OCaml community is leaving money on the table here, so Iʼm probably missing something.

As best I can tell, structural typing in Nim is called concepts. The feature is experimental and seems complicated but usable.

Sum types

I dreamed of a language with disjoint unions (sum types, variant records, tagged unions, discriminated unions, whatever theyʼre called today). For full points the language needs lightweight syntax and the static analysis tool needs exhaustive pattern matching.

Like interfaces, sum types feel natural. My mind is inclined to split a group into subgroups and then think about each subgroup independently. Lightweight syntax is important because I wonʼt wade through heavy syntax. This is my own flawed psychology, eating candy when I should eat vegetables, but I hope for a partner who can balance my worst inclinations.

As Iʼve programmed in Go for the last ten years, Iʼve wanted this feature more than any other; yes, even more than generics. On three occasions I created a small language that compiles sum types into Go interfaces. I gave up each time because the Go compiler doesnʼt have exhaustiveness checks, but sum types remained on my wishlist.

OCaml rocks this one. It has lovely sum types on a lithe syntactic frame. match … with does exhaustive pattern matching, infers types based on the patterns, and is the stuff of my dreams.
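A minimal sketch of what that looks like; the shape type is a made-up example. Declaring the sum type takes a few lines, and the match in area must cover every case: delete the Triangle branch and the compiler reports warning 8, a non-exhaustive pattern match, naming the missing case.

```ocaml
(* A sum type: every shape is exactly one of these three cases. *)
type shape =
  | Circle of float             (* radius *)
  | Rect of float * float       (* width, height *)
  | Triangle of float * float   (* base, height *)

(* The match must handle every constructor; the types of r, w, h, and b
   are inferred from the patterns. *)
let area = function
  | Circle r -> Float.pi *. r *. r
  | Rect (w, h) -> w *. h
  | Triangle (b, h) -> 0.5 *. b *. h

let () =
  List.iter
    (fun s -> Printf.printf "%.1f\n" (area s))
    [ Circle 1.; Rect (2., 3.); Triangle (4., 5.) ]
```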

In Python, mypy classes with a Literal tag attribute can fake sum types with sufficient effort. The syntax weighs less than it does in Go, but not by much. Nim is similar to Go: you can fake sum types with sufficiently complex class hierarchies.

Good hygiene

I mean this in the sense of cleanliness or practices that preserve good health, not hygienic macros. I want a language whose community emphasizes clean, high-quality code. It values correctness and security fixes over new features. The community would rather write their own small-but-correct library than use a third-party library with bugs and poor maintenance. This includes an ethos of avoiding dependencies where reasonable (unlike JS or Perl, where every problem is solved by installing a shady third-party package). Most software sucks, but I prefer software that sucks less.

For many years, I under-valued the importance of a programming language community. Like a natural language, much of the value in a programming language is the community it grants access to.

Go is the best example I know. They strive for clean language semantics, few pieces, rapid bug fixes, and good security. Theyʼre glad to wait a dozen years before adding generics if it means they can do it right. The core Go team takes bug reports seriously, adds regression tests religiously, and follows consistent policies toward releases and backwards compatibility. Iʼve learned a lot from the Go team in this area.

The core Go team is also not afraid of writing code when needed: Go has its own code generator, its own TLS stack, its own regular expression engine. This isnʼt about Not Invented Here. These libraries are often better than the ones used elsewhere and they always fit better in the Go ecosystem. Incidentally, I also think the world would benefit from greater software diversity, but thatʼs for another article.

OCaml seems solid too. The community is pretty focused on correctness and quality; thatʼs what drew many of them to OCaml in the first place. Public discussions emphasize correctness, and language design decisions often cover the theoretical foundations of language semantics. Xavier Leroy even found an Intel CPU bug through persistently debugging the OCaml compiler. The community leans toward dependency bloat a little, but itʼs not awful.

I had a hard time evaluating Python on this metric. The community is so large that I found it difficult to pin down. The main implementation seems solid, but the community has some dependency bloat and the language is adding new features rapidly. I donʼt know what to make of it.

Nim falls on the exploration side of the exploration-exploitation spectrum. The language designers add new features rapidly, but those features donʼt often work consistently. A quick perusal turns up regular users who encounter frequent problems.

Donʼt get me wrong, exploration is great. The programming world needs more explorers and I think languages have a lot of room for improvement. Iʼm inclined to exploration myself. However, I want my daily driver to be on the exploitation side: do what works and do it really well.

Large community

I only mention this because I ignored it for most of my career: I want a language with a large and active community. Some developers are more productive and creative than others. As far as I can tell, those developers are evenly distributed among all the language communities. So Iʼm far more likely to find one in the Python community than in the OCaml community, just based on size.
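To make the size argument concrete, here is a deliberately simple model, assuming each developer is independently a standout with the same small probability p no matter which community they belong to. The chance that a community of n developers contains at least one standout is then 1 - (1 - p)^n, which climbs quickly with n. The value of p and the community sizes below are hypothetical, chosen only to show the shape of the curve.

```ocaml
(* Probability that at least one of n independent developers is a standout,
   assuming each one is a standout with probability p. *)
let p_at_least_one ~p ~n = 1. -. ((1. -. p) ** float_of_int n)

let () =
  (* Hypothetical p and community sizes, not measured data. *)
  let p = 0.001 in
  List.iter
    (fun n -> Printf.printf "n = %5d: %.3f\n" n (p_at_least_one ~p ~n))
    [ 100; 1_000; 10_000 ]
```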

I wish this wasnʼt so and the best language could win on its own merits. But itʼs the same reason China and the US win so many Olympic medals compared to Honduras and Angola. In the programming world, the classic way around this is to call C. C has a huge community and a small language can fall back to C when its own community is too small. JVM languages use the same trick.

Because Iʼm interested in active developers, not users, I based this metric on the number of unique committers to the languageʼs main repository. That puts Go and Python in a tie, followed by Nim, then OCaml. I was surprised that Go was even remotely close to Python on this metric. Are Gophers really that much more likely than Pythonistas to contribute to the main repository? OCaml got dinged here because its standard library is so small: most libraries live in separate repositories so they donʼt count in this metric. Why is the truth so hard to come by?


Anyway, thatʼs enough words. You can look at the spreadsheet for details.

The end result was OCaml with 90 points, Go with 78, Python with 74, and Nim with 69. Based on the discussion above, you may not see how OCaml won. Never underestimate the long tail.

I think OCaml has remained in relative obscurity partly because itʼs functional and partly because it doesnʼt have one big, flashy marketing point. C gave you access to Unix. Perl had the best of sh and awk. Python had clean syntax. Ruby had Rails. Go did concurrency. OCaml does many things well, but itʼs not the best at any one thing. Thatʼs great for getting things done, but suboptimal for capturing market share.

I fell prey to the same thinking. I liked OCaml when we first met 15 years ago, Iʼm not scared of lambdas, and I even enjoy OCamlʼs family (hi, JoCaml), but I still didnʼt expect OCaml to look so good. I figured Go would triumph and Iʼd remain in my current, happy relationship. Instead, I swiped right on OCaml. Iʼm looking forward to our first date.