OOCCaammll:: II SSwwiippeedd RRiigghhtt

Published January 14, 2022

Finding a programming language is like finding a partner. You survey the
field of candidates, filter with some questionable heuristics (geography,
age, appearance), assess each candidateʼs relative merits, then try for
a date. A sequence of good dates might lead to a committed relationship.

After a decade-long relationship with Go, I wondered if maybe we should see
other people.

QQuueessttiioonnaabbllee HHeeuurriissttiiccss

The field is so large that you have to exclude most candidates after
cursory examination. I aim for heuristics that correlate with what I
actually want (and hope that Iʼm wise enough to know what that is). The
map is not the territory, but the mapʼs approximation aids forward
progress. I chose the following approximations:

_I _c_a_n _b_u_i_l_d _t_h_e _c_o_m_p_i_l_e_r _o_n _O_p_e_n_B_S_D

This heuristic hopes to approximate code quality and the language
communityʼs attention to detail.

OpenBSD is my main operating system. At a practical level, I need a
language that works in that environment. The skillful porters of OpenBSD
can build almost anything, so I could just pkg_add a language binary and be
done. But if a crippled C hacker like myself can build the language from
source, the languageʼs community must have kept things exceptionally clean.
Plus, if anything goes wrong, I need to be able to build and test fixes
myself. The porters and language designers have more important work than
holding my hand.

OpenBSD is _i_n_t_e_n_t_i_o_n_a_l_l_y _h_o_s_t_i_l_e[1] to software. It doesnʼt work around
bugs, and makes an active effort to crash programs that misbehave. For me,
this isnʼt about security; itʼs about correctness. I want my programs to
behave. If a language implementation can survive on OpenBSD, I can adjust
my priors in favor of the implementation being sturdy.

Lastly, OpenBSD is a niche. It has relatively few users and acts
differently. If the language community paid sufficient attention to this
niche, my priors move in favor of attention to other corner cases. The
software that I respect most (SQLite, OpenBSD, Go) has a habit of running
on strange hardware, weird operating systems, and unusual platforms. I
donʼt think thatʼs statistical noise.

_M_e_m_o_r_y _s_a_f_e

I suck at memory management. Memory-related flaws are frequent enough in
_C_V_E_s[2] to suggest that Iʼm not alone. Iʼve never written a program that
needed such precise control of memory that I couldnʼt outsource that work
to a compiler or runtime.

This heuristic doesnʼt require a runtime garbage collector, but the
language ecosystem needs to prevent me from freeing freed memory, writing
past the end of a buffer, or dereferencing an invalid pointer. Maybe itʼs
a compile-time GC, or a static analysis tool, or the spoils of a fiddle
competition with the devil. I donʼt really care.

_L_e_s_s _t_h_a_n _m_e_d_i_a_n _c_o_m_p_l_e_x_i_t_y

A wise man spends less than he earns, and System 2[3] is a limited resource.
If I spend mental energy on syntactic nuances, remembering abstractions, or
understanding tall stacks, I have no juice left for solving worthwhile
problems.

Itʼs probably a foolʼs errand to objectively calculate
_l_a_n_g_u_a_g_e _c_o_m_p_l_e_x_i_t_y[4], but I want something closer to a Lua, Go, or Scheme;
maybe a Standard ML or Haskell, but nothing like a Java, Ada, or C++. Those
are more than my modest brain can handle. This implies no JVM and no CLR.
Theyʼre impressive platforms that do too much.

Donʼt get me wrong. If I could eat steak and ice cream every day I would.
Iʼm sure a Four Seasons resort in Bali makes a lovely home. I enjoy
syntactic sugar as much as the real stuff, but theyʼre all outside my
budget. I loved Ada once, our fling was exciting, and her friends are good
people, but sheʼs too much for a simple man.

_B_a_s_i_c _s_t_a_t_i_c _a_n_a_l_y_s_i_s

Normals are surprised when I describe how much of my career Iʼve devoted
to fixing typos that made it to production, or writing repetitive test
cases to avoid those typos. Like memory management, this is a job for a
machine. Make sure the iʼs and tʼs have appropriate dots and crosses. If
you wonʼt make it to dinner, please let me know before I drive to the
restaurant and order appetizers.

And, when I say "basic", thatʼs an upper bound, not a lower one. I appreciate
a warning if youʼll miss a date, but you donʼt have to ask permission to
use the restroom.

_F_a_s_t _s_t_a_r_t _u_p _t_i_m_e

I write a lot of command line tools. Waiting 20 ms for hello world is fine,
200 ms is not. Start up time should also scale well. If hello world is
snappy but a linter or formatter is slow, it wonʼt work between us.

IIttʼss MMee NNoott YYoouu

With those heuristics in hand, I surveyed the field. The following
languages looked nice from a distance, but they didnʼt make the cut. These
are the first turn offs I encountered, but many would be excluded on other
grounds as well.

  • AAddaa, CCoommmmoonn LLiisspp and RRuusstt are too complex for me
  • CC and DD arenʼt memory safe
  • CCrryyssttaall canʼt bootstrap on my laptop (it has to cross-compile)
  • CClloojjuurree needs the JVM. CC## and FF## need the CLR
  • Native compilers for DDaarrtt, KKoottlliinn, SSccaallaa, and SSwwiifftt donʼt support
    OpenBSD
  • EErrllaanngg and EElliixxiirr need BEAM which starts too slowly
  • FFaaccttoorr, FFoorrtthh, GGrraavviittyy, LLuuaa, PPeerrll, PPHHPP, PPrroolloogg, RRuubbyy, and TTccll have
    insufficient static analysis
  • Bootstrapping GGHHCC exhausts RAM on my laptop, and bootstrapping
    FFrreeee PPaassccaall was beyond my skills
  • Building HHaaxxee (because of Neko), Node.js (for TTyyppeeSSccrriipptt), and ZZiigg
    all proved more than my skills could muster
  • Modern RRaacckkeett (using Chez Scheme) wonʼt compile under OpenBSDʼs W^X
    restrictions
  • SSttaannddaarrdd MMLL has several good implementations. I could only build
    Poly/ML but had trouble using it for non-trivial programs.

FFiinnaall FFoouurr

That left four candidates: Go, Nim, OCaml, and Python. To evaluate them
somewhat objectively, I made a wishlist of about 30 entries. I distributed
100 points among those wishes based on how important they were to me. For
each wish and each language, I gave a percentage for how well the language
fulfilled the wish.
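
The scoring scheme described above amounts to a weighted sum. Here is a minimal sketch in OCaml, assuming each wish carries a weight (its share of the 100 points) and a per-language fulfillment fraction. The record type, the language names "A" and "B", and every number are made up for illustration; they are not the values from the spreadsheet.

```ocaml
(* A hypothetical wishlist entry: a weight (points out of 100) and, for each
   language, how well it fulfills the wish as a fraction from 0.0 to 1.0. *)
type wish = {
  weight : float;
  fulfillment : (string * float) list;
}

(* Total score for one language: the sum of weight * fulfillment over all
   wishes. A language missing from a wish's list earns nothing for it. *)
let score (language : string) (wishes : wish list) : float =
  List.fold_left
    (fun total w ->
      match List.assoc_opt language w.fulfillment with
      | Some fraction -> total +. (w.weight *. fraction)
      | None -> total)
    0.0 wishes

(* Made-up data: two wishes whose weights add up to 100 points. *)
let example_wishes = [
  (* "sum types" *)
  { weight = 60.0; fulfillment = [ ("A", 1.0); ("B", 0.25) ] };
  (* "large community" *)
  { weight = 40.0; fulfillment = [ ("A", 0.25); ("B", 1.0) ] };
]

let () =
  List.iter
    (fun lang -> Printf.printf "%s: %.1f\n" lang (score lang example_wishes))
    [ "A"; "B" ]
```

With these invented numbers, language A totals 70 points and language B totals 55, even though B wins outright on one wish. Many small weights can add up the same way.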

The best languages for me, in order, ended up being: OCaml, Go, Python,
then Nim. The spreadsheet is available[5], but here are some highlights:

_I_n_t_e_r_f_a_c_e _t_y_p_e_s

This is structural typing. I want to define a set of values based on the
operations they allow. This paradigm follows my intuition from the real
world: if the prong fits in the slot, my device will charge. It also lets
me ignore irrelevant details when writing a function. I donʼt care what a
value is named, or where it came from, or how it works inside; as long as
it does what I need, the function can proceed.

Structural typing, when embraced throughout a language ecosystem, provides
excellent orthogonality between components. Go does beautifully here. Its
io.Reader and io.Writer interfaces are the canonical example. fmt.Fprintf
doesnʼt care if Iʼm writing to a file, a terminal, a network socket, or
an encoded-and-compressed buffer in memory. If the thing does Write
correctly, fmt.Fprintf can use it.

Modern languages usually have good support for interfaces, and all
candidates scored well. Python scored the best because of mypyʼs Protocol
feature. This makes sense: Pythonistas have been duck-typing since I was
writing BASIC.

OOCCaammll has two flavors of structural typing: objects (the interface is
defined by method signatures), and modules (the interface is defined by
module signatures). Structural typing for objects is especially cool since
it infers an interface based on usage. That is, you donʼt have to define
an interface explicitly or choose a name for it. Of course, you probably
want to be explicit eventually, but itʼs a handy feature when prototyping.
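
To make the object flavor concrete, here is a small sketch. The names shout, buffer_sink, and counting_sink are hypothetical, invented for this example. No interface is declared anywhere; the compiler infers from usage that shout accepts any object with a write method that takes a string.

```ocaml
(* No interface is declared. OCaml infers the type
     < write : string -> 'a; .. > -> string -> 'a
   meaning: any object with a [write] method taking a string, plus
   possibly other methods that [shout] does not care about. *)
let shout sink message =
  sink#write (String.uppercase_ascii message)

(* Two unrelated objects. Neither names a shared interface or class. *)
let buffer_sink buf =
  object
    method write s = Buffer.add_string buf s
  end

let counting_sink () =
  object
    val mutable bytes = 0
    method write s = bytes <- bytes + String.length s
    method total = bytes   (* extra method, irrelevant to [shout] *)
  end

let () =
  let buf = Buffer.create 16 in
  shout (buffer_sink buf) "hello";
  let counter = counting_sink () in
  shout counter "hello";
  (* prints "HELLO 5" *)
  Printf.printf "%s %d\n" (Buffer.contents buf) counter#total
```

Once the prototype settles, the inferred type can be written down as a named object type or class type.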

I donʼt know why the OCaml ecosystem uses structural typing of objects so
rarely. Module types seem more common, but even those donʼt seem widely
used. It seems like the OCaml community is leaving money on the table here,
so Iʼm probably missing something.
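
For comparison, the module flavor might look like the sketch below. WRITER, Greeter, and Buffer_writer are hypothetical names, not taken from any library. The implementation never mentions the signature; the fit is checked structurally when the functor is applied.

```ocaml
(* A module signature: any module providing these items satisfies it,
   whatever else it contains and whatever it is called. *)
module type WRITER = sig
  type t
  val write : t -> string -> unit
end

(* A functor that depends only on the signature, not on a concrete module. *)
module Greeter (W : WRITER) = struct
  let greet sink name = W.write sink ("hello, " ^ name)
end

(* Hypothetical implementation backed by an in-memory buffer. It never
   refers to WRITER. *)
module Buffer_writer = struct
  type t = Buffer.t
  let write = Buffer.add_string
  let create () = Buffer.create 16   (* extra item, not required by WRITER *)
end

module Buffer_greeter = Greeter (Buffer_writer)

let () =
  let buf = Buffer_writer.create () in
  Buffer_greeter.greet buf "world";
  (* prints "hello, world" *)
  print_endline (Buffer.contents buf)
```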

As best I can tell, structural typing in Nim is called concepts. The
feature is experimental and seems complicated but usable.

_S_u_m _t_y_p_e_s

I dreamed of a language with disjoint unions (sum types, variant records,
tagged unions, discriminated unions, whatever theyʼre called today). For
full points the language needs lightweight syntax and the static analysis
tool needs exhaustive pattern matching.

Like interfaces, sum types feel natural. My mind is inclined to split a
group into subgroups and then think about each subgroup independently.
Lightweight syntax is important because I wonʼt wade through heavy syntax.
This is my own flawed psychology, eating candy when I should eat vegetables,
but I hope for a partner who can balance my worst inclinations.

As Iʼve programmed in Go for the last ten years, Iʼve wanted this feature
more than any other; yes, even more than generics. On three occasions I
created a small language that compiles sum types into Go interfaces. I gave
up each time because the Go compiler doesnʼt have exhaustiveness checks,
but sum types remained on my wishlist.

OOCCaammll rocks this one. It has lovely sum types on a lithe syntactic frame.
mmaattcchh … wwiitthh does exhaustive pattern matching, infers types based on the
patterns, and is the stuff of my dreams.
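
As a small illustration, here is a sketch with a made-up shape type. The type and function names are hypothetical. The declaration is a few lines long, and the compiler checks each match against the full list of constructors.

```ocaml
(* A hypothetical sum type: a shape is exactly one of these three cases. *)
type shape =
  | Circle of float              (* radius *)
  | Rectangle of float * float   (* width, height *)
  | Square of float              (* side *)

(* Every constructor is handled here. Deleting a case makes the compiler
   report that the pattern matching is not exhaustive (warning 8) and
   show an example of an unmatched value. *)
let area shape =
  match shape with
  | Circle r -> Float.pi *. r *. r
  | Rectangle (w, h) -> w *. h
  | Square s -> s *. s

(* Cases can share a branch with an or-pattern; the check still applies. *)
let describe shape =
  match shape with
  | Circle _ -> "circle"
  | Rectangle _ | Square _ -> "quadrilateral"

let () =
  List.iter
    (fun s -> Printf.printf "%s with area %.2f\n" (describe s) (area s))
    [ Circle 1.0; Rectangle (2.0, 3.0); Square 2.0 ]
```

The argument type of area is inferred from the patterns alone. Adding a fourth constructor later flags every match that has not been updated.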

In PPyytthhoonn, mypy classes with a LLiitteerraall tag attribute can fake sum types
with sufficient effort. The syntax weighs less than it does in Go, but not
by much. NNiimm is similar to Go: you can fake sum types with sufficiently
complex class hierarchies.

_G_o_o_d _h_y_g_i_e_n_e

I mean this in the sense of "cleanliness" or "practices that preserve
good health", not hygienic macros. I want a language
whose community emphasizes clean, high-quality code. It values correctness
and security fixes over new features. The community would rather write
their own small-but-correct library than use a third-party library with
bugs and poor maintenance. This includes an ethos of avoiding dependencies
where reasonable (not like JS or Perl where everything is solved by
installing a shady, third-party package). Most software sucks, but I prefer
_s_o_f_t_w_a_r_e _t_h_a_t _s_u_c_k_s _l_e_s_s[6].

For many years, I under-valued the importance of a programming language
community. Like a natural language, much of the value in a programming
language is the community it grants access to.

GGoo is the best example I know. They strive for clean language semantics,
few pieces, rapid bug fixes, good security. Theyʼre glad to wait a dozen
years before adding generics if it means they can do it right. The core Go
team takes bug reports seriously, adds regression tests religiously, and
follows consistent policies toward releases and backwards compatibility.
Iʼve learned a lot from the Go team in this area.

The core Go team is also not afraid of writing code when needed: Go has its
own code generator, its own TLS stack, its own regular expression engine.
This isnʼt about Not Invented Here. These libraries are often better than
the ones used elsewhere and they always fit better in the Go ecosystem.
Incidentally, I also think the world would benefit from greater software
diversity, but thatʼs for another article.

OOCCaammll seems solid too. The community is pretty focused on correctness and
quality; thatʼs what drew many of them to OCaml in the first place. Public
discussions emphasize correctness, and language design decisions often
cover the theoretical foundations of language semantics. Xavier Leroy even
_f_o_u_n_d _a_n _I_n_t_e_l _C_P_U _b_u_g[7] through persistently debugging the OCaml compiler.
The community leans toward dependency bloat a little, but itʼs not awful.

I had a hard time evaluating Python on this metric. The community is so
large that I found it difficult to pin down. The main implementation seems
solid, the community has some dependency bloat and the language is adding
new features rapidly. I donʼt know what to make of it.

NNiimm falls on the exploration side of the _e_x_p_l_o_r_a_t_i_o_n_-_e_x_p_l_o_i_t_a_t_i_o_n _s_p_e_c_t_r_u_m[8].
The language designers add new features rapidly, but those features donʼt
often work consistently. A quick perusal turns up regular users who
encounter frequent problems.

Donʼt get me wrong, exploration is great. The programming world needs more
explorers and I think languages have a lot of room for improvement. Iʼm
inclined to exploration myself. However, I want my daily-driver to be on
the _e_x_p_l_o_i_t_a_t_i_o_n side: do what works and do it really well.

_L_a_r_g_e _c_o_m_m_u_n_i_t_y

I only mention this because I ignored it for most of my career: I want a
language with a large and active community. Some developers are more
productive and creative than others. As far as I can tell, those developers
are evenly distributed among all the language communities. So the
odds of finding one in the Python community are far higher than the
odds of finding one in the OCaml community, just based on size.

I wish this wasnʼt so and the best language could win on its own merits.
But itʼs the same reason China and the US win so many Olympic medals
compared to Honduras and Angola. In the programming world, the classic way
around this is to call C. C has a huge community and a small language can
fall back to C when its own community is too small. JVM languages use the
same trick.

Because Iʼm interested in active developers, not users, I based this
metric on the number of unique committers to the languageʼs main
repository. That puts Go and Python in a tie, followed by Nim, then OCaml.
I was surprised that Go was even remotely close to Python on this metric.
Are Gophers really that much more likely than Pythonistas to contribute to
the main repository? OCaml got dinged here because its standard library is
so small: most libraries live in separate repositories so they donʼt count
in this metric. Why is the truth so hard to come by?

CCoonncclluussiioonn

Anyway, thatʼs enough words. You can look at the spreadsheet[5] for
details.

The end result was OCaml with 90 points, Go with 78, Python with 74, and
Nim with 69. Based on the discussion above, you may not see how OCaml won.
Never underestimate the long tail.

I think OCaml has remained in relative obscurity partly because itʼs
functional and partly because it doesnʼt have one big, flashy marketing
point. C gave you access to Unix. Perl had the best of sh and awk. Python
had clean syntax. Ruby had Rails. Go did concurrency. OCaml does many
things well, but itʼs not the best at any one thing. Thatʼs great for
getting things done, but suboptimal for capturing market share.

I fell prey to the same thinking. I liked OCaml when we first met 15 years
ago, Iʼm not scared of lambdas, and I even enjoy OCamlʼs family (hi,
_J_o_C_a_m_l[9]), but I still didnʼt expect Ocaml to look so good. I figured Go
would triumph and Iʼd remain in my current, happy relationship. Instead, I
swiped right on OCaml. Iʼm looking forward to our first date.

1: https://www.youtube.com/watch?v=_q8zzWHj15Q
2: https://cve.mitre.org/
3: https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow#Two_systems
4: https://mndrix.blogspot.com/2017/03/programming-languages-by-spec-size.html
5: https://docs.google.com/spreadsheets/d/1nEoQeRj4OegRgqCeMmy0JQt9xrEIT2issVGQj7Z5Uho/edit?usp=sharing
6: https://suckless.org/philosophy/
7: http://gallium.inria.fr/blog/intel-skylake-bug/
8: http://strategy.sjsu.edu/www.stable/pdf/March,%20J.%20G.%20%281991%29.%20Organization%20Science%202%281%29%2071-87.pdf
9: http://jocaml.inria.fr/