OCaml: First Date

February 21, 2022 · text

Last month I swiped right on OCaml. How did the first date go? We get along well, and this could be the start of a mutually enjoyable relationship.

Turn-ons

Syntax

First impressions are superficial and good looks donʼt create a relationship, but since youʼll be staring at one another for hours on end, there should be some physical attraction.

In that regard, OCaml makes a favorable impression. I find the syntax pleasant to look at and converse with. Thereʼs a nice balance between succinct and verbose. OCaml is mostly just space-separated words: no parens for function calls, no curly braces. But sheʼs not afraid to be explicit with punctuation when needed.

I do miss having a clean marker to close blocks. OCaml has begin and end keywords for this (aliases for parentheses). Unfortunately, ocamlformat replaces them with literal parens. Standardized formatting is important, so Iʼve let that go. Incidentally, ocamlformat is fast and relatively consistent.

I like having nested comments again (I miss them in Go). I also like that OCaml lacks import statements. Instead, it uses module prefixes (like Sys.getcwd) which automatically load the relevant module. Module prefixes also make it clear which function Iʼm calling. If they get too verbose, a local open (let open Sys in) drops the prefix within a small scope.
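Hereʼs a minimal sketch of both styles. The describe function is a made-up example, not something from a library; only the Sys functions it calls are real.

```ocaml
(* Qualified names need no import statement. *)
let cwd = Sys.getcwd ()

(* A local open drops the Sys. prefix, but only inside this function body.
   [describe] is a hypothetical helper written for this illustration. *)
let describe path =
  let open Sys in
  if not (file_exists path) then "missing"
  else if is_directory path then "directory"
  else "file"

let () = Printf.printf "%s is a %s\n" cwd (describe cwd)
```

Outside describe, file_exists and is_directory still need their Sys prefix, so the shortcut never leaks into the rest of the file.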

OCamlʼs lightweight syntax for modules is handy for minting local namespaces. It helps me avoid stuttering function names (foo_a, foo_b, foo_c, …).
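A small sketch of what that looks like in practice. The Celsius and Fmt modules and the report function are invented for illustration.

```ocaml
(* Instead of celsius_freezing, celsius_of_fahrenheit, celsius_to_fahrenheit,
   group the functions in a module and let the module name carry the prefix. *)
module Celsius = struct
  let freezing = 0.0
  let of_fahrenheit f = (f -. 32.0) *. 5.0 /. 9.0
  let to_fahrenheit c = (c *. 9.0 /. 5.0) +. 32.0
end

(* Modules can also be local to a single function. *)
let report temps =
  let module Fmt = struct
    let c t = Printf.sprintf "%.1fC" t
    let f t = Printf.sprintf "%.1fF" (Celsius.to_fahrenheit t)
  end in
  List.map (fun t -> Fmt.c t ^ " = " ^ Fmt.f t) temps

let () = List.iter print_endline (report [ Celsius.freezing; 100.0 ])
```

The Fmt module exists only inside report, so its short names (c and f) canʼt collide with anything else in the file.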

Semantics

I had forgotten how much I enjoy algebraic data types, generics, partial function application, type inference, and pattern matching with exhaustiveness checks. Iʼm really glad to have them available again.
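For readers who havenʼt seen these features together, hereʼs a compact sketch. The shape and tree types are invented for illustration. Note that the code has no type annotations on any function.

```ocaml
(* An algebraic data type: a shape is either a circle or a rectangle. *)
type shape =
  | Circle of float
  | Rect of float * float

(* Pattern matching. If a case is missing, the compiler warns that the
   match is not exhaustive. *)
let area = function
  | Circle r -> Float.pi *. r *. r
  | Rect (w, h) -> w *. h

(* A generic (polymorphic) binary tree. *)
type 'a tree =
  | Leaf
  | Node of 'a tree * 'a * 'a tree

(* In-order fold; the types of f and acc are inferred. *)
let rec fold f acc = function
  | Leaf -> acc
  | Node (l, x, r) -> fold f (f (fold f acc l) x) r

(* Partial application: supply two of fold's three arguments. *)
let sum = fold ( + ) 0

let to_list t = List.rev (fold (fun acc x -> x :: acc) [] t)

let () =
  let t = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf)) in
  Printf.printf "sum = %d, area = %.1f\n" (sum t) (area (Rect (2.0, 3.0)))
```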

Iʼve been pleased with OCamlʼs type checking on printf and the way that Printf.ksprintf lets me reuse formats in my own functions.
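As a rough sketch of that ksprintf trick, here are two hypothetical helpers (logf and failf are my names, not standard library functions). Both accept a printf-style format and stay fully type checked.

```ocaml
(* [logf] formats its arguments, then hands the finished string to a
   continuation that adds a level prefix. *)
let logf level fmt =
  Printf.ksprintf (fun msg -> Printf.sprintf "[%s] %s" level msg) fmt

(* [failf] raises Failure with a formatted message. *)
let failf fmt = Printf.ksprintf failwith fmt

let () =
  print_endline (logf "warn" "%d files in %s" 3 "/tmp");
  match failf "bad port %d" 70000 with
  | () -> ()
  | exception Failure msg -> print_endline msg
```

Passing a string where %d expects an int, or leaving off an argument, is a compile-time error, just as it is with Printf.printf itself.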

OCaml strikes a nice balance between functional, imperative, and object-oriented paradigms. Deep inside the machine, itʼs all just pointers and mutable memory. Sometimes I want to cut through all the abstractions and just mutate an array. Other times I like pretending that I live in a magical world of composable functions and immutable data. I typically favor opinionated languages, but this is one area where Iʼm glad to have choices.

Iʼm puzzled that the OCaml community doesnʼt use objects (the O in OCaml) more frequently. So far, OCaml has the best object system Iʼve encountered. I may write a separate article about it, but it really is great.

Logic programming is the only paradigm I miss in OCaml. Surprisingly often, I wish I could define a function as a succinct Prolog predicate. Maybe Iʼll write a library for it someday.

Anyway, all of this makes OCaml great for refactoring. When I change a design, the type checker quickly identifies everything I need to review (tools like merlin make it even easier). If thereʼs redundancy, I can factor out a local function. If thatʼs useful, I can make it a global function. Then I can move global functions into a file-local module. Then I can move those modules into separate packages to share with other projects. If modules donʼt feel right, I can bundle functions into records or objects. Or maybe all I need is a higher-order function.

That may sound complex, but all modern languages have these abilities. Itʼs just that OCaml somehow makes them feel comfortable and integrated. I canʼt put my finger on the exact cause, but refactoring OCaml has been a pleasure.

Tools

OCaml tooling leaves a good impression. opam builds everything from source, so if opam install works, Iʼm confident that Iʼll be able to hack on the libraryʼs source code later when I need to fix bugs. Binary package managers are faster, but donʼt offer this assurance. I think this is the right trade-off for a programming language package manager.

dune is fast and reliable. It scatters config files all over, but that buys you the ability to put code anywhere in your repository and have it build on the first try. (Iʼve toyed with this as the basis for a very simple opam replacement).

I was skeptical of merlin since I usually find IDE features more distracting than helpful. To my surprise, merlinʼs type hints and error messages (in emacs at least) never intrude on my flow. They only appear when requested. For example, C-c C-t shows the inferred type at the cursor, or C-c C-x shows errors without the cost of doing a full build. imenu even shows inferred types for function definitions. merlin is a helpful companion.

Compiler

OCamlʼs native code compiler is fast enough that I use it almost exclusively. It also produces fast code. If Iʼm iterating quickly, I sometimes use the interpreter or toplevel.

I enjoy #!/usr/bin/env ocaml for small scripts. Itʼs great to have the compiler catch my silly scripting mistakes early. It took me a while to discover this, but you can load modules in an OCaml shell script with #use "topfind" followed by #require "Foo" (but merlin doesnʼt parse those directives yet).

OCaml compiler errors are clear and informative. Read them from the bottom upwards. The bottom has precise information; above it is the context. Since terminals scroll off the top, this makes a certain kind of sense, but itʼs different from reading English from top to bottom. In the end the direction doesnʼt seem to matter much.

OCamlʼs backtraces might be the best Iʼve seen. I wish they were on by default, but you can enable them by compiling with ocamlopt -g then adding b to the OCAMLRUNPARAM environment variable. If you set OCAMLRUNPARAM at all, whether b or otherwise, be sure to clear it before running opam. Iʼve had several opam install or opam update runs fail because it was set.

Anyway, OCamlʼs backtraces are informative without being cluttered. They clearly indicate where an exception originated and where it was re-raised.
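If setting an environment variable is inconvenient, the standard libraryʼs Printexc module can switch backtrace recording on from inside the program. This is a sketch of that alternative, not the setup described above; Parse_error and parse are invented for the example, and you still need to compile with -g to get useful source locations.

```ocaml
(* Turn on backtrace recording at startup, without OCAMLRUNPARAM=b. *)
let () = Printexc.record_backtrace true

(* A hypothetical exception and parser, for illustration only. *)
exception Parse_error of string

let parse s =
  match int_of_string_opt s with
  | Some n -> n
  | None -> raise (Parse_error s)

let () =
  try ignore (parse "forty-two")
  with e ->
    (* Print the exception, then the backtrace recorded when it was raised. *)
    prerr_endline (Printexc.to_string e);
    prerr_string (Printexc.get_backtrace ())
```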

Error handling

Iʼm still exploring error handling to find an idiom with trade-offs that I like. I may write up a detailed article once I settle on something.

Until then, OCaml is a good reminder of how clean and composable code can be when using exceptions for error handling. C and Go treat errors as plain values, so your happy path gets cluttered with boilerplate. Haskell uses error monads, so everyone has to agree on one monad. Exceptions avoid those two problems (while introducing others). There are tons of trade-offs in this space. Otherwise, all languages would have settled on the same idiom for handling errors.

OCaml can do errors as values, or errors in monads, or errors via exceptions. It can also signal errors with polymorphic variants (feels like lightweight, checked exceptions).
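To make the comparison concrete, hereʼs a sketch of one made-up function (parsing a port number) written two ways: with exceptions, and with a result whose error side is a polymorphic variant. The function names and error tags are mine, chosen for illustration.

```ocaml
(* Style 1: exceptions. The happy path returns a plain int. *)
let parse_port_exn s =
  match int_of_string_opt s with
  | Some n when n > 0 && n < 65536 -> n
  | Some n -> invalid_arg (Printf.sprintf "port out of range: %d" n)
  | None -> failwith ("not a number: " ^ s)

(* Style 2: a result value with polymorphic variants as errors.
   The possible errors show up in the inferred type. *)
let parse_port s =
  match int_of_string_opt s with
  | Some n when n > 0 && n < 65536 -> Ok n
  | Some n -> Error (`Out_of_range n)
  | None -> Error (`Not_a_number s)

(* The caller handles each error case by pattern matching on the result. *)
let describe s =
  match parse_port s with
  | Ok n -> Printf.sprintf "port %d" n
  | Error (`Out_of_range n) -> Printf.sprintf "%d is out of range" n
  | Error (`Not_a_number s) -> Printf.sprintf "%S is not a number" s

let () = List.iter (fun s -> print_endline (describe s)) [ "8080"; "70000"; "abc" ]
```

With the second style, the error cases are part of the function's inferred type, which is where the "lightweight, checked exceptions" feeling comes from.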

Ideally a clear error handling paradigm for OCaml would have emerged by now, so we could have one idiom throughout the ecosystem. Since that hasnʼt happened, itʼs nice to have choices so that developers can continue exploring the territory.

Community

The community is surprisingly healthy, in the sociological sense. The people are kind, my pull requests get meaningful feedback, and discussions cover technical merits without drama. I think that software communities are healthier places than the headlines make them sound, but OCaml has done really well in this area.

Part of the welcoming feel is that the core OCaml code base is easy to navigate. Iʼve had no trouble poking around the standard library or the runtime. The language specification is easy to read. The language grammar is informative.

The community is also remarkably productive. As Iʼve followed commits and pull requests, Iʼve been amazed at how much is getting done by so few people. Thereʼs some amazing talent here. Iʼm sure that Iʼll learn a lot just by watching them at work.

Turn-offs

Relationships with software and with humans differ in at least one major way: the former welcomes patches, the latter shuns them. If you donʼt believe me, try suggesting that your partner get plastic surgery or start a diet ;-)

Yes, I realize that I can contribute to open source projects to fix problems. I do contribute when the cost-benefit balance tilts the right way.

Messiness

OCamlʼs ecosystem and culture give me a sense of messiness and disorder. The issue trackers are littered with old, open issues or abandoned pull requests. The standard library has an inconsistent feel. There are a dozen new projects up in the air. Everywhere I look there are too many pieces floating around, and theyʼre all tangled together. To be clear, this is not the case with OCamlʼs core semantics which feel focused, orderly, and orthogonal.

An example may prove useful. Reading OCaml documentation often goes like this: I look up foo x y whose documentation says that itʼs the same as bar 0 x y, whose documentation says that itʼs the same as baz (x+y) 0. After following this chain of pointers, I juggle those pieces in mind while reading about baz, then unwind my mental stack back to foo, which I actually care about.

I realize that, under the hood, code reuses code, but that complexity shouldnʼt shine through in APIs, documentation, or abstractions. When I want to understand foo, there are too many pieces in play. This isnʼt just about documentation: the abstractions, opam dependencies, and APIs often have this tangled feel. Dereferencing pointers can be expensive for machines, but it is always expensive for humans.

Another example: much of the OCaml ecosystem (like software generally) is too configurable. For example, you can configure the OCaml compiler with at least 11 different options. Thatʼs not eleven total compilers, thatʼs eleven combinatorial options resulting in 2048 possible compilers (are they all viable?). Most opam packages have similarly configurable APIs making them obese and unwieldy for the common case.

Let me back up. We come to relationships with subjective preferences. Those preferences arenʼt right or wrong, they just are. My preference is for few pieces, independence, and throwing things out. I donʼt have the mental bandwidth to handle many pieces, dependencies, or nostalgia. These preferences explain why my favorite software projects are OpenBSD and suckless: they simplify brutally, DIY, and break compatibility if needed.

OCaml culture isnʼt objectively messy, it just feels messy to me. They donʼt delete old functions from the standard library (most languages donʼt), they still support antiquated releases, they use lots of large, third-party libraries. Culturally they seem inclined to create a pointer to somewhere else rather than inlining and optimizing (figuratively of course, the compiler does inlining and optimization).

Iʼd like more Unix philosophy and less kitchen sink.

Documentation

Unlike the OCaml language spec, the module and function documentation in OCamlʼs ecosystem is sparse at best. Its documentation typically describes only a functionʼs purpose (the function name mostly did that already) and exceptions that could arise. It almost never mentions edge cases, corner cases, invariants, pre-conditions, post-conditions, computational complexity, related functions, or best practices. Some opam authors do better, but theyʼre outliers.

I donʼt necessarily want more words. I want more information. Documentation is communication from a library author to her users. Reading the source isnʼt enough. The source only describes how things are currently implemented. It doesnʼt describe what is allowed to happen in the future, or what constitutes a bug.

I have two theories about why OCamlʼs documentation is this way. My first theory concerns types: languages with strong types lean too heavily on them as documentation. Donʼt get me wrong, types are helpful for that purpose, but theyʼre too much like source code: describing the implementation not the interface.

My other theory concerns tools: OCaml documentation is mostly only viewable in a web browser. Browsers are heavy, rarely available/decent on remote servers, and compose poorly. OCaml has nothing like man, perldoc, pydoc, or go doc for exploring and viewing documentation as text. The popular tools are odig (opens a browser tab), ocp-browser (a curses tool), and merlin (an editor tool, see merlin-document in emacs).

If itʼs a pain to view documentation, nobody will do it; if nobodyʼs viewing documentation, why write it?

My ideal for a documentation tool is one that displays, in a terminal,

a list of installed libraries
a list of modules within a library
a summary of a single module
a list of functions/types in a module
modules/types/functions matching a keyword search
functions matching a type signature
a detailed description of a specific type/function

Wherever possible, results should be displayed one per line so that I can use sort(1), grep(1), awk(1), et al to script shortcuts and workflows.

Conclusion

Thatʼs only two turn-offs and many turn-ons, so clearly my experience with OCaml was positive.

Overall, I was surprised at how well OCaml did in choosing trade-offs and focusing on what matters. I look forward to spending more time with her.