Choosing an Embedded Language


I have projects written in large, powerful languages (Go, C, OCaml); call them host languages. I want to embed inside these projects a small, constrained language; call it the guest language. The guest language must execute untrusted code without harmful side effects (file access, IO, infinite loops). The guest must be able to call functions provided by the host.

I surveyed a dozen languages. The ones below are the final candidates. After completing this survey, I settled on a different architecture, one that doesnʼt require an embedded language. Iʼm writing down these results in case a similar need arises again.

Starlark

Starlark was created in 2017. It has a pleasant, Python-like syntax. Starlark was designed for running untrusted code. It has a thorough language specification. It has solid implementations in Go, Rust, and Java. Iʼd use the Go implementation.

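Between April 2021 and April 2022, the Go implementation had 19 commits from 11 authors. That gives a Herfindahl index of 0.13. (The index is the sum of each authorʼs squared share of the commits: 1 means a single author did everything, and lower means more evenly spread.) Thatʼs a pleasantly diverse group of authors.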
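For reference, here is a minimal sketch of how a Herfindahl index can be computed from per-author commit counts. The function name `herfindahl` and the sample inputs are my own illustration; the per-author breakdown behind the figures in this post isnʼt reproduced here.

```go
package main

import "fmt"

// herfindahl returns the sum of squared commit shares, one share per author.
// With n authors it ranges from 1/n (everyone contributes equally) up to 1
// (a single author made every commit).
func herfindahl(commitsPerAuthor []int) float64 {
	total := 0
	for _, c := range commitsPerAuthor {
		total += c
	}
	if total == 0 {
		return 0 // no commits: the index is undefined, so report 0
	}
	sum := 0.0
	for _, c := range commitsPerAuthor {
		share := float64(c) / float64(total)
		sum += share * share
	}
	return sum
}

func main() {
	// One author with all 42 commits: the most concentrated case.
	fmt.Printf("%.2f\n", herfindahl([]int{42})) // prints 1.00

	// Eleven authors with equal shares (a hypothetical split): the lowest
	// value the index can take for eleven authors.
	equal := make([]int, 11)
	for i := range equal {
		equal[i] = 1
	}
	fmt.Printf("%.2f\n", herfindahl(equal)) // prints 0.09
}
```

Since the floor for eleven authors is 1/11, or about 0.09, an index of 0.13 is close to as even as eleven authors can get.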

Although it isnʼt my subjective favorite, it seems to have learned valuable lessons from the embeddable languages that came before it. That made it the front runner in my objective considerations.

Tcl

Tcl was created in 1988. Itʼs the oldest language on the list, and my subjective favorite. I adore its semantic simplicity, and the way it naturally supports domain-specific languages. It has good support for running untrusted code.

It has two solid implementations, both in C: Tcl and jimtcl. Although itʼs not production ready, the Picol implementation deserves an honorable mention. Iʼd probably use jimtcl.

Between April 2021 and April 2022, jimtcl had 36 commits from 4 authors. That gives a Herfindahl index of 0.79. Thatʼs more concentration than Iʼd like. The main Tcl implementation was more active and more diverse (1027 commits, 9 authors, Herfindahl 0.61). Itʼs nice to have two compatible implementations in case one dies.
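Per-author commit counts like these can be pulled from a repository with something like `git shortlog -sn --since=2021-04-01 --until=2022-04-01 HEAD`, which prints one `<count><TAB><author>` line per author. The sketch below parses that output into a list of counts. It is only one plausible approach, and the exact date bounds, the `parseShortlog` name, and the sample text are assumptions for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseShortlog extracts per-author commit counts from the output of
// `git shortlog -sn`, where each line is "<count>\t<author name>".
func parseShortlog(out string) ([]int, error) {
	var counts []int
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // skip blank lines
		}
		// The first whitespace-separated field is the commit count.
		fields := strings.Fields(line)
		n, err := strconv.Atoi(fields[0])
		if err != nil {
			return nil, fmt.Errorf("bad shortlog line %q: %w", line, err)
		}
		counts = append(counts, n)
	}
	return counts, sc.Err()
}

func main() {
	// Made-up sample output, not data from any repository in this survey.
	sample := "    30\tAuthor A\n     4\tAuthor B\n     2\tAuthor C\n"
	counts, err := parseShortlog(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts) // prints [30 4 2]
}
```

Note that shortlog groups by author name as recorded in each commit, so one person committing under two names would be counted twice.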

Lua

Lua is the dominant language in this space. It was created in 1993. Itʼs pleasantly simple and clean. It seems easy to run untrusted Lua code safely. Unfortunately, Iʼve never been able to fall in love with Lua.

It has several solid implementations. Iʼd probably use PUC Lua since itʼs the main line of development.

Between April 2021 and April 2022, PUC Lua had 42 commits from 1 author. That doesnʼt seem right, but thatʼs what the repository says. That would give a Herfindahl index of 1: as bad as it can get.

I thought that maybe LuaJIT would be better. It had 144 commits from 1 author. Itʼs more active, but just as badly concentrated.

Based on its unhealthy levels of concentration and my inability to love it, Lua ranked the lowest in my assessment.

Wren

Wren was created in 2013. Itʼs one of the youngest languages on the list, yet the implementation is tiny, clean, and fast. Itʼs not able to run untrusted code, but Iʼm including it here because I think it has a promising future. I donʼt like the Java-esque syntax or object-oriented focus, but every benefit has its price.

Between April 2021 and April 2022, Wren had 16 commits from 7 people. That gives a Herfindahl index of 0.35. Thatʼs an impressive level of diversity for a new, small project. Only Starlark outperformed Wren in diversity (not surprising since Google pays its engineers to work on Starlark).