Some thoughts on TOML
I recently read two texts that made me think about configuration files, and about TOML in particular.
Comparison to INI
The first text was "An INI critique of TOML", which compares TOML (quite unfavourably) to INI. As a quick reminder, INI is a family of formats built from sections of key-value pairs, like this:
[server1]
hostname = foo
cores = 16
online = true
tags = linux, europe
[server2]
hostname = bar
cores = 8
online = false
tags = bsd, asia
I think it is fair to say that TOML is a rather distant relative in the INI family. The main critique in the article seems to be that TOML has types, similar to JSON. With INI, it is typically the application and not the configuration file that says how a value should be interpreted. So a value such as false could be interpreted as a boolean, but also as a string, or as a list with a single element. Based on that difference, the author has multiple complaints about TOML:
- Users must know the correct types for values
- Users must use quotes around strings and square brackets around lists
- Date-related types are bad for some reason
- The application still has to interpret values for types that are not covered by this simple system, e.g. enums
While I understand the critique, I don't think it is all that strong. I believe it is actually a good thing that users can understand the type of values from reading the configuration file. I don't really mind the quotes, square brackets, and dates. And my experience with JSON has taught me that a few simple types can go a long way. On the other hand, my experience with INI has taught me that having to call the correct typed getter on each use of a config value can become tedious.
The proper way to handle this is to validate the configuration before using it. If any values have the wrong type (TOML) or cannot be interpreted as the correct type (INI), the application should exit with a descriptive error message.
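For the INI side, validating up front could look roughly like the following sketch. It uses Python's standard configparser module on the server example from above. The names load_servers, ConfigError, and REQUIRED are made up for illustration, and treating hostname, cores, and online as required keys is my assumption.

```python
import configparser

SAMPLE = """
[server1]
hostname = foo
cores = 16
online = true
tags = linux, europe
"""

# Assumption for this sketch: these keys must be present in every section.
REQUIRED = ("hostname", "cores", "online")


class ConfigError(Exception):
    """Raised when the configuration cannot be interpreted."""


def load_servers(text):
    """Parse and validate the whole file once, returning typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    servers = {}
    for name in parser.sections():
        section = parser[name]
        missing = [key for key in REQUIRED if key not in section]
        if missing:
            raise ConfigError(f"[{name}] missing keys: {', '.join(missing)}")
        try:
            # The typed getters raise ValueError if the text does not fit.
            cores = section.getint("cores")
            online = section.getboolean("online")
        except ValueError as exc:
            raise ConfigError(f"[{name}] {exc}") from exc
        # The application decides that tags is a comma-separated list.
        tags = [t.strip() for t in section.get("tags", "").split(",") if t.strip()]
        servers[name] = {
            "hostname": section["hostname"],
            "cores": cores,
            "online": online,
            "tags": tags,
        }
    return servers


servers = load_servers(SAMPLE)
print(servers["server1"])
```

After this step the rest of the application only sees already-typed values, so the typed getters are called in exactly one place.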
Overwrites
The second text was the UAPI Group Configuration Files Specification. This one is less about the format of configuration files and more about their location in a Linux system. Crucially, it defines the concept of drop-ins: the *.d directories that contain snippets of configuration which are combined into one effective configuration.
Combining multiple configuration files is important in two situations:
- You want to overwrite some specific values and otherwise use the defaults provided by the distro or vendor
- Other packages should be able to add their own config, e.g. crontabs or AppArmor profiles
The concept of drop-ins is well established. I am not convinced that it should be required for every single configuration file, but a lot of projects would benefit from it. So I was a bit surprised when I learned that TOML does not allow overwriting values. (I was also surprised that this limitation is not even mentioned in the INI article.) TOML is compatible with the second use case (adding new sections), but not with the first. And that is not going to change.
Why is that and can it be fixed?
Hierarchy
INI doesn't have much of a hierarchy. There are sections, keys, and values, and that's it. TOML on the other hand interprets dots in sections and keys as additional levels of hierarchy:
[servers.foo]
cores = 16
status.online = true
status.has_errors = false
tags = ["linux", "europe"]
This also means that sections are basically just common prefixes and can be avoided entirely:
servers.foo.cores = 16
servers.foo.status.online = true
servers.foo.status.has_errors = false
servers.foo.tags = ["linux", "europe"]
With this structure, overwriting would be simple: later values simply overwrite earlier ones. However, the big issue is lists. How would you add, remove, or modify individual list items? For simple lists it might be ok to just replace them completely. But TOML provides not one but two ways to nest tables inside of lists:
# inline tables
servers = [
    {hostname = "foo", cores = 16},
    {hostname = "bar", cores = 8},
]
# array of tables
[[servers]]
hostname = "foo"
cores = 16
[[servers]]
hostname = "bar"
cores = 8
Conclusion
Currently, the options are to not use TOML, to not use overwrites, or to merge the config after parsing (which could lead to multiple, incompatible implementations).
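Merging after parsing could look roughly like this sketch. It works on already-parsed dictionaries, so it does not depend on any particular TOML library. The function merge_config is made up for illustration, and replacing lists wholesale is just one possible policy. Another implementation could just as well append to them, which is exactly the compatibility risk.

```python
def merge_config(base, override):
    """Return a new dict where values from override win over base.

    Nested tables (dicts) are merged recursively. Everything else,
    including lists, is replaced completely. This is an assumed policy.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


defaults = {
    "servers": {
        "foo": {
            "cores": 16,
            "status": {"online": True},
            "tags": ["linux", "europe"],
        }
    }
}
drop_in = {"servers": {"foo": {"cores": 32, "tags": ["bsd"]}}}

effective = merge_config(defaults, drop_in)
print(effective)
```

With this policy, cores and tags come from the drop-in, while status is kept from the defaults.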
I wonder if there is room in the world for a variant of TOML that allows overwriting values and bans (or at least discourages) the use of tables inside of lists.