sq JSON: first sketch

By Lars | March 2, 2022

Would you like to use Sequoia sq from your script? We’d like your feedback.

I’m sketching what the JSON output of sq might look like. We in the Sequoia project would like to make sure the JSON serves you well and is convenient for your code to consume. This blog post outlines the principles of how JSON output is meant to work, and has a concrete example of what it’s meant to look like. Your feedback would very much be appreciated.

Don’t break consumers

The Linux kernel has the guiding principle of “don’t break userland”. If the kernel changes how it behaves in some circumstance, and software running on top of the kernel breaks, the kernel is at fault, even if the old kernel behavior was buggy.

The Sequoia command line tool sq will produce JSON output. Other software will consume it. This makes the JSON output an application programming interface (or API), and as such it needs an interface contract so that consumers will know what they can rely on not to change in ways that break them.

I’m proposing the following principles for the JSON API of sq:

  • The JSON output is always a JSON object, not a list or a scalar.
    • However, see below about line-oriented JSON.
  • There is always a field sq_schema that specifies the schema version of the JSON output, as a list of three integers that specify the components of a SemVer-compatible version number. We’ll update the components as follows:
    • patch: incremented if there are no semantic changes
    • minor: one or more fields were added
    • major: one or more fields were dropped
  • We won’t ever change the meaning of a field in a given type of JSON object, regardless of whether it’s the outermost one or nested inside a field of an outer object: we will always rename the field instead.
    • for example, the outermost JSON object might have a field packets, whose value is a list of JSON objects, each of which has a field packet_type; the outermost JSON object is one type (“sq packet dump output”), and the inner objects are another type (“a packet”)
    • although it’s not evident in the JSON output, each type of JSON object will be represented in the sq source code by a Rust data structure; knowing this may help thinking about types
  • We won’t re-use the name of a field in a given type of JSON object.
  • Consumers shall ignore any fields they don’t use.
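
To make the contract concrete, here is a minimal sketch, assuming a hypothetical consumer written in Rust with the serde_json crate, of how a consumer might check the schema version while tolerating fields it doesn’t know about. None of this is part of sq itself; it only illustrates the principles above.

use serde_json::Value;

/// Parse sq JSON output generically and check the schema version.
/// Unknown fields simply stay untouched in the Value tree, which is
/// exactly what the contract asks consumers to do.
fn check_schema(output: &str) -> Result<Value, String> {
    let doc: Value = serde_json::from_str(output).map_err(|e| e.to_string())?;

    // sq_schema is a list of three integers: [major, minor, patch].
    let schema = doc
        .get("sq_schema")
        .and_then(Value::as_array)
        .ok_or("missing sq_schema field")?;
    let major = schema.get(0).and_then(Value::as_u64).ok_or("bad sq_schema")?;

    // This consumer only knows schema major version 0.
    if major != 0 {
        return Err(format!("unsupported schema major version {}", major));
    }
    Ok(doc)
}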

This approach should allow us to evolve the schema for the JSON output. Later, when we add other formats, such as YAML, we can use the same approach.

In other words, if a consumer wants a field droids, and the sq output contains a field called droids, then the consumer can be sure those are the droids they are looking for. However, the consumer can also look at the schema version to know which fields to expect. The consumer can take whichever approach is easier for them.

The user will be able to choose which version of the schema to output. For every major version, we will keep support in sq only for the latest minor and patch version: for example, if the latest version is 1.2.3, sq will support 1.2.3, but not 1.0.0, 1.1.0, 1.2.0, 1.2.1, or 1.2.2. A consumer who understands 1.2.0 will understand 1.2.3. Likewise, a consumer who understands 1.2.0 will also understand 1.3.0, as long as they ignore any fields added after 1.2.0.

The compatibility rules mean that we can add fields without breaking consumers, so there’s no need to support all minor versions of a major version. Thus, if schema version 1.0.0 has a field name, and we add a field nickname, we bump the version to 1.1.0. If we rename nickname to petname, we bump the version to 2.0.0. If we then want the petname field to be a list of pet names, we drop the petname field, add the petnames field, and bump the version to 3.0.0.
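
As a sketch of what this evolution might look like in the sq source code, using made-up Rust types for the hypothetical field names from the example above:

// Schema 1.0.0: just a name.
struct Output100 {
    sq_schema: [u32; 3], // [1, 0, 0]
    name: String,
}

// Schema 1.1.0: the nickname field was added, so only the minor
// version changes.
struct Output110 {
    sq_schema: [u32; 3], // [1, 1, 0]
    name: String,
    nickname: String,
}

// Schema 2.0.0: nickname was renamed to petname, which means a field
// was dropped, so the major version changes.
struct Output200 {
    sq_schema: [u32; 3], // [2, 0, 0]
    name: String,
    petname: String,
}

// Schema 3.0.0: petname was dropped in favour of a list of pet names.
struct Output300 {
    sq_schema: [u32; 3], // [3, 0, 0]
    name: String,
    petnames: Vec<String>,
}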

Patch level changes would be changes such as adding constraints on fields, without otherwise changing the semantics: if version 1.0.0 has a field name that is a string, and it just so happens sq never sets it to an empty string, version 1.0.1 might add the explicit constraint that name is never empty. Patch level changes must never break compatibility for consumers of sq JSON output.
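
Putting these rules together, a consumer that was written against, say, schema version 1.2.0 could check whether it can read whatever sq actually emits with something along these lines (again, a sketch, not part of sq):

/// Can a consumer written against schema version `wanted` safely read
/// output that declares version `actual`? The major versions must
/// match, the output's minor version must be at least the one the
/// consumer expects, and the patch level never affects compatibility.
fn is_compatible(wanted: [u64; 3], actual: [u64; 3]) -> bool {
    wanted[0] == actual[0] && actual[1] >= wanted[1]
}

fn main() {
    assert!(is_compatible([1, 2, 0], [1, 2, 3])); // patch changes are safe
    assert!(is_compatible([1, 2, 0], [1, 3, 0])); // added fields are ignored
    assert!(!is_compatible([1, 2, 0], [2, 0, 0])); // a field was dropped
}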

sq will add an option --output-format=FORMAT, where FORMAT is json for now, but will allow for other values later. This will be a global option, i.e., not specific to a subcommand. If JSON output is requested, but the subcommand doesn’t support JSON output, the subcommand will just output what it normally outputs.

sq will also add the option --output-version=VERSION, where VERSION is a string of dotted integers (1, 1.2, or 1.2.3), and sq will output that schema version, if it knows it. If VERSION is 1.2.3, but sq only knows 1.2.2, that’s an error, and sq won’t output anything. If no --output-version is given, sq will output the latest version it supports.

The environment variables SQ_OUTPUT_FORMAT and SQ_OUTPUT_VERSION will be used if the corresponding options aren’t given by the user. This allows a consumer to avoid having to add the options to every invocation of sq.
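
Assuming the options and environment variables end up roughly as described above (they are still a proposal, so the exact spelling may change), invoking sq from a program could look like this sketch, which uses the serde_json crate and a made-up input file name:

use std::process::Command;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Ask for JSON output in a specific schema version. Alternatively,
    // set SQ_OUTPUT_FORMAT and SQ_OUTPUT_VERSION once for the whole
    // process instead of repeating the options on every invocation.
    let output = Command::new("sq")
        .args(["--output-format=json", "--output-version=0.0.0"])
        .args(["keyring", "list", "certs.pgp"])
        .output()?;

    if !output.status.success() {
        // For example, sq doesn't know the requested schema version.
        return Err(format!(
            "sq failed: {}",
            String::from_utf8_lossy(&output.stderr)
        )
        .into());
    }

    let doc: serde_json::Value = serde_json::from_slice(&output.stdout)?;
    println!("schema version: {}", doc["sq_schema"]);
    Ok(())
}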

sq keyring list

The first command to gain JSON support will be sq keyring list. The output will look something like this:

{
  "sq_schema": [
    0,
    0,
    0
  ],
  "keys": [
    {
      "fingerprint": "16F3A23A820810ABA1ADEBBE9B75D81B3D06E8DD",
      "primary_userid": "Lars Wirzenius (obnam backups) <liw@liw.fi>",
      "userids": [
        "Lars Wirzenius (obnam backups) <liw@liw.fi>"
      ]
    }
  ]
}

The keys field will be an empty list if the input doesn’t contain any keys or certificates. The primary_userid field is chosen by sq. The userids field always contains all User IDs, including the primary one.

Note that while sq keyring list has the option --all-userids, it has no effect on JSON output. The textual output is meant for humans, who find it easier to see only the primary User ID if that’s what they care about. The JSON output is for programs, which don’t mind ignoring fields.
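
For a consumer that prefers typed data over generic JSON values, the sample output above maps onto something like the following sketch, using the serde and serde_json crates; the type names are made up here and are not part of sq:

use serde::Deserialize;

/// Output of `sq keyring list` with JSON output, schema 0.0.0.
/// serde ignores unknown fields by default, so fields added in later
/// minor versions of the schema won't break this consumer.
#[derive(Debug, Deserialize)]
struct KeyringList {
    sq_schema: [u32; 3],
    keys: Vec<Key>,
}

#[derive(Debug, Deserialize)]
struct Key {
    fingerprint: String,
    primary_userid: String,
    userids: Vec<String>,
}

fn main() -> Result<(), serde_json::Error> {
    // An empty keyring still produces the sq_schema field and an
    // empty keys list.
    let json = r#"{"sq_schema": [0, 0, 0], "keys": []}"#;
    let list: KeyringList = serde_json::from_str(json)?;
    assert_eq!(list.sq_schema, [0, 0, 0]);
    assert!(list.keys.is_empty());
    Ok(())
}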

sq inspect, sq packet dump

Later on, sq will get JSON support for things like sq inspect and sq packet dump. However, having experimented with adding JSON support for those, I know it will require a fair bit of internal infrastructure changes to be doable cleanly. I’d rather start small, with scaffolding to support JSON at all.

Single JSON object vs line-based JSON

For cases when sq output is very large, writing only a single JSON object can be wasteful. The consumer needs to use a special streaming parser to avoid having to construct the whole object in memory. For memory-constrained consumers this can be a serious problem.

An alternative is to use a line-based JSON approach: each output line contains one JSON object. See JSON Lines and its fork ndjson for details.
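
If sq ends up emitting one JSON object per line, a consumer could process output of any size in constant memory, roughly like this sketch (what exactly each per-line object would contain is still an open question):

use std::io::{self, BufRead};

fn main() -> std::io::Result<()> {
    // Read sq output from stdin, one JSON object per line, without
    // ever holding more than one object in memory.
    let stdin = io::stdin();
    for line in stdin.lock().lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue;
        }
        let obj: serde_json::Value =
            serde_json::from_str(&line).expect("each line is a JSON object");
        // Process one object at a time; here we just print it back.
        println!("{}", obj);
    }
    Ok(())
}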

How large a JSON object should we allow? Should we always use line-based JSON, or only when the user requests it? Would line-based JSON be significantly harder to consume? Your opinion would be welcome.

See issue #734 for a discussion of this and other aspects.

Questions

If we assume you would want to write a script or program to consume the output of sq, would the approach I’ve outlined above work for you? Would you find it convenient to use? If you use a tool such as jq, would you find it convenient to consume the sample output above? Do you expect it to be easy to get things right the first time, or would it be error prone?

For which sq commands would you most like to have JSON support?

You can email me directly (liw@sequoia-pgp.org), drop by on the Sequoia IRC (#sequoia on OFTC), or leave a comment on the Sequoia issue tracker (issue #734).

Note

This work is supported by a grant from the NLnet foundation, from the NGI Assure fund, financially supported by the European Commission.