XYZ: Handles, Passing by Reference, and Polymorphic JSON

Justin Richer
Jun 8, 2020

This article is part of a series about XYZ and how it works, also including articles on Why?, Interaction, Compatibility, and Cryptographic Agility.

One comment I’ve gotten from several people when reading the XYZ spec text and surrounding documentation is about one of its core innovations. Namely, what’s with all the handles everywhere?

What is a Handle?

The XYZ protocol works by passing JSON objects around that represent different parts of the request and response. These parts include the keys the client is using to protect the request, the resources the client is asking for, the user information the client wants or knows about, the display information to be used in an interaction, and so on. Handles are, in short, a way to pass some of these request objects by reference instead of by value.

First, a quick aside: why the name “handle” for this bit? In all honesty, it was the best I could come up with that wasn’t already massively overloaded within the space. Artifact, token, aspect, reference, and so on are all arguably better than handle, but they could be confusing. I hope that the final version of this protocol can bikeshed a better name for this concept, but for now they’re handles.

Let’s take keys for an example. In XYZ, the client identifies itself using its key. It can always pass its public key by value:

{
  "keys": {
    "proof": "jwsd",
    "jwks": {
      "keys": [
        {
          "kty": "RSA",
          "e": "AQAB",
          "kid": "xyz-1",
          "alg": "RS256",
          "n": "kOB5rR4Jv0GMeLaY6_It_r3ORwdf8ci_JtffXyaSx8xYJCCNaOKNJn_Oz0YhdHbXTeWO5AoyspDWJbN5w_7bdWDxgpD-y6jnD1u9YhBOCWObNPFvpkTM8LC7SdXGRKx2k8Me2r_GssYlyRpqvpBlY5-ejCywKRBfctRcnhTTGNztbbDBUyDSWmFMVCHe5mXT4cL0BwrZC6S-uu-LAx06aKwQOPwYOGOslK8WPm1yGdkaA1uF_FpS6LS63WYPHi_Ap2B7_8Wbw4ttzbMS_doJvuDagW8A1Ip3fXFAHtRAcKw7rdI4_Xln66hJxFekpdfWdiPQddQ6Y1cK2U3obvUg7w"
        }
      ]
    }
  }
}

This object contains the keying material that will uniquely identify the piece of software making this request, as well as the proofing mechanism that software intends to use with that key. This is especially useful for clients that generate their keys locally: they can show up with a new key and just start working, as long as the server’s policy is OK with that kind of dynamic client. But in the OAuth 2 world, most of the clients are pre-registered, and it would seem silly and wasteful to force all developers to push all of their keying information in every request. So what the client can do instead is present a key handle in lieu of the key itself.

{
  "keys": "7C7C4AZ9KHRS6X63AJAO"
}

This handle stands in for the object shown in full above. Upon seeing this handle, the authorization server (AS) looks up the key and proofing method that the handle refers to. Instead of passing the key by value, the client has effectively passed the key by reference using the handle construct. Since we’re talking about keying material, it’s important to note that we’re not just talking about a fingerprint or thumbprint here. The value of the key handle need not have any direct relationship to the value of the key itself, so long as the AS knows what key it stands for. The key that the key handle points to can even change over time. In addition, the XYZ protocol already has some concessions for passing fingerprints and thumbprints instead of full keys, like this one for a TLS certificate:

{
  "keys": {
    "proof": "mtls",
    "cert#S256": "bwcK0esc3ACC3DB2Y5_lESsXE8o9ltc05O89jdN-dg2"
  }
}

From the XYZ protocol perspective, this is still passing the key by value, as it contains enough cryptographic information to identify the key and allow the AS to validate the incoming connection’s signatures. A handle can still be used instead of this hashed certificate value.
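To make the by-value versus by-reference distinction concrete, here is a minimal sketch of how a server might resolve the keys field. The registry, the function name, and the error handling are assumptions made for illustration, not part of the XYZ spec or any real implementation.

```python
from typing import Any, Dict, Union

# Hypothetical server-side registry: key handle -> stored key object.
# The stored object here is abbreviated and purely illustrative.
KEY_REGISTRY: Dict[str, Dict[str, Any]] = {
    "7C7C4AZ9KHRS6X63AJAO": {
        "proof": "jwsd",
        "jwks": {"keys": [{"kty": "RSA", "kid": "xyz-1", "alg": "RS256"}]},
    },
}


def resolve_keys(keys: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return the key object for a keys field passed by value or by reference."""
    if isinstance(keys, str):
        # Passed by reference: the string is an opaque handle to look up.
        try:
            return KEY_REGISTRY[keys]
        except KeyError:
            raise ValueError("unknown key handle") from None
    if isinstance(keys, dict):
        # Passed by value: a full key or a certificate hash, plus the proof method.
        if "proof" not in keys:
            raise ValueError("key object is missing its proof method")
        return keys
    raise TypeError("keys must be a handle string or a key object")
```

Either form ends up as the same kind of object on the server side, so the rest of the request processing does not need to know which one the client sent.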

But now you might be thinking, where did the client get that key handle in the first place?

Static and Dynamic Handles

As long as the AS understands what the handle stands for and the client knows how to use it, the XYZ protocol doesn’t actually care how the parties learn of this handle. However, judging from experience with OAuth 2 and in particular dynamic registration, clients will probably be getting handles in one of a couple of ways.

Static Handles are assigned to represent specific objects at some point before the transaction request. For a key handle, it’s probable that a developer entered some information, including their public key, into an online registration form. The AS generates a key handle that the developer can plug into their software to get started, or the developer could use their key directly. In both cases, the client is identified by its keys.

Dynamic Handles are assigned to represent specific objects in response to a transaction request. If the client presents an object by value in the request, the AS can, at runtime, generate a new handle and return it to the client software. For a key handle, the client can now send the handle in future requests instead of repeating its keying information. This is a huge boon for mobile applications, as they would be allowed to generate their keys locally and present them directly the first time they talk to a server, but then switch to the simpler and smaller key handle for subsequent requests to the same server. This gives us the benefits of dynamic client registration without requiring a separate registration step.

You’ll notice that in both cases, the AS always translates the handle to a specific object. The AS is fully in charge of the mapping between the handle and the object it represents, which means the AS can define and change that mapping as it needs to. For example, the AS can allow a developer to update a registered client’s public key without changing the key handle that the client sends. Or instead of one handle representing a single key, the handle could represent a family of certificates signed by a common root. The clients in question can always send the key itself or the handle they’ve been configured with.
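The AS-owned mapping can be sketched as a small in-memory store. This is only an illustration under stated assumptions: the class, its method names, and the choice of random handle values are hypothetical, and a real AS would persist the mapping and apply policy before issuing anything.

```python
import secrets
from typing import Any, Dict, Tuple, Union


class HandleStore:
    """Toy in-memory mapping owned by the AS: handle -> object."""

    def __init__(self) -> None:
        self._objects: Dict[str, Dict[str, Any]] = {}

    def register(self, obj: Dict[str, Any]) -> str:
        # The handle is random, so it has no relationship to the key's value.
        handle = secrets.token_urlsafe(15)
        self._objects[handle] = obj
        return handle

    def update(self, handle: str, obj: Dict[str, Any]) -> None:
        # The AS may repoint an existing handle, e.g. after a key rotation.
        if handle not in self._objects:
            raise KeyError(handle)
        self._objects[handle] = obj

    def accept(
        self, keys: Union[str, Dict[str, Any]]
    ) -> Tuple[Dict[str, Any], str]:
        """Return (key object, handle the client can send next time)."""
        if isinstance(keys, str):
            # Static or previously issued handle: look it up.
            return self._objects[keys], keys
        # Object passed by value: issue a dynamic handle at runtime.
        return keys, self.register(keys)
```

A client that first sends its key by value gets a handle back from accept, and can send only that handle afterwards. Calling update swaps the key behind the handle without the client's configured value changing.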

What Do Handles Stand For?

In the current XYZ protocol, different parts of the incoming request can be represented by handles. Each of these handles is a little different, as is the information that they represent, but all use the same exact mechanism of a handle value standing in for an object that would otherwise be passed by value. As we’ve covered key handles in our examples above, let’s look at the other parts.

Resource Handles are simple strings that stand in for one of the potentially complex objects that the client can use to specify what it wants to access. These strings are usually going to be defined by the APIs being protected, and they stand for common access patterns available at the API. In other words, they are equivalent to OAuth 2’s scope parameter. In XYZ, the client can also request both handle and non-handle resources simultaneously, allowing for rich authorization requests without mode switching.
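As a rough sketch of what mixing the two forms could look like on the server, the function below expands a list that contains both resource handles and rich objects into a uniform list of objects. The handle table and the fields inside the rich objects ("actions", "locations") are illustrative assumptions, not definitions taken from this article.

```python
from typing import Any, Dict, List, Union

Resource = Union[str, Dict[str, Any]]

# Hypothetical table defined by the protected API: handle -> rich description.
RESOURCE_HANDLES: Dict[str, Dict[str, Any]] = {
    "read-photos": {
        "actions": ["read"],
        "locations": ["https://api.example/photos"],
    },
}


def expand_resources(resources: List[Resource]) -> List[Dict[str, Any]]:
    """Replace every resource handle with the object it stands for."""
    expanded: List[Dict[str, Any]] = []
    for item in resources:
        if isinstance(item, str):
            # Handle: a scope-like string defined by the API.
            if item not in RESOURCE_HANDLES:
                raise ValueError(f"unknown resource handle: {item}")
            expanded.append(RESOURCE_HANDLES[item])
        elif isinstance(item, dict):
            # Rich request object passed by value: use it as-is.
            expanded.append(item)
        else:
            raise TypeError("each resource must be a string or an object")
    return expanded
```

Because both forms reduce to the same kind of object, a single request can carry a scope-like string next to a detailed description without switching modes.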

Claims Handles are similar to resource handles in that they stand in for a set of user claims that the client is requesting.

Display Handles stand in for information that can be displayed to the user during an authorization request. These are most useful for dynamic and ephemeral clients, where the client will present all of this once and not need to again. These should be tied to a particular key in order to prevent impersonation of a client. Here we start to get close to the functionality of OAuth 2’s client_id, but dynamic registration and per-instance information of OAuth 2 clients as used in the wild have shown us that it’s valuable for the key and the display information to vary independently. But for cases where they don’t need to vary, you can in fact use a client_id as both the key handle and the display handle for a hybrid OAuth 2 and XYZ system.

User Handles can be used with trusted client software to allow that client to assert, to the AS, that the same user is still present as before. This can allow the AS to decide if it wants to shortcut the interaction process: after all, if it’s the same user and the client isn’t asking for something that needs additional approval, then just let them in. This is based on the Persisted Claims Token concept from UMA2. Unlike the claims handle, this represents what the client already knows about the user, not what it’s asking for.

Transaction Handles allow a client to refer to a whole ongoing transaction in the future. For example, while the client is polling, or when it receives the front channel response, or when it needs to respond to a challenge from the AS, the transaction handle allows the client to refer to the ongoing transaction object. It can also be used to do step-up and refresh authorization after an initial transaction has been completed. Unlike the other handles, this does not represent a single part of the request object, but rather the overall state. Therefore it’s not quite a handle in the same way, but it does at least represent something that the client is referring to. I think it’s likely that this needs a better, different name, but for the moment it’s a different kind of handle.

How Can The Protocol Express All This?

XYZ is not just based on JSON: it relies on a particular technique known as polymorphic JSON, in which a single field can have a different data type in different circumstances. For example, the resources field can be an array when requesting a single access token, or it can be an object when requesting multiple named access tokens. The difference in type indicates a difference in processing by the server. Within the resources array, each element can be either an object, representing a rich resource request description, or a string, representing a resource handle (scope). Importantly, the array can contain both of these at the same time, and it works semantically because they ultimately both refer to the same thing. The same construct is used for all of the different handles above. As a consequence, it’s clear to a developer that the handle is a replacement for, or a reference to, the object value.
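A parser for a polymorphic field dispatches on the JSON type of the value it finds. The sketch below does this for the outer resources field; the function name, the use of None as the name of the single unnamed token, and the sample handle strings are all assumptions made for the example.

```python
import json
from typing import Any, Dict, List, Optional


def tokens_requested(resources: Any) -> Dict[Optional[str], List[Any]]:
    """Map each requested token name to its list of resource items.

    A JSON array means one access token (keyed by None here);
    a JSON object means several named access tokens.
    """
    if isinstance(resources, list):
        return {None: resources}
    if isinstance(resources, dict):
        for name, items in resources.items():
            if not isinstance(items, list):
                raise TypeError(f"resources for token {name!r} must be an array")
        return dict(resources)
    raise TypeError("resources must be an array or an object")


# Same field name, two different JSON types.
single = json.loads('{"resources": ["read-photos", {"actions": ["read"]}]}')
multiple = json.loads(
    '{"resources": {"photos": ["read-photos"], "profile": ["read-profile"]}}'
)
```

Calling tokens_requested on single["resources"] yields one unnamed entry, while multiple["resources"] yields one entry per named token, so the type of the field alone selects the processing path.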

While it might not be a mode that developers working in statically typed languages are used to, we have implemented this on a number of different platforms, including the famously strict Java.

In a more statically typed JSON protocol, these would need to be two different fields, each with its own type information. The fields would need to be defined as mutually exclusive at the protocol level, like jwks and jwks_uri are for OAuth clients. This type of external interlock leads to consistency errors and complex verification schemes, especially as values change. By building the exclusivity into the protocol syntax, we can bypass all of that error-prone interlock checking. The resulting protocol definition is cleaner, more concise, and more consistent.
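The difference between the two styles can be sketched side by side. Both functions are hypothetical illustrations: the first shows the kind of hand-written interlock a two-field design needs, and the second shows that a single polymorphic field only needs a type check.

```python
from typing import Any, Dict, Optional, Union

KeyValue = Union[str, Dict[str, Any]]


def keys_from_two_fields(metadata: Dict[str, Any]) -> Optional[KeyValue]:
    """Two-field style: mutual exclusivity must be verified by hand."""
    if "jwks" in metadata and "jwks_uri" in metadata:
        raise ValueError("jwks and jwks_uri must not both be present")
    if "jwks" in metadata:
        return metadata["jwks"]
    return metadata.get("jwks_uri")


def keys_from_one_field(request: Dict[str, Any]) -> KeyValue:
    """Polymorphic style: one field cannot hold two values at once."""
    keys = request["keys"]
    if not isinstance(keys, (str, dict)):
        raise TypeError("keys must be a handle string or a key object")
    return keys
```

In the second function there is no conflicting combination to reject, because the syntax itself rules it out.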

Justin Richer

Justin Richer is a security architect and freelance consultant living in the Boston area. To get in touch, contact his company: https://bspk.io/