XYZ: Interaction

Justin Richer
Jun 18, 2020

This article is part of a series about XYZ and how it works, also including articles on Why?, Handles, Compatibility, and Cryptographic Agility.

When OAuth 1 was first invented, it primarily sought to solve the problem of one website talking to another website’s API. It could also be used with native applications, but it was awkward at best due to many of the assumptions built into the OAuth 1 model. To be fair, smartphones didn’t really exist yet, and the handful of clunky desktop apps could more or less be dealt with via a few hacks. In OAuth 2, we at least admitted that mobile applications were now A Thing and created the notion of a public client with no recognizable credentials of its own. But it turned out that wasn’t enough, and we needed additional tools and additional guidance for developers to get mobile applications right. Even later, single-page applications really came into their own, further defying previous assumptions about OAuth clients. Smart devices and the internet of things drove a world of new interaction methods and even new ideas about what logging in means. And all of this is on top of OAuth’s core set of grant types, the differences between which are largely driven by the abilities of different clients.

This kind of flexibility and adaptability is one of OAuth 2’s greatest successes, so what’s the problem? Simply put, OAuth 2 largely assumes that the user has access to a web browser and that the browser is on the same device as the client. In today’s world of smart clients, rich applications, and constrained devices, these assumptions no longer match reality and there’s a real need for a system with more flexibility. While browsers are certainly still around, we aren’t just connecting two websites to each other anymore, and our security protocols should reflect that.


Getting the User Involved

In OAuth 2, the type of interaction that the client is capable of is defined by the authorization grant that’s in use. This dictates, among other things, how the user gets to the authorization server (AS) and how they get back from the AS to the client. In the authorization code grant, which is at the core of the OAuth 2 protocol design, the user is redirected in both directions through HTTP. The client has to choose exactly one way of dealing with the user before it starts, and this leads to awkward situations like the device grant letting the user either enter a code interactively at a static URL or follow a pre-composed URL. It also means that different grant types need to re-invent parts of the process used in other grant types just so that they can do things a little differently.

But what if we could decompose all of these aspects and get away from a type-based system? In XYZ, instead of choosing a grant type, a client declares all of the different parts of interaction that it supports. The AS then responds to whichever of those portions it supports for that request. From there, the client can take action based on whichever portions make sense. After all, the client knows how it can interact with the user. With these tools, XYZ’s interaction model allows for a very rich set of interactions.

In XYZ, we wanted to enable the traditional redirect-based interaction but not depend on it. To do that, the client tells the AS that it can handle redirecting the user to an arbitrary URL, and also signals that it can get the user back on a similar redirect. With this, we’ve got everything we need for an authorization code style interaction.

{
  "interact": {
    "redirect": true,
    "callback": {
      "uri": "https://client.foo",
      "nonce": "VJLO6A4CAYLBXHTR0KRO"
    }
  }
}

OK, but what if we’re now talking about a set-top box that can’t pop open a browser? Here we take a cue from the OAuth 2 device grant and say that we can display an arbitrary code for the user to enter at a static URL. We expect the user to go there on a secondary device, and so we don’t have a good way to get the user back to the client itself.

{
  "interact": {
    "user_code": true
  }
}

This is great, but what if we could handle either a redirect or a user code? Let’s say, for instance, that we can hand the user a code they can type in, or we can get them to scan a barcode on a secondary device and be sent to an arbitrary page. To handle that, we simply say we can do both.

{
  "interact": {
    "redirect": true,
    "user_code": true
  }
}

Notice that our client expects the user to complete the request on a separate device, and so we don’t have the callback parameter included. The AS can decide whether that’s good enough for what’s being asked, or not. An important difference between these protocols is that OAuth 2 starts with a redirect, while XYZ allows the redirect only after the initial back-channel request. This means we can mix things together in novel ways. For example, we could say that we expect the user to enter a code at a fixed URL, but we do want the user to come back to a callback webpage so that our client can continue.

{
  "interact": {
    "user_code": true,
    "callback": {
      "uri": "https://client.foo",
      "nonce": "VJLO6A4CAYLBXHTR0KRO"
    }
  }
}

What kind of client would ever need to do this? Maybe we’ve got a smart client on device that has ties to a push notification service with a web-based component, or there’s some other deployment structure that we haven’t thought up yet. Componentizing how the user gets to the AS separately from how the user gets back allows us to mix, match, and stack capabilities as appropriate for our use case.
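To make the mix-and-match idea concrete, here is a minimal Python sketch of a client composing its "interact" block from independent pieces: how the user gets to the AS, and (optionally) how they get back. The helper name `build_interact` and the nonce format are assumptions made for illustration; only the `redirect`, `user_code`, and `callback` keys come from the examples above.

```python
import secrets
from typing import Iterable, Optional

# Ways of getting the user to the AS, as named in the JSON examples above.
ARRIVAL_MODES = {"redirect", "user_code"}


def build_interact(arrival: Iterable[str], callback_uri: Optional[str] = None) -> dict:
    """Compose an "interact" block from independent capabilities.

    Hypothetical helper for illustration; not part of any XYZ library.
    """
    arrival = set(arrival)
    unknown = arrival - ARRIVAL_MODES
    if unknown:
        raise ValueError(f"unsupported arrival modes: {sorted(unknown)}")

    # How the user gets to the AS: one flag per declared capability.
    interact: dict = {mode: True for mode in sorted(arrival)}

    # How the user gets back: only present if the client can receive a callback.
    if callback_uri is not None:
        interact["callback"] = {
            "uri": callback_uri,
            # Assumption: any unguessable random string works as the nonce.
            "nonce": secrets.token_urlsafe(16),
        }
    return {"interact": interact}


# A set-top box: user code only, no way back to the client.
set_top_box = build_interact(["user_code"])

# The mixed case: user code to get there, callback page to get back.
code_with_callback = build_interact(["user_code"], callback_uri="https://client.foo")
```

Because the two halves are built separately, adding a new way to get the user back would not require touching the code that declares how the user gets there.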

This setup means we can start to think about other ways to get the user back and securing that return, as well. We need to consider capabilities like sending a response to the client application directly through an HTTP/2 Server Push, or a client being notified through some other signaling mechanism.

Moving Off the Web

Even more importantly, we need to consider interactions with the user outside of the web entirely. While XYZ’s back channel is defined completely over HTTP calls, it doesn’t have to use the front channel (redirects) to interact with the user at all. Here is where the capability-based interaction mechanism starts to get really powerful.

What if the client indicates that it can do a WebAuthn-style signature? The server can offer its challenge to be signed in the initial response, and the client can sign it (by prompting the user to activate their key) and return the results in the follow-up request. If it’s a key the AS already knows, and there isn’t any consent to be gathered, the user doesn’t need to be redirected anywhere.

Or what if the client has access to a digital agent with ties to a non-web distributed data store, like a blockchain or other graph-based data fabric? The client can indicate that it can send and receive messages to this agent, or field queries through it. The agent and the AS might be part of the same data fabric, and therefore not need to go through the client in order to communicate. Such a system could even gather and manage consent entirely outside of a web-based protocol. All the XYZ client would need to care about is how to kick off the transaction and how to get the results.

Or what if the client wants to prompt the user directly for their credentials (cringe)? It’s generally considered bad security practice, but the UX of a login experience that stays completely native to the application is compelling, even today. By defining an interaction mode specifically for this, the AS can help manage things in a more secure fashion than the OAuth 2 Resource Owner Password Grant. It could allow some kind of clever presentation of the shared secret without exposing it directly, and it could even account for different kinds of MFA.

Or what if the AS can interact with the user through a native application? The client should be able to indicate it can support that without pretending it’s an HTTP redirect.

And if a client can do many different things, then it can set them all up in the initial request and let the AS decide what is allowed according to its own policies. We should be able to prompt for a security token and call an on-device agent and redirect out to a webpage if that’s not enough. Since the client doesn’t need to pick a single method ahead of time, it doesn’t even need to discover what is possible in order for it to work. This type of engineering at the protocol level greatly enhances interoperability while simplifying code paths for both client and server.
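On the AS side, "let the AS decide" can be pictured as a simple filter over whatever the client declared. The following Python sketch is an illustration under stated assumptions: the function name `select_interaction` and the idea of policy as a set of allowed mode names are hypothetical, and this is not the actual XYZ response format.

```python
from typing import Any, Dict, Set


def select_interaction(interact: Dict[str, Any], allowed: Set[str]) -> Dict[str, Any]:
    """Keep only the client-declared capabilities that AS policy allows.

    Hypothetical helper for illustration; the client never has to ask
    beforehand which modes the AS supports.
    """
    return {
        mode: value
        for mode, value in interact.items()
        if value and mode in allowed
    }


# The client declares everything it can do...
declared = {"redirect": True, "user_code": True}

# ...and the AS applies its own (hypothetical) policy for this client.
policy = {"user_code", "callback"}
offered = select_interaction(declared, policy)
```

Here the AS would answer only for the user code, and the client simply acts on whatever comes back.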

And finally, what if we don’t need to interact most of the time? OAuth 2 forces the client to try interaction first, but if the AS knows who the client is and who the user is, then it could decide to grant access without ever doing an explicit interaction. We can even do a bootstrapping of complex protocols, such as allowing the client to present an assertion to represent the user but still allowing the AS to interact through a web page (or some other mechanism) if necessary. The user gets involved only when they need to.
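As a rough sketch of that decision, the AS could look at what it already knows before asking for any interaction at all. Everything in this Python example is an assumption made for illustration: the `RequestContext` fields, the `next_step` function, and the three outcome names are hypothetical and do not come from XYZ itself.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class RequestContext:
    """What the AS knows when the back-channel request arrives (hypothetical)."""
    client_known: bool     # the AS recognizes the client
    user_known: bool       # e.g. from an assertion the client presented
    consent_on_file: bool  # nothing new for the user to approve


def next_step(ctx: RequestContext, offered_modes: Dict[str, Any]) -> str:
    """Decide whether the user needs to get involved at all."""
    if ctx.client_known and ctx.user_known and ctx.consent_on_file:
        return "grant"     # no interaction needed
    if offered_modes:
        return "interact"  # fall back to a mode the client declared
    return "deny"          # nothing known and no way to ask the user


returning_user = next_step(RequestContext(True, True, True), {"redirect": True})
new_user = next_step(RequestContext(True, False, False), {"redirect": True})
```

The point of the sketch is the ordering: interaction is a fallback the AS reaches for when it needs to, rather than the first step of every request.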

Interaction is one of the core extensibility points of XYZ and one of the key places where it breaks from OAuth 2’s past assumptions. The possibilities are nearly endless, and I believe we’re going to see some really exciting advancements in this space.


Justin Richer is a security architect and freelance consultant living in the Boston area. To get in touch, contact his company: https://bspk.io/