Authentication schemes for HTTP/Websockets #5

Open
tailhook opened this issue Feb 27, 2020 · 9 comments

Comments

@tailhook
Contributor

tailhook commented Feb 27, 2020

This adds to RFC 1001 at #4

Overview

Features supported by HTTP:

  1. Authorization header
  2. Cookie
  3. Basic and digest auth by browser (unusable)

Features supported by browser-based WebSockets:

  1. Cookie
  2. Authentication protocol packets
  3. Basic and digest auth by browser (unusable)

Authentication schemes:

  1. OAuth2 -- the most popular one
  2. OAuth1 -- deprecated by OAuth2
  3. SAML -- we probably want in commercial version
  4. LDAP -- doesn't map to the web by itself. Often used to validate username/password (not something we want to do) or to assign permissions by group (currently we're going to implement ACL in edgedb itself)
  5. Kerberos -- usually relies on system libraries providing authentication and not widely used outside of large enterprises and academia

Related protocols:

  1. SCIM -- identity management. Basically a way to create and manage accounts with unified (REST) API. Could potentially replace our CREATE ROLE/ALTER ROLE statements, but out of scope of this research.
  2. WS-Federation -- looks like an ecosystem of its own, with lots of standards, including authorization

Commercial providers:

  1. Auth0 -- basically provide JWT+OpenID-Connect (OIDC) identity after authentication
  2. Authentiq -- is also a similar OIDC provider
  3. Atlassian Crowd -- appears to use a cookie for the actual authorization
  4. Okta SSO -- supports OIDC, SAML, and whatever they call "Secure Web Authentication"

Related tools:
  1. JAAS, Pac4J, Apache Shiro -- Java-specific, not researched closely (but they look like just Java interfaces for all the other protocols)

Requirements

  1. Same or similar authentication for both HTTP and WebSockets
  2. Scheme should work both in browser and using custom clients
  3. Don't accept login/password or anything directly derived from it, so the client doesn't need to keep the password in memory for reconnects, and so that we avoid handling 2FA. Use an external application to verify passwords and multi-factor authentication, and only authorize the connection in edgedb.

Proposal

Generally authentication should work by providing a Bearer token which is either:

  1. An opaque token; in this case the token should be inserted into the edgedb database by the application beforehand
  2. A self-encoded access token that implements the OpenID Connect (OIDC) specification

The downside of (2) is that it's harder to revoke an already created token, while the downside of (1) is that edgedb needs to keep track of all the tokens that are currently active. The upside of (1) is that it's possible to integrate with more systems (in particular, ones that don't support OIDC, or that support OIDC in a way that is incompatible with edgedb).

The token can be transmitted in one of three ways (all can be used interchangeably):

  1. Authorization: Bearer <token> -- works for HTTP as well as non-browser websockets
  2. Cookie: <cookie_name>=<token> -- works everywhere, but can be problematic to set a cookie for a domain that is devoted solely to edgedb (we may add a mechanism for that later)
  3. As a param in ClientHandshake; this works on WebSockets only and is needed for browser-based websockets where using a Cookie is not appropriate.

    We could use AuthenticationSASL with an appropriate mechanism to provide the token, but we don't need extra security here (i.e. passing the token in the ClientHandshake is at least as good as passing it in the Authorization header, which is an accepted security practice). Keeping the number of authentication round-trips low is useful.

RFC6750 allows passing access_token as form-encoded body parameter and as URI query parameter. We don't allow that now, but we may consider adding them in future if compelling use cases arise.
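The two header-based transports above can be sketched as a single lookup. This is only an illustration: the function name `extract_token`, the lower-cased header mapping and the cookie name `edgedb_token` used in the usage note are assumptions, not part of the proposal, and the ClientHandshake path is left out.

```python
from http.cookies import SimpleCookie
from typing import Mapping, Optional


def extract_token(headers: Mapping[str, str], cookie_name: str) -> Optional[str]:
    """Return the bearer token found in the request headers, or None.

    `headers` is assumed to map lower-cased header names to values;
    `cookie_name` stands for the value configured for the port.
    """
    # 1. Authorization: Bearer <token>
    auth = headers.get("authorization", "")
    scheme, _, value = auth.partition(" ")
    if scheme.lower() == "bearer" and value.strip():
        return value.strip()

    # 2. Cookie: <cookie_name>=<token>
    raw_cookie = headers.get("cookie")
    if raw_cookie:
        jar = SimpleCookie()
        jar.load(raw_cookie)
        morsel = jar.get(cookie_name)
        if morsel is not None and morsel.value:
            return morsel.value

    # 3. No token in the headers: a WebSocket client would have to send it
    #    in ClientHandshake instead (not shown here).
    return None
```

For example, `extract_token({"cookie": "edgedb_token=xyz"}, "edgedb_token")` returns the cookie value, while a request carrying only `Authorization: Basic ...` yields `None`.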

Configuration:

  1. Configure cookie_name in the "port" configuration
  2. Any things needed to configure to make ACLs work (to be determined when ACLs implemented)

It's unclear whether we want to allow configuring JWT parameters, in particular the encryption scheme. Also, I expect secret keys to be generated and replicated within edgedb itself, but we can have a mechanism to provide users' keys.

Structure of the Self-Encoded Token

TO DO: research OpenID Connect

Future Extensions

In the future, we should consider at least the following ways of authentication:

  1. SAML
  2. TLS client certificates
  3. Kerberos

All of them might only be supported in the commercial version.

Update: added a note on RFC6750 access_token usage

@tailhook
Contributor Author

tailhook commented Mar 2, 2020

I'm going to postpone self-encoded token support. The reasons are below. But first let's take a look at how opaque tokens work.

Opaque Tokens

To authorize a token, you insert it into a database with appropriate properties. Something along the lines of:

# auth::Token and make_token_id() are placeholders, not an existing schema
WITH
  MODULE auth,
  MyToken := (INSERT Token {
    token_id := make_token_id(),
    expires := datetime_current() + to_duration(hours := 24),
    database := 'my_database',
    role := 'my_role',
    # any other needed settings
  })
SELECT MyToken { token_id }

Then, you can use the token_id as Bearer token or any equivalent method described above.
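To make the lifecycle concrete, here is a minimal in-memory sketch of what the server side of an opaque-token check might look like. The class and method names are hypothetical, the dictionary stands in for the token objects stored in the database, and the fields simply mirror the properties in the query above.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class TokenRecord:
    database: str
    role: str
    expires: datetime


class OpaqueTokenStore:
    """In-memory stand-in for the hypothetical token objects in the database."""

    def __init__(self) -> None:
        self._tokens: Dict[str, TokenRecord] = {}

    def issue(self, database: str, role: str, ttl: timedelta,
              now: Optional[datetime] = None) -> str:
        now = now or datetime.now(timezone.utc)
        # The id is random and carries no meaning by itself.
        token_id = secrets.token_urlsafe(32)
        self._tokens[token_id] = TokenRecord(database, role, now + ttl)
        return token_id

    def check(self, token_id: str, database: str,
              now: Optional[datetime] = None) -> Optional[str]:
        """Return the role if the token is known, unexpired and for this database."""
        now = now or datetime.now(timezone.utc)
        record = self._tokens.get(token_id)
        if record is None or record.database != database:
            return None
        if now >= record.expires:
            del self._tokens[token_id]  # drop expired tokens lazily
            return None
        return record.role

    def revoke(self, token_id: str) -> None:
        # Revocation is just deletion, which is the main appeal of opaque tokens.
        self._tokens.pop(token_id, None)
```

The trade-off discussed earlier is visible here: every active token occupies storage, but revoking one is a single delete.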

Self-Encoded Tokens

The main issue with self-encoded tokens is that they are currently layered like this:

  1. JWT provides a layer to encrypt, sign and verify arbitrary key-value pairs (named Claims)
  2. OpenID Connect provides a set of claims that allow discovering the user's identity and some other authentication parameters
  3. Additionally OpenID Connect has a way to discover various links to other metadata of user profile. Relying party (edgedb in our case) is then expected to fetch various chunks of additional metadata from external resources.

So generally, even at layer (3), not much of the fetched data is relevant to edgedb. And (2) only provides a user name (which is generally an external user name, not edgedb's one, when using OAuth).

Postponing Self-Encoded Tokens

So the reasons to only support opaque tokens for now are:

  1. There are too many options for how to do self-encoded tokens, and most current standards are mostly irrelevant
  2. Because of (1) we can't guarantee interoperability with existing systems at the level of using their JWT tokens intact
  3. Auth based on self-encoded tokens has to have much stronger compatibility guarantees than auth based on opaque tokens.
  4. Token checks are not in the hot path for WebSockets. They are in the hot path for the HTTP implementation, but various caching approaches can be implemented to alleviate the performance issue.

So the current proposal is to implement opaque tokens only, and postpone self-encoded tokens until both of the following are true:

  • ACLs are implemented
  • We have more experience with token-based authentication in general

@elprans
Member

elprans commented Mar 3, 2020

Great summary, thanks @tailhook!

The issue with a non-self-encoded token, as you pointed out, is that we will have to store token metadata somewhere. Storing it in a database raises the question: which database? We currently avoid having a "special" database to store global metadata and instead rely on metadata in Postgres shared catalogs (pg_database and pg_roles). This arrangement makes maintaining large quantities of user-associated metadata, such as a list of valid tokens, quite cumbersome, especially where expiring tokens are considered.

I think we should take a closer look at using JWT as the token protocol from the get-go using EdgeDB-specific claims (probably just the name of the database role for now). This would make it easier to add support for OIDC later as well.

@tailhook
Contributor Author

tailhook commented Mar 4, 2020

Okay, if we don't care about compatibility with anything, we can use JWT for encoding our own things, but...

To revoke a token we have to store some token metadata. This is generally a lot less actual storage, but structurally it's the same. I don't believe we can get to production without any way of revoking tokens.

> We currently avoid having a "special" database to store global metadata and instead rely on metadata in Postgres

Do you think this will continue to be true when we have ACLs?

@elprans
Member

elprans commented Mar 4, 2020

> To revoke a token we have to store some token metadata.

You only need to keep a set of revoked token ids. Revocation is also a relatively rare event, so the set will not be large, which makes shared catalog storage feasible.
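A sketch of that idea, assuming each token carries the registered JWT claims `jti` (token id) and `exp` (expiry). The class and function names are hypothetical, and the dictionary stands in for whatever shared-catalog storage would actually hold the set. Remembering the expiry next to each id lets the set shrink again once the revoked tokens would have expired anyway.

```python
from typing import Dict


class RevocationList:
    """Set of revoked token ids, pruned once the tokens have expired anyway."""

    def __init__(self) -> None:
        # token id -> expiry of the revoked token (unix time)
        self._revoked: Dict[str, float] = {}

    def revoke(self, token_id: str, expires_at: float) -> None:
        self._revoked[token_id] = expires_at

    def is_revoked(self, token_id: str) -> bool:
        return token_id in self._revoked

    def prune(self, now: float) -> int:
        """Forget ids of tokens that have expired; return how many were dropped."""
        expired = [tid for tid, exp in self._revoked.items() if exp <= now]
        for tid in expired:
            del self._revoked[tid]
        return len(expired)


def is_token_accepted(claims: dict, revocations: RevocationList, now: float) -> bool:
    """Accept already signature-verified claims unless expired or revoked."""
    jti = claims.get("jti")
    exp = claims.get("exp")
    # Tokens without an id or expiry cannot be revoked or pruned, so reject them.
    if jti is None or exp is None or exp <= now:
        return False
    return not revocations.is_revoked(jti)
```

Pruning is safe because an expired token is rejected by the `exp` check regardless of whether its id is still in the set.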

> Do you think this will continue to be true when we have ACLs?

Yes. The authorization scopes will be encoded as claims in JWT.

@tailhook
Contributor Author

tailhook commented Mar 4, 2020

> Yes. The authorization scopes will be encoded as claims in JWT.

I'm not asking about scopes. I'm asking about the actual access control lists, rules, whatever. I expect quite a bit of metadata about relations between users and data. I expect them to be stored somewhere.

@elprans
Member

elprans commented Mar 4, 2020

Oh. The actual access rules will be defined in the schema with DDL/SDL: https://edgedb.com/roadmap/#access_control. The scopes in the token will effectively populate globals in a session, which, in turn, will trigger relevant access rules.

@tailhook
Contributor Author

tailhook commented Mar 5, 2020

Well, so ACLs will depend on the database. We can do the same with tokens: since the current spec declares a database in the URL (wss://host.name/ws/database_name), we can look for the tokens in the database itself, rather than using a "special" database.
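A small sketch of the first step, picking the database name out of the connection URL so the token lookup can be scoped to it. The function name and the strictness of the path check are assumptions; only the `/ws/database_name` shape comes from the spec mentioned above.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit


def database_from_url(url: str) -> Optional[str]:
    """Extract database_name from a wss://host.name/ws/database_name style URL."""
    parts = urlsplit(url)
    if parts.scheme not in ("ws", "wss"):
        return None
    # Ignore empty segments produced by leading or trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 2 or segments[0] != "ws":
        return None
    return unquote(segments[1])
```

The returned name would then select the database in which the presented token is looked up.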

@tailhook
Contributor Author

tailhook commented Sep 1, 2020

@1st1
Member

1st1 commented Sep 1, 2020

> To Do: take a look at PASETO:

That's a good one, thanks for sharing

@tailhook tailhook mentioned this issue Oct 22, 2020

3 participants