
Improving performance of ServerSessionMemoryCache #1503

jsha opened this issue Sep 29, 2023 · 1 comment
Comments


jsha commented Sep 29, 2023

Following up on #1200, I wanted to open an issue to talk about possible performance improvements to ServerSessionMemoryCache, particularly ones that improve worst-case performance.

@dhobsd mentioned using the idea of having high and low water marks so we can be more efficient about cleaning up old sessions. Here's a sketch of that:

The current ServerSessionMemoryCache has a map and a queue, maintained at the same size. When the capacity is hit and the oldest item needs to be expired, that's cheap: pop the oldest item off the queue and delete the corresponding entry from the map. Removing a session from the middle of the queue is expensive, though. We look it up in the hashmap (cheap), then have to find it in the middle of the queue (O(n)) and delete it from the queue (O(n)).
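The expensive mid-queue removal can be made concrete with a small sketch. This is an illustrative model only, not rustls's actual `ServerSessionMemoryCache` (the key/value types and names here are invented):

```rust
use std::collections::{HashMap, VecDeque};

/// Simplified map-plus-queue cache, kept at the same size.
struct Cache {
    max: usize,
    map: HashMap<u64, Vec<u8>>,
    queue: VecDeque<u64>, // insertion order; oldest at the front
}

impl Cache {
    fn new(max: usize) -> Self {
        Cache { max, map: HashMap::new(), queue: VecDeque::new() }
    }

    fn put(&mut self, key: u64, value: Vec<u8>) {
        if self.map.len() == self.max {
            // Expiring the oldest item is cheap: pop the front of the
            // queue and delete the corresponding map entry.
            if let Some(old) = self.queue.pop_front() {
                self.map.remove(&old);
            }
        }
        self.map.insert(key, value);
        self.queue.push_back(key);
    }

    fn take(&mut self, key: u64) -> Option<Vec<u8>> {
        let value = self.map.remove(&key)?; // the cheap part
        // The expensive part: O(n) to find the key in the middle of the
        // queue, and VecDeque::remove shifts elements, so it's O(n) too.
        if let Some(pos) = self.queue.iter().position(|k| *k == key) {
            self.queue.remove(pos);
        }
        Some(value)
    }
}

fn main() {
    let mut c = Cache::new(2);
    c.put(1, vec![1]);
    c.put(2, vec![2]);
    c.put(3, vec![3]); // capacity hit: key 1 expired cheaply
    assert!(c.take(1).is_none());
    assert_eq!(c.take(2), Some(vec![2])); // mid-queue removal, O(n)
}
```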

One tweak would be to let the queue grow bigger than the map, up to a point. When we remove a session, we remove it from just the hashmap but allow it to linger in the queue. When capacity of the hashmap is hit, we pop items off the queue until we find one that's still in the hashmap, then delete that from the hashmap. That's still O(n) worst case but it's probably pretty cheap most of the time.
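A sketch of that lazy-deletion tweak, again with invented names and types (a real version would presumably also cap how far the queue can outgrow the map, which is omitted here):

```rust
use std::collections::{HashMap, VecDeque};

/// `take` removes from the map only; the key lingers in the queue as a
/// stale entry until eviction time skips past it.
struct LazyCache {
    max: usize,
    map: HashMap<u64, Vec<u8>>,
    queue: VecDeque<u64>,
}

impl LazyCache {
    fn new(max: usize) -> Self {
        LazyCache { max, map: HashMap::new(), queue: VecDeque::new() }
    }

    fn put(&mut self, key: u64, value: Vec<u8>) {
        while self.map.len() >= self.max {
            match self.queue.pop_front() {
                // Stale entry (already taken): discard it and keep going.
                Some(old) if !self.map.contains_key(&old) => continue,
                // Live entry: this is the real eviction.
                Some(old) => {
                    self.map.remove(&old);
                }
                None => break,
            }
        }
        self.map.insert(key, value);
        self.queue.push_back(key);
    }

    fn take(&mut self, key: u64) -> Option<Vec<u8>> {
        // O(1): leave the queue alone; its entry becomes stale.
        self.map.remove(&key)
    }
}

fn main() {
    let mut c = LazyCache::new(2);
    c.put(1, vec![1]);
    assert_eq!(c.take(1), Some(vec![1])); // key 1 now stale in queue
    c.put(2, vec![2]);
    c.put(3, vec![3]); // eviction skips the stale key 1
    assert_eq!(c.map.len(), 2);
}
```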

Another approach would be to have a queue of hashmaps (VecDeque<HashMap<K, V>>), with each hashmap representing an "epoch". When the last hashmap is full, we pop one off the front (dropping a bunch of old sessions at once), and push a new one onto the back. To look up a session, we look it up in each hashmap in the queue, so we would want to keep the queue short (e.g., around 10). Hashmaps being popped off the front would be partially empty, because many of their sessions would have been redeemed by then.
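That epoch scheme might look roughly like this (hypothetical names and parameters; the per-epoch capacity and epoch count would be tunables):

```rust
use std::collections::{HashMap, VecDeque};

/// Queue of hashmaps; each hashmap is one "epoch" of sessions.
struct EpochCache {
    epochs: VecDeque<HashMap<u64, Vec<u8>>>, // front = oldest epoch
    per_epoch: usize,
    num_epochs: usize,
}

impl EpochCache {
    fn new(num_epochs: usize, per_epoch: usize) -> Self {
        let mut epochs = VecDeque::with_capacity(num_epochs);
        epochs.push_back(HashMap::new());
        EpochCache { epochs, per_epoch, num_epochs }
    }

    fn put(&mut self, key: u64, value: Vec<u8>) {
        if self.epochs.back().unwrap().len() >= self.per_epoch {
            if self.epochs.len() == self.num_epochs {
                // Drop a whole epoch of old sessions at once.
                self.epochs.pop_front();
            }
            self.epochs.push_back(HashMap::new());
        }
        self.epochs.back_mut().unwrap().insert(key, value);
    }

    fn take(&mut self, key: u64) -> Option<Vec<u8>> {
        // Check every epoch, newest first: O(num_epochs) map lookups,
        // which is why the queue should stay short.
        self.epochs.iter_mut().rev().find_map(|m| m.remove(&key))
    }
}

fn main() {
    let mut c = EpochCache::new(2, 2);
    c.put(1, vec![1]);
    c.put(2, vec![2]); // first epoch full
    c.put(3, vec![3]); // second epoch started
    c.put(4, vec![4]);
    c.put(5, vec![5]); // oldest epoch (keys 1, 2) dropped wholesale
    assert!(c.take(1).is_none());
    assert_eq!(c.take(4), Some(vec![4]));
}
```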


dhobsd commented Oct 3, 2023

No matter what caching scheme we use, we have to optimize for worst-case performance. This is because TLS 1.3 doesn't reuse tickets, and any scheme that allocates more than 1 ticket per new client gives malicious clients an easy way to fill the cache.

  1. Reduce default amplification. We can send new keys to the client whenever we like, so generating 4 keys per client may not be helpful. A more reasonable default might be to send 1 key and repopulate it when it's later successfully exchanged.
  2. Make cache access obstruction-free. As I mentioned in #1200, we can optimistically spin try_locks around cache access, and degrade to starting a new session. This is what we have to do for a client that doesn't give us a ticket anyway, so this is no worse than a cache miss.
  3. Configurable eviction strategies. Allow for eviction in FIFO order and random order.
  4. Evict proportionally. If we produce 4 tickets per new client, then each eviction ought to kick out 4 items.
  5. Take on some unsafe and use a doubly-linked list. O(1) random removal is probably worth it in this case. We'd also want to keep a reference to the list position in the hash map here.
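Point 2's try-lock degradation could be sketched like so, using a plain `Mutex<HashMap>` as a stand-in for the cache (the bounded retry count is an invented detail, not anything rustls does today):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Optimistically spin on try_lock a few times; if the lock stays
/// contended, give up and report a miss. The caller then starts a new
/// session, exactly as it would for a client with no ticket, so a
/// contended access is no worse than a cache miss.
fn try_take(cache: &Mutex<HashMap<u64, Vec<u8>>>, key: u64) -> Option<Vec<u8>> {
    for _ in 0..3 {
        if let Ok(mut map) = cache.try_lock() {
            return map.remove(&key);
        }
        std::hint::spin_loop();
    }
    None // contended: degrade to a full handshake
}

fn main() {
    let cache = Mutex::new(HashMap::new());
    cache.lock().unwrap().insert(7u64, vec![7u8]);
    assert_eq!(try_take(&cache, 7), Some(vec![7]));
    assert_eq!(try_take(&cache, 7), None); // now a genuine miss
}
```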

In something like HTTP/2, TLS ticket reuse happens over a longer time period than in something like SMTPS, where a high-volume MX might open new connections to another popular MX with some frequency. This means we really shouldn't be in the business of deciding whether to evict items by age in either direction.

For pie-in-the-sky ideas, we can partition IP space based on successful ticket reuse, along with a priority queue ordered by this partitioning, and then use this to decide whether we should bother generating tickets for a specific client / evict based on lowest reuse rate. The maximum partition size would be configurable, and probably pretty broad for IPv6.

For TLS 1.2, it's probably worth it to use an LRU and have a maximum ticket age. Since TLS 1.2 isn't constrained by reuse in the same way, basically none of the TLS 1.3 issues apply. It's beneficial to split the cache by protocol version for this reason.
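A minimal sketch of the maximum-age bound for a TLS 1.2 cache (not a full LRU, which would also need recency tracking; `AgedCache` and its fields are invented for illustration):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Entries older than `max_age` are treated as absent and dropped on
/// lookup. TLS 1.2 session IDs are reusable, so `get` does not remove
/// live entries.
struct AgedCache {
    max_age: Duration,
    map: HashMap<u64, (Instant, Vec<u8>)>,
}

impl AgedCache {
    fn get(&mut self, key: u64, now: Instant) -> Option<Vec<u8>> {
        let expired = match self.map.get(&key) {
            Some((born, _)) => now.duration_since(*born) > self.max_age,
            None => return None,
        };
        if expired {
            self.map.remove(&key); // too old: evict and report a miss
            None
        } else {
            self.map.get(&key).map(|(_, v)| v.clone())
        }
    }
}

fn main() {
    let born = Instant::now();
    let mut c = AgedCache { max_age: Duration::from_secs(60), map: HashMap::new() };
    c.map.insert(1, (born, vec![1]));
    assert_eq!(c.get(1, born), Some(vec![1])); // fresh: reusable
    let later = born + Duration::from_secs(120);
    assert_eq!(c.get(1, later), None); // past max age: dropped
    assert!(c.map.is_empty());
}
```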
