
Improving performance of ServerSessionMemoryCache #1503

jsha opened this issue Sep 29, 2023 · 1 comment
Comments


jsha commented Sep 29, 2023

Following up on #1200, I wanted to open an issue to talk about possible performance improvements to ServerSessionMemoryCache, particularly ones that improve worst-case performance.

@dhobsd mentioned using the idea of having high and low water marks so we can be more efficient about cleaning up old sessions. Here's a sketch of that:

The current ServerSessionMemoryCache has a map and a queue, maintained at the same size. When the capacity is hit and the oldest item needs to be expired, that's cheap: pop the oldest item off the queue and delete the corresponding entry from the map. Removing a session from the middle of the queue is expensive, though. We look it up in the hashmap (cheap), then have to find it in the middle of the queue (O(n)) and delete it from the queue (O(n)).
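The expensive mid-queue removal can be made concrete with a small sketch. This is an illustrative model only, not rustls's actual `ServerSessionMemoryCache` (the key/value types and names here are invented):

```rust
use std::collections::{HashMap, VecDeque};

/// Simplified map-plus-queue cache, kept at the same size.
struct Cache {
    max: usize,
    map: HashMap<u64, Vec<u8>>,
    queue: VecDeque<u64>, // insertion order; oldest at the front
}

impl Cache {
    fn new(max: usize) -> Self {
        Cache { max, map: HashMap::new(), queue: VecDeque::new() }
    }

    fn put(&mut self, key: u64, value: Vec<u8>) {
        if self.map.len() == self.max {
            // Expiring the oldest item is cheap: pop the front of the
            // queue and delete the corresponding map entry.
            if let Some(old) = self.queue.pop_front() {
                self.map.remove(&old);
            }
        }
        self.map.insert(key, value);
        self.queue.push_back(key);
    }

    fn take(&mut self, key: u64) -> Option<Vec<u8>> {
        let value = self.map.remove(&key)?; // the cheap part
        // The expensive part: O(n) to find the key in the middle of the
        // queue, and VecDeque::remove shifts elements, so it's O(n) too.
        if let Some(pos) = self.queue.iter().position(|k| *k == key) {
            self.queue.remove(pos);
        }
        Some(value)
    }
}

fn main() {
    let mut c = Cache::new(2);
    c.put(1, vec![1]);
    c.put(2, vec![2]);
    c.put(3, vec![3]); // capacity hit: key 1 expired cheaply
    assert!(c.take(1).is_none());
    assert_eq!(c.take(2), Some(vec![2])); // mid-queue removal, O(n)
}
```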

One tweak would be to let the queue grow bigger than the map, up to a point. When we remove a session, we remove it from just the hashmap but allow it to linger in the queue. When capacity of the hashmap is hit, we pop items off the queue until we find one that's still in the hashmap, then delete that from the hashmap. That's still O(n) worst case but it's probably pretty cheap most of the time.
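A sketch of that lazy-deletion tweak, again with invented names and types (a real version would presumably also cap how far the queue can outgrow the map, which is omitted here):

```rust
use std::collections::{HashMap, VecDeque};

/// `take` removes from the map only; the key lingers in the queue as a
/// stale entry until eviction time skips past it.
struct LazyCache {
    max: usize,
    map: HashMap<u64, Vec<u8>>,
    queue: VecDeque<u64>,
}

impl LazyCache {
    fn new(max: usize) -> Self {
        LazyCache { max, map: HashMap::new(), queue: VecDeque::new() }
    }

    fn put(&mut self, key: u64, value: Vec<u8>) {
        while self.map.len() >= self.max {
            match self.queue.pop_front() {
                // Stale entry (already taken): discard it and keep going.
                Some(old) if !self.map.contains_key(&old) => continue,
                // Live entry: this is the real eviction.
                Some(old) => {
                    self.map.remove(&old);
                }
                None => break,
            }
        }
        self.map.insert(key, value);
        self.queue.push_back(key);
    }

    fn take(&mut self, key: u64) -> Option<Vec<u8>> {
        // O(1): leave the queue alone; its entry becomes stale.
        self.map.remove(&key)
    }
}

fn main() {
    let mut c = LazyCache::new(2);
    c.put(1, vec![1]);
    assert_eq!(c.take(1), Some(vec![1])); // key 1 now stale in queue
    c.put(2, vec![2]);
    c.put(3, vec![3]); // eviction skips the stale key 1
    assert_eq!(c.map.len(), 2);
}
```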

Another approach would be to have a queue of hashmaps (VecDeque<HashMap<K, V>>), with each hashmap representing an "epoch". When the last hashmap is full, we pop one off the front (dropping a bunch of old sessions at once), and push a new one onto the back. To look up a session, we look it up in each hashmap in the queue, so we would want to keep the queue short (e.g., around 10). Hashmaps being popped off the front would be partially empty, because many of their sessions would have been redeemed by then.
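That epoch scheme might look roughly like this (hypothetical names and parameters; the per-epoch capacity and epoch count would be tunables):

```rust
use std::collections::{HashMap, VecDeque};

/// Queue of hashmaps; each hashmap is one "epoch" of sessions.
struct EpochCache {
    epochs: VecDeque<HashMap<u64, Vec<u8>>>, // front = oldest epoch
    per_epoch: usize,
    num_epochs: usize,
}

impl EpochCache {
    fn new(num_epochs: usize, per_epoch: usize) -> Self {
        let mut epochs = VecDeque::with_capacity(num_epochs);
        epochs.push_back(HashMap::new());
        EpochCache { epochs, per_epoch, num_epochs }
    }

    fn put(&mut self, key: u64, value: Vec<u8>) {
        if self.epochs.back().unwrap().len() >= self.per_epoch {
            if self.epochs.len() == self.num_epochs {
                // Drop a whole epoch of old sessions at once.
                self.epochs.pop_front();
            }
            self.epochs.push_back(HashMap::new());
        }
        self.epochs.back_mut().unwrap().insert(key, value);
    }

    fn take(&mut self, key: u64) -> Option<Vec<u8>> {
        // Check every epoch, newest first: O(num_epochs) map lookups,
        // which is why the queue should stay short.
        self.epochs.iter_mut().rev().find_map(|m| m.remove(&key))
    }
}

fn main() {
    let mut c = EpochCache::new(2, 2);
    c.put(1, vec![1]);
    c.put(2, vec![2]); // first epoch full
    c.put(3, vec![3]); // second epoch started
    c.put(4, vec![4]);
    c.put(5, vec![5]); // oldest epoch (keys 1, 2) dropped wholesale
    assert!(c.take(1).is_none());
    assert_eq!(c.take(4), Some(vec![4]));
}
```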


dhobsd commented Oct 3, 2023

No matter what caching scheme we use, we have to optimize for worst-case performance. This is because TLS 1.3 doesn't reuse tickets, and any scheme that allocates more than 1 ticket per new client gives malicious clients an easy way to fill the cache.

  1. Reduce default amplification. We can send new keys to the client whenever we like, so generating 4 keys per client may not be helpful. A more reasonable default might be to send 1 key and repopulate it when it's later successfully exchanged.
  2. Make cache access obstruction-free. As I mentioned in #1200, we can optimistically spin try_locks around cache access, and degrade to starting a new session. This is what we have to do for a client that doesn't give us a ticket anyway, so this is no worse than a cache miss.
  3. Configurable eviction strategies. Allow for eviction in FIFO order and random order.
  4. Evict proportionally. If we produce 4 tickets per new client, then each eviction ought to kick out 4 items.
  5. Take on some unsafe and use a doubly-linked list. O(1) random removal is probably worth it in this case. We'd also want to keep a reference to the list position in the hash map here.
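Point 2's try-lock degradation could be sketched like so, using a plain `Mutex<HashMap>` as a stand-in for the cache (the bounded retry count is an invented detail, not anything rustls does today):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Optimistically spin on try_lock a few times; if the lock stays
/// contended, give up and report a miss. The caller then starts a new
/// session, exactly as it would for a client with no ticket, so a
/// contended access is no worse than a cache miss.
fn try_take(cache: &Mutex<HashMap<u64, Vec<u8>>>, key: u64) -> Option<Vec<u8>> {
    for _ in 0..3 {
        if let Ok(mut map) = cache.try_lock() {
            return map.remove(&key);
        }
        std::hint::spin_loop();
    }
    None // contended: degrade to a full handshake
}

fn main() {
    let cache = Mutex::new(HashMap::new());
    cache.lock().unwrap().insert(7u64, vec![7u8]);
    assert_eq!(try_take(&cache, 7), Some(vec![7]));
    assert_eq!(try_take(&cache, 7), None); // now a genuine miss
}
```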

In something like HTTP/2, TLS ticket reuse happens over a longer time period than in something like SMTPS, where a high-volume MX might open new connections to another popular MX with some frequency. This means we really shouldn't be in the business of deciding whether to evict items by age in either direction.

For pie-in-the-sky ideas, we can partition IP space based on successful ticket reuse, along with a priority queue ordered by this partitioning, and then use this to decide whether we should bother generating tickets for a specific client / evict based on lowest reuse rate. The maximum partition size would be configurable, and probably pretty broad for IPv6.

For TLS 1.2, it's probably worth it to use an LRU and have a maximum ticket age. Since TLS 1.2 isn't constrained by reuse in the same way, basically none of the TLS 1.3 issues apply. It's beneficial to split the cache by protocol version for this reason.
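A minimal sketch of the maximum-age bound for a TLS 1.2 cache (not a full LRU, which would also need recency tracking; `AgedCache` and its fields are invented for illustration):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Entries older than `max_age` are treated as absent and dropped on
/// lookup. TLS 1.2 session IDs are reusable, so `get` does not remove
/// live entries.
struct AgedCache {
    max_age: Duration,
    map: HashMap<u64, (Instant, Vec<u8>)>,
}

impl AgedCache {
    fn get(&mut self, key: u64, now: Instant) -> Option<Vec<u8>> {
        let expired = match self.map.get(&key) {
            Some((born, _)) => now.duration_since(*born) > self.max_age,
            None => return None,
        };
        if expired {
            self.map.remove(&key); // too old: evict and report a miss
            None
        } else {
            self.map.get(&key).map(|(_, v)| v.clone())
        }
    }
}

fn main() {
    let born = Instant::now();
    let mut c = AgedCache { max_age: Duration::from_secs(60), map: HashMap::new() };
    c.map.insert(1, (born, vec![1]));
    assert_eq!(c.get(1, born), Some(vec![1])); // fresh: reusable
    let later = born + Duration::from_secs(120);
    assert_eq!(c.get(1, later), None); // past max age: dropped
    assert!(c.map.is_empty());
}
```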
