Cross instances local rate limit filter #34230
Comments
I'm assuming this doesn't need anyone pinged for triage since wbpcode filed it and is the person I would ping. :)
A possible path to reach this target:
If no local cluster is provided, the token share will be 1.0 forever and will not change anything.
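The fallback above can be sketched as follows. This is an illustrative Python sketch, not Envoy's actual C++ internals; the function name and signature are hypothetical.

```python
from typing import Optional


def token_share(local_cluster_size: Optional[int]) -> float:
    """Fraction of the configured token bucket this instance may use.

    Without a configured local cluster the share stays at 1.0, so the
    filter behaves exactly like the classic local rate limit.
    """
    if local_cluster_size is None or local_cluster_size <= 0:
        return 1.0  # no local cluster: limit is applied unchanged
    return 1.0 / local_cluster_size  # even allocation across instances


print(token_share(None))  # 1.0
print(token_share(4))     # 0.25
```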
The first question I need to answer is: what is the local cluster, and how can it be used to learn the number of replicas? Is the number of replicas the only information that can be retrieved, or could other information be shared between replicas? This information could probably be useful for other filters or custom Wasm filters too, right?
The local rate limit filter is intended to protect a single instance of a service, i.e. to bound how much one Envoy instance can process. I think constantly changing the limit based on membership count (during HPA or any scale-up/scale-down operation) would be very confusing and hard to reason about. Curious why you don't use global rate limiting if you need such behaviour?
cc @ramaraochavali Global rate limiting introduces additional dependencies (a rate limit server, Redis) and latency, and may not work properly if it is overloaded. We also use the local limit in gateway mode, where it's hard to say the local limit protects only one instance of a service. And we only enable it for users who know about it and require it. So I believe it won't confuse anyone.
On the other hand, I think it's also confusing for users who want a total limit (as in gateway mode) when that total limit changes because of HPA or any scale-up/down operation. This new feature will provide an option to let the local rate limit work with a stable total limit across the whole Envoy cluster/service.
The local cluster is a special cluster that contains the Envoy instance itself. See the local cluster name in https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/bootstrap/v3/bootstrap.proto#envoy-v3-api-msg-config-bootstrap-v3-clustermanager I have prepared a PR. You can check it if you are interested.
Are you saying the total limit would be changed during HPA by operators based on the number of nodes configured for gateway? |
@ramaraochavali I mean that if the local rate limit is used and, for example, 100 tokens per second is configured, the total limit is 100 * the number of Envoy instances. But the number of Envoy instances changes at runtime because of HPA or similar mechanisms, so the total limit also changes. In gateway mode, however, users expect a stable total limit in most cases, regardless of the number of Envoy instances.
I see. So when a new node comes up or a node goes down, the current Envoy instance's limit may go up/down, causing a few in-flight requests to fail (because there is another node in the cluster) that would otherwise have passed if membership had not changed. We have always used local rate limiting as a per-Envoy-instance service protection mechanism, so I'm trying to understand more about the use case.
I would also be very interested. We are currently exploring ways to implement rate limiting that is aware of the number of Envoy instances but reduces both extra dependencies and the calls made during request processing. A shared local rate limiting approach would fit well here. As I understand it, a shared token bucket would (or could) also mean that tokens can be used from a different instance in the local cluster? That would mitigate the problem @ramaraochavali mentioned, where scaling during requests could fail requests that would otherwise have passed.
Nope. Envoy cannot actually share data or messages with other instances. We can only compute a share/percentage based on the membership and apply that share to the token buckets.
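The computation described above can be sketched in a few lines. This is a hedged illustration of the idea (share applied to the configured fill rate); `effective_bucket` is a hypothetical name, not an Envoy API.

```python
def effective_bucket(configured_tokens_per_s: float, membership: int) -> float:
    """Per-instance fill rate so the fleet-wide total stays roughly constant.

    Each instance independently derives its slice from the local cluster
    membership count; no data is exchanged between instances.
    """
    share = 1.0 / max(membership, 1)
    return configured_tokens_per_s * share


# With a configured total of 100 tokens/s, the fleet-wide limit stays
# stable no matter how many replicas HPA is currently running:
for n in (2, 4, 10):
    per_instance = effective_bucket(100.0, n)
    print(n, per_instance, per_instance * n)  # total is always 100.0
```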
Title: Cross instances local rate limit filter
Description:
Local rate limiting is more stable and has no additional dependencies. It is basically our first choice for rate limiting.
The only shortcoming of local rate limiting is that the token bucket configuration works independently in each Envoy instance.
This means the replica count of Envoy will affect the final throughput of the limiter.
This is not friendly for users who don't know the technical details, and the replica count may change dynamically.
But Envoy actually can know its own replica count, via the local cluster.
So, I think it's possible to let all Envoy instances (in the same gateway cluster, or in the same service of a mesh) share a token bucket. Every instance will be pre-allocated part of the bucket by a specific algorithm (for example, even allocation). And when the membership of the local cluster changes, we re-execute the algorithm.
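The re-allocation step can be sketched as a membership-change callback. This is a minimal illustrative sketch assuming even allocation; the class and method names are hypothetical and do not correspond to Envoy source code.

```python
class SharedLocalLimiter:
    """Tracks the per-instance slice of a fleet-wide token bucket."""

    def __init__(self, total_tokens_per_s: float):
        self.total = total_tokens_per_s
        # Until membership is known, behave like a plain local limit.
        self.local_rate = total_tokens_per_s

    def on_membership_update(self, membership: int) -> None:
        # Re-execute the allocation algorithm whenever the local cluster
        # membership changes (even allocation in this sketch).
        self.local_rate = self.total / max(membership, 1)


limiter = SharedLocalLimiter(100.0)
limiter.on_membership_update(5)
print(limiter.local_rate)  # 20.0
```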