[CHORE] adding sample limit to scrape classe #6589
Sorry, I haven't had the time to do a very detailed review... just a quick walkthrough.
// SampleLimit defines per-scrape limit on number of scraped samples that will be accepted.
// Only valid in Prometheus versions 2.45.0 and newer.
//
// +optional
Should we explain the order in which the configuration is generated, similar to what we do with the other ScrapeClass fields?
We have quite a few ways to set limits now:
- Limits set in .*Monitor objects
- Limits in ScrapeClass, set in Prometheus
- Enforced limits, set in Prometheus
The order could get very confusing for beginners 😬
@ArthurSens can you check it back please?
I'm worried that it makes it too complicated to understand which limits are eventually applied. IIUC the use case for a scrape class limit is that an admin wants to apply a sane default value when a scrape object doesn't specify a limit, and keep the object's limit when it is defined (even if greater than the default limit). Is this correct? I wonder if such a feature shouldn't be delegated to policy engines like Kyverno?
Here are different use cases for setting sample limits in Prometheus:
- Different sample limits for different target groups
- Soft and hard sample limits
Comprehensive documentation can help clarify these complexities. Regarding using an additional tool, I believe this change will allow the operator to handle the use case without needing extra tools. However, I may be biased as I'm proposing this change.
Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>
1c6c88b to fe9d3bd
Sorry, I don't understand this use case.
OK, let me try a different explanation! The limit defined on the service monitor is a soft limit, since the service monitor object can increase that value up to the enforcedSampleLimit. The scrape class in this context acts as a default value for the sample limit in case the service monitor owner didn't define one. Does it look better? @simonpasquier
Thanks it clarifies a lot!
Could it be solved if we consider that when both sampleLimit and enforcedSampleLimit are specified, we take the min if the scrape object has no limit itself? Given
WDYT?
I thought the same, but the enforced sample limit takes precedence over the sample limit defined in the Prometheus limits. I think sampleLimit from Prometheus is being configured as a global sample limit and not at the monitor object level. @simonpasquier
Reading the code again, I think that we could improve the generated config and leverage the fact that the global Going back to my examples:
With a global sampleLimit = 1000 and enforcedSampleLimit = 2000, we should generate:
With a global sampleLimit = 1000 and no enforcedSampleLimit, we should generate:
With no global sampleLimit and enforcedSampleLimit = 2000, we should generate:
@simonpasquier this works fine for my use case, and I'll open another PR with this implementation, but it will not solve the use case where a Prometheus admin would like to set different default sample limits for different groups of targets. Do you think this PR is still valid?
Yeah, I see ScrapeClasses as a great ally for Platform teams that use Prometheus-Operator to offer Prometheus as a Service. What I envision the most is using ScrapeClasses to automatically add security and default relabeling configuration, but also to offer "Scrape Tiers", where consumers of these Prometheus as a Service offerings could choose their appropriate tiers while negotiating budgets with the Platform team. We have a few examples out there, e.g. Cloudflare establishes basic limits for all scrape configurations and allows teams to manually override them. The problem here is that this approach requires consumers of this API to understand Prometheus' limits, and this can easily become a barrier. A much simpler abstraction would be to allow Platform teams to set limits in scrape classes and just offer tiers like:
I'm definitely not against adding limits to scrape classes but, as stated in the Cloudflare article, global limits would probably work for > 90% of users.
Description
Describe the big picture of your changes here to communicate to the maintainers why we should accept this pull request.
If it fixes a bug or resolves a feature request, be sure to link to that issue.
When managing sample limits for different targets, a scrape class can provide a default configuration for each group of targets, with a different sample limit per group.
Type of change
What type of changes does your code introduce to the Prometheus operator? Put an x in the box that applies.
- CHANGE (fix or feature that would cause existing functionality to not work as expected)
- FEATURE (non-breaking change which adds functionality)
- BUGFIX (non-breaking change which fixes an issue)
- ENHANCEMENT (non-breaking change which improves existing functionality)
- NONE (if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)
Verification
Please check the Prometheus-Operator testing guidelines for recommendations about automated tests.
Changelog entry
Please put a one-line changelog entry below. This will be copied to the changelog file during the release process.