Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add process and go runtime metrics for controller #6966

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mindw
Copy link

@mindw mindw commented Apr 27, 2024

Pull Request Motivation

These metrics are default when using Prometheus go client's pre-defined registry, but were missing as the controller creates it's own registry.

Useful many debug scenarios

Kind

feature

Release Note

Add process and go runtime metrics for controller

@cert-manager-prow cert-manager-prow bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. area/monitoring Indicates a PR or issue relates to monitoring labels Apr 27, 2024
@cert-manager-prow
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign inteon for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cert-manager-prow cert-manager-prow bot added needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 27, 2024
@cert-manager-prow
Copy link
Contributor

Hi @mindw. Thanks for your PR.

I'm waiting for a cert-manager member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cert-manager-prow cert-manager-prow bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Apr 27, 2024
Signed-off-by: Gabi Davar <grizzly.nyo@gmail.com>
@mindw mindw force-pushed the mindw/add_proc_go_build_metrics branch from 9080f16 to c19108f Compare April 27, 2024 11:36
@cert-manager-prow cert-manager-prow bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 27, 2024
@SgtCoDFish
Copy link
Member

/ok-to-test

@cert-manager-prow cert-manager-prow bot added ok-to-test and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 22, 2024
@cert-manager-prow
Copy link
Contributor

@mindw: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cert-manager-master-make-test c19108f link true /test pull-cert-manager-master-make-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@wallrj wallrj self-requested a review May 22, 2024 08:32
Copy link
Member

@wallrj wallrj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mindw

I deployed this and used https://grafana.com/grafana/dashboards/6671-go-processes/ to visualize the resulting metrics:

image

I can see that these metrics will be useful to the cert-manager team for optimizing cert-manager.
I'd be interested to learn how you might use these metrics in your platform.
Maybe you could write a short paragraph describing the user-story.

Do you think there's any risk to adding ~200 new metrics?
Might it cause any extra cost for people who are already monitoring cert-manager using a commercial monitoring service?

@@ -186,10 +188,19 @@ func New(log logr.Logger, c clock.Clock) *Metrics {
)
)

// Create Registry and register the recommended collectors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest rewording this comment to explain why rather than what the following code does.
Perhaps:

We want to avoid using and mutating global state so we avoid
prometheus.DefaultRegisterer. Instead we create a local Registry and manually register
the same two collectors that are used by DefaultRegisterer.

Process collector

Exports the current state of process metrics including CPU, memory and file
descriptor usage as well as the process start time. The detailed behavior is
defined by the provided ProcessCollectorOpts. The zero value of
ProcessCollectorOpts creates a collector for the current process with an empty
namespace string and no error reporting.

The collector only works on operating systems with a Linux-style proc
filesystem and on Microsoft Windows. On other operating systems, it will not
collect any metrics.

Go collector

Exports metrics about the current Go process using debug.GCStats (base
metrics) and runtime/metrics.

registry.MustRegister(
collectors.NewGoCollector(
collectors.WithGoCollectorRuntimeMetrics(collectors.MetricsAll),
collectors.WithoutGoCollectorRuntimeMetrics(regexp.MustCompile("^/godebug/.*")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to the code explaining why we omit the godebug metrics.
And give an example of the metrics that will not be available.

OR

Remove that line and include all the Go metrics.

The DefaultRegisterer doesn't omit them.
Nor could I find any examples of other Go processes that omit these.

I tried removing that line and observed ~75 extra lines in the metrics.

$ kubectl get --raw /api/v1/namespaces/cert-manager/services/cert-manager:9402/proxy/metrics | wc -l
395

# Vs
$ kubectl get --raw /api/v1/namespaces/cert-manager/services/cert-manager:9402/proxy/metrics  | wc -l
470

And 25 extra metrics:

$ kubectl get --raw /api/v1/namespaces/cert-manager/services/cert-manager:9402/proxy/metrics  | grep '^go_godebug_'
go_godebug_non_default_behavior_execerrdot_events_total 0
go_godebug_non_default_behavior_gocachehash_events_total 0
go_godebug_non_default_behavior_gocachetest_events_total 0
go_godebug_non_default_behavior_gocacheverify_events_total 0
go_godebug_non_default_behavior_gotypesalias_events_total 0
go_godebug_non_default_behavior_http2client_events_total 0
go_godebug_non_default_behavior_http2server_events_total 0
go_godebug_non_default_behavior_httplaxcontentlength_events_total 0
go_godebug_non_default_behavior_httpmuxgo121_events_total 1
go_godebug_non_default_behavior_installgoroot_events_total 0
go_godebug_non_default_behavior_jstmpllitinterp_events_total 0
go_godebug_non_default_behavior_multipartmaxheaders_events_total 0
go_godebug_non_default_behavior_multipartmaxparts_events_total 0
go_godebug_non_default_behavior_multipathtcp_events_total 0
go_godebug_non_default_behavior_panicnil_events_total 0
go_godebug_non_default_behavior_randautoseed_events_total 0
go_godebug_non_default_behavior_tarinsecurepath_events_total 0
go_godebug_non_default_behavior_tls10server_events_total 0
go_godebug_non_default_behavior_tlsmaxrsasize_events_total 0
go_godebug_non_default_behavior_tlsrsakex_events_total 0
go_godebug_non_default_behavior_tlsunsafeekm_events_total 0
go_godebug_non_default_behavior_x509sha1_events_total 0
go_godebug_non_default_behavior_x509usefallbackroots_events_total 0
go_godebug_non_default_behavior_x509usepolicies_events_total 0
go_godebug_non_default_behavior_zipinsecurepath_events_total 0

$ kubectl get --raw /api/v1/namespaces/cert-manager/services/cert-manager:9402/proxy/metrics  | grep '^go_godebug_' | wc -l
25

@wallrj
Copy link
Member

wallrj commented May 23, 2024

/kind feature

@cert-manager-prow cert-manager-prow bot added kind/feature Categorizes issue or PR as related to a new feature. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels May 23, 2024
@mindw
Copy link
Author

mindw commented May 24, 2024

@wallrj Thank you for the detailed review and questions!
A bit low on free cycles ATM, hopefully I'll have enough till the end of next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/monitoring Indicates a PR or issue relates to monitoring dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. kind/feature Categorizes issue or PR as related to a new feature. ok-to-test release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants