Enhancement (cache): Add log or metric for missing SOA on negative response #6683

Open
gcs278 opened this issue May 16, 2024 · 1 comment

gcs278 (Contributor) commented May 16, 2024

What would you like to be added:

It's known that the cache plugin doesn't cache negative (NXDOMAIN) responses that lack an SOA record, in compliance with https://tools.ietf.org/html/rfc2308#section-5:

   Negative responses without SOA records SHOULD NOT be cached as there
   is no way to prevent the negative responses looping forever between a
   pair of servers even with a short TTL.
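The rule quoted above can be sketched in Go as a small predicate. This is not CoreDNS's actual cache code: `Msg`, `RR`, `isNegative`, `hasSOA`, and `shouldCacheNegative` are hypothetical, simplified stand-ins (real code would use the `github.com/miekg/dns` types). The sketch assumes a response is negative when it is NXDOMAIN, or NOERROR with an empty answer section, and that the SOA, if present, sits in the authority section.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for DNS message types; a real
// implementation would use github.com/miekg/dns instead.
const (
	RcodeSuccess  = 0 // NOERROR
	RcodeNXDomain = 3 // NXDOMAIN
)

const TypeSOA uint16 = 6

type RR struct {
	Name string
	Type uint16
	TTL  uint32
}

type Msg struct {
	Rcode  int
	Answer []RR
	Ns     []RR // authority section
}

// isNegative reports whether m is a negative answer: NXDOMAIN, or NOERROR
// with an empty answer section (NODATA).
func isNegative(m Msg) bool {
	return m.Rcode == RcodeNXDomain || (m.Rcode == RcodeSuccess && len(m.Answer) == 0)
}

// hasSOA reports whether the authority section carries an SOA record.
func hasSOA(m Msg) bool {
	for _, rr := range m.Ns {
		if rr.Type == TypeSOA {
			return true
		}
	}
	return false
}

// shouldCacheNegative applies RFC 2308 section 5: a negative response
// without an SOA record SHOULD NOT be cached. It returns false for
// non-negative responses, which are outside the scope of this check.
func shouldCacheNegative(m Msg) bool {
	return isNegative(m) && hasSOA(m)
}

func main() {
	compliant := Msg{Rcode: RcodeNXDomain, Ns: []RR{{Name: "example.org.", Type: TypeSOA, TTL: 300}}}
	missingSOA := Msg{Rcode: RcodeNXDomain}
	fmt.Println(shouldCacheNegative(compliant))  // true
	fmt.Println(shouldCacheNegative(missingSOA)) // false
}
```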

However, as in #3755, we have users whose upstream servers send NXDOMAIN responses without the SOA record that RFC 2308 requires (https://datatracker.ietf.org/doc/html/rfc2308#section-3). Because CoreDNS does not cache these responses, the DNS load on those upstream servers increases significantly.
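RFC 2308 section 3 also defines how long a negative answer may be cached: the minimum of the SOA record's own TTL and the MINIMUM field of its RDATA. The Go sketch below shows why a missing SOA leaves nothing to cache against. `SOA`, `NegativeResponse`, and `negativeTTL` are hypothetical, simplified types and names, not CoreDNS or `miekg/dns` APIs.

```go
package main

import "fmt"

// Hypothetical, simplified types; real code would use github.com/miekg/dns.
type SOA struct {
	TTL    uint32 // TTL of the SOA record itself
	Minttl uint32 // MINIMUM field of the SOA RDATA
}

type NegativeResponse struct {
	Rcode int
	SOA   *SOA // nil when the upstream omitted the SOA from the authority section
}

// negativeTTL returns how long a negative answer may be cached per RFC 2308
// section 3: the minimum of the SOA's TTL and its MINIMUM field. ok is false
// when there is no SOA, in which case the answer should not be cached at all.
func negativeTTL(r NegativeResponse) (ttl uint32, ok bool) {
	if r.SOA == nil {
		return 0, false
	}
	if r.SOA.TTL < r.SOA.Minttl {
		return r.SOA.TTL, true
	}
	return r.SOA.Minttl, true
}

func main() {
	withSOA := NegativeResponse{Rcode: 3, SOA: &SOA{TTL: 3600, Minttl: 300}}
	withoutSOA := NegativeResponse{Rcode: 3}

	ttl, ok := negativeTTL(withSOA)
	fmt.Println(ttl, ok) // 300 true
	ttl, ok = negativeTTL(withoutSOA)
	fmt.Println(ttl, ok) // 0 false
}
```

With no SOA there is no negative TTL, so each repeated lookup of the same nonexistent name is forwarded upstream again, which is where the extra upstream load comes from.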

Unlike the solution proposed in #3755, which enables caching of NXDOMAIN responses that have no SOA, I'm curious whether the community would be open to adding a log message and/or a metric to give better visibility into this problematic, non-compliant situation.

As for a log message vs. a metric: a log message would be a good minimum, but a metric (perhaps coredns_forward_negative_response_missing_soa_total) would be even better, because it would let our platform alert on missing SOAs.

Why is this needed:

A log message or metric would:

  1. Provide better visibility into this non-compliant situation, which can overload upstream DNS servers.
  2. Encourage users to fix the non-compliant NXDOMAIN responses coming from their upstream servers.

I'm happy to open a PR adding the log and/or metric once there is agreement on which is the appropriate solution. I'm also curious whether there is any precedent for logging non-compliant scenarios like this.

SuperQ (Collaborator) commented May 17, 2024

Adding a metric seems like a good idea to me.
