Received timer metrics are not processed correctly into whisper file sometimes #721

Open
pavel-kolla-kampiki opened this issue May 27, 2021 · 0 comments

While debugging a missing-metrics issue with the help of tcpdump, I arrived at a situation that appears to stem from statsd behavior.

We use the https://hub.docker.com/r/graphiteapp/graphite-statsd/ Docker image in GCP for monitoring various services, with the following configuration:

# cat /opt/statsd/config/udp.js
{
  "graphiteHost": "127.0.0.1",
  "graphitePort": 2003,
  "port": 8125,
  "flushInterval": 10000,
  "servers": [
    { server: "./servers/udp", address: "0.0.0.0", port: 8125 }
  ],
  "deleteIdleStats": true,
  "deleteTimers": false,
  "deleteGauges": false,
  "percentThreshold": [90, -90]
}

deleteTimers/deleteGauges were added recently to highlight this specific issue, because it was previously unclear why the target whisper files had 'None' values for some buckets.
Using tcpdump on the active production container, I was able to capture and filter incoming UDP packets and correlate their timestamps with the records in the resulting whisper files:

uts             count.wsp       sum.wsp    tcpdump
1622103180  	0.000000  	None
1622103190  	1.000000  	7.032493   1622103187.033346 (id 15967): [udp sum ok] aggregation.earp.refresh.current.runtime:7.032493|ms
1622103200  	0.000000  	None
1622103210  	0.000000  	None
1622103220  	0.000000  	None
1622103230  	0.000000  	None
1622103240  	0.000000  	None
1622103250  	1.000000  	7.088933   1622103247.089490 (id 27306): [udp sum ok] aggregation.earp.refresh.current.runtime:7.088933|ms
1622103260  	0.000000  	None
1622103270  	0.000000  	None
1622103280  	0.000000  	None
1622103290  	0.000000  	None
1622103300  	0.000000  	None
1622103310  	1.000000  	7.007746   1622103307.007718 (id 36630): [udp sum ok] aggregation.earp.refresh.current.runtime:7.007746|ms
1622103320  	0.000000  	None
1622103330  	0.000000  	None
1622103340  	0.000000  	None
1622103350  	0.000000  	None
1622103360  	0.000000  	None
1622103370  	0.000000  	None       1622103366.994994 (id 45097): [udp sum ok] aggregation.earp.refresh.current.runtime:6.992849|ms
1622103380  	0.000000  	None
1622103390  	0.000000  	None
1622103400  	0.000000  	None
1622103410  	0.000000  	None
1622103420  	0.000000  	None
1622103430  	1.000000  	6.996750   1622103426.997761 (id 57417): [udp sum ok] aggregation.earp.refresh.current.runtime:6.996750|ms
1622103440  	0.000000  	None
1622103450  	0.000000  	None
1622103460  	0.000000  	None
1622103470  	0.000000  	None
1622103480  	0.000000  	None
1622103490  	1.000000  	6.936743   1622103486.939107 (id 64814): [udp sum ok] aggregation.earp.refresh.current.runtime:6.936743|ms
1622103500  	0.000000  	None
1622103510  	0.000000  	None
1622103520  	0.000000  	None
1622103530  	0.000000  	None
1622103540  	0.000000  	None
1622103550  	1.000000  	7.033949   1622103547.039855 (id 10163): [udp sum ok] aggregation.earp.refresh.current.runtime:7.033949|ms

Seeing 0 for the 1622103370 bucket is incorrect, because a metric was received at 1622103366.994994. I assume that 0 is the value statsd fed into graphite, because before deleteTimers was set to false these 'missing' buckets held None values.
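
To make the correlation in the table explicit, here is a minimal Python sketch. It assumes, based only on the rows above, that a packet is reported in the first 10-second bucket boundary at or after its arrival time. The names `expected_bucket` and `find_missing` are illustrative and not part of statsd or whisper; the data is copied from the table.

```python
import math

FLUSH_INTERVAL_S = 10  # flushInterval of 10000 ms from the config above


def expected_bucket(packet_ts, interval=FLUSH_INTERVAL_S):
    """Assumption: a packet shows up in the first bucket boundary
    at or after its arrival time (this matches the rows in the table)."""
    return math.ceil(packet_ts / interval) * interval


# (tcpdump timestamp, timer value in ms), copied from the capture
packets = [
    (1622103187.033346, 7.032493),
    (1622103247.089490, 7.088933),
    (1622103307.007718, 7.007746),
    (1622103366.994994, 6.992849),
    (1622103426.997761, 6.996750),
    (1622103486.939107, 6.936743),
    (1622103547.039855, 7.033949),
]

# bucket timestamp -> value read from sum.wsp (None means no data point)
sum_wsp = {
    1622103190: 7.032493,
    1622103250: 7.088933,
    1622103310: 7.007746,
    1622103370: None,
    1622103430: 6.996750,
    1622103490: 6.936743,
    1622103550: 7.033949,
}


def find_missing(packets, sum_wsp):
    """Return (packet_ts, bucket, value) for packets whose bucket is empty."""
    missing = []
    for ts, value in packets:
        bucket = expected_bucket(ts)
        if sum_wsp.get(bucket) is None:
            missing.append((ts, bucket, value))
    return missing


if __name__ == "__main__":
    for ts, bucket, value in find_missing(packets, sum_wsp):
        print(f"packet at {ts:.6f} ({value} ms) missing from bucket {bucket}")
```

With the data above, the only packet flagged is the one received at 1622103366.994994, which maps to the 1622103370 bucket.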

The tcpdump output was captured with tcpdump -tt -A -s0 -vv dst port 8125 | grep earp.refresh.current.runtime -B5 and trimmed for brevity. The full output for the 'missing' metric is:

1622103366.994994 IP (tos 0x0, ttl 62, id 45097, offset 0, flags [DF], proto UDP (17), length 80)
    10.52.16.11.54827 > graphite.8125: [udp sum ok] UDP, length 52
E..P.)@.>.P.
4..
4...+...<..aggregation.earp.refresh.current.runtime:6.992849|ms
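
For anyone trying to reproduce this, a timer packet with the same payload can be sent by hand. The following is a minimal sketch using only the Python standard library; `format_timer` and `send_timer` are hypothetical helper names, and the host and port in the usage note are assumptions taken from the config above.

```python
import socket


def format_timer(name, value_ms):
    """Build a statsd timer line in the same form as the captured payload."""
    return f"{name}:{value_ms}|ms"


def send_timer(host, port, name, value_ms):
    """Send one timer sample as a single UDP datagram and return the payload."""
    payload = format_timer(name, value_ms).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

For example, `send_timer("127.0.0.1", 8125, "aggregation.earp.refresh.current.runtime", 6.992849)` sends a 52-byte payload, which matches the `UDP, length 52` reported by tcpdump for the missing packet.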

The whisper files were rendered with /opt/graphite/bin/whisper-fetch.py --from=1622103000 /opt/graphite/storage/whisper/stats/timers/aggregation/earp/refresh/current/runtime/count.wsp
and /opt/graphite/bin/whisper-fetch.py --from=1622103000 /opt/graphite/storage/whisper/stats/timers/aggregation/earp/refresh/current/runtime/sum.wsp
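
The two whisper-fetch.py outputs can be joined into the count/sum columns of the table with a small script. This is a sketch that assumes each output line is a whitespace-separated `timestamp value` pair with `None` for empty points, as seen in the table above; `parse_whisper_fetch` and `join_series` are illustrative names.

```python
def parse_whisper_fetch(text):
    """Parse 'timestamp value' lines into {timestamp: float or None}."""
    points = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        ts, raw = parts
        points[int(ts)] = None if raw == "None" else float(raw)
    return points


def join_series(count_text, sum_text):
    """Return rows of (timestamp, count, sum), ordered by timestamp."""
    counts = parse_whisper_fetch(count_text)
    sums = parse_whisper_fetch(sum_text)
    return [(ts, counts[ts], sums.get(ts)) for ts in sorted(counts)]


if __name__ == "__main__":
    # Sample lines copied from the table; real input would come from the
    # captured stdout of the two whisper-fetch.py commands.
    count_out = "1622103360\t0.000000\n1622103370\t0.000000\n"
    sum_out = "1622103360\tNone\n1622103370\tNone\n"
    for row in join_series(count_out, sum_out):
        print(*row, sep="\t")
```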
