
feat(processors): Traffic shaper processor plugin to shape uneven distribution of incoming metrics #15354

Closed
wants to merge 4 commits

Conversation

lakshmansai

feat(processors): Traffic shaper processor plugin to shape uneven distribution of incoming metrics

Summary

Use Case
An in-memory traffic shaper processor that evens out incoming traffic so that the output rate is uniform.

We use Telegraf as a proxy and receive data that is spiky in nature: every 10 minutes we receive a spike, and this affects our downstream systems, which must process at the same rate. This wastes resources, since CPU and memory need to be provisioned for the peaks.

Screenshot of spiky behavior before and after using this plugin (traffic_distribution): it is visible that after 1:00 the output rate is steady.

Checklist

  • [x] No AI generated code was used in this PR

Related issues

resolves #15353

@telegraf-tiger
Contributor

Thanks so much for the pull request!
🤝 ✒️ Just a reminder that the CLA has not yet been signed, and we'll need it before merging. Please sign the CLA when you get a chance, then post a comment here saying !signed-cla

@telegraf-tiger telegraf-tiger bot added the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label May 14, 2024
@lakshmansai
Author

lakshmansai commented May 14, 2024

!signed-cla

Contributor

@powersj powersj left a comment

Hi,

If we are going to take a processor like this, the messaging to the user needs to be improved, but I still need to talk to the rest of the team if this is something we wish to support as well. I've given some initial comments.

If I send 3 metrics and use your traffic shaper as follows:

[[inputs.exec]]
    commands = [
        "echo metric,host=a value=42",
        "echo metric,host=b value=1",
        "echo metric,host=c value=2",
    ]
    data_format = "influx"

[[processors.traffic_shaper]]
    samples = 1
    buffer_size = 10000

I still see all 3 metrics sent at each interval.

I see the time unit is not exposed in the config, which it should be, and defaults to 1 second. If I change this to 10 seconds to match the flush interval, I then see 1, sometimes 2, metrics get produced.

What I don't see is the processor's buffer size at any given time. I think this is a major issue, as a user would have no way to know or gauge whether they are sending enough metrics at any given time.

Thanks


## Number of samples to be emitted per time unit; default unit is seconds.
## This should be used in conjunction with number of telegraf instances.
samples = 20000
Contributor

Defaults can be commented out.

Author

done


## Buffer Size
## If buffer is full the incoming metrics will be dropped
buffer_size = 1000000
Contributor

Please expose the time unit option as a config.Duration.

Author

Done, added `rate` in the config.
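A hedged sketch of what the updated sample config might look like after this change. The `rate` field name is taken from the reply above; its exact name and syntax in the final plugin may differ:

```toml
[[processors.traffic_shaper]]
  ## Number of samples to be emitted per rate interval.
  # samples = 20000

  ## Interval over which `samples` metrics are emitted,
  ## parsed as a config.Duration (e.g. "1s", "10s").
  # rate = "1s"

  ## Buffer size. If the buffer is full, incoming metrics are dropped.
  # buffer_size = 1000000
```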

output traffic is uniform

Example of uneven traffic distribution
![traffic_distribution](./docs/traffic_distribution.png)
Contributor

I would prefer we omit the image.

Author

done

Comment on lines 21 to 22
Queue chan *telegraf.Metric
Acc telegraf.Accumulator
Contributor

Do these need to be exported?

Author

Nope, have changed them.

func (t *TrafficShaper) Stop() {
t.Log.Debugf("Got stop signal %s", time.Now().String())
close(t.Queue)
t.wg.Wait()
Contributor

This will block Telegraf from exiting until all metrics are flushed from the queue? I'm not sure this is the behavior we want. When someone closes or stops Telegraf, things should clean up, but this could block for hundreds or thousands of seconds.

Author

Added this as a config option so that users can choose accordingly.

"github.com/influxdata/telegraf/metric"
"github.com/influxdata/telegraf/testutil"
)

Contributor

Please include a test with tracking metrics. See the other processors for examples.

Author

done

@lakshmansai
Author

lakshmansai commented May 16, 2024

Have added the rate time interval as config, and we have exposed metrics like messagesInFlight for observability.


@powersj powersj closed this May 21, 2024