
s3 metrics always increasing #3529

Closed
keyolk opened this issue Mar 28, 2024 · 2 comments
Labels
stale

Comments

keyolk commented Mar 28, 2024

Describe the bug

The number of S3 objects and the bucket size keep growing and never go down.


I can also see about 5 GB of Parquet data in each block directory.

In the compactor pods' logs I see many lines like the one below:

level=warn ts=2024-03-28T01:49:49.036459838Z caller=compactor.go:248 msg="max size of trace exceeded" tenant=mesg traceId=eddc0f76f1d19e6e898d1f2b60b9c431 discarded_span_count=19697

along with some related metrics.

To Reproduce
Steps to reproduce the behavior:

  1. Start Tempo (SHA or version)
 /tempo -version
tempo, version 2.2.0 (branch: HEAD, revision: cce8df1b6)
  build user:
  build date:
  go version:       go1.20.4
  platform:         linux/arm64
  tags:             unknown

  2. Run Tempo with the following compactor and storage configuration:
compactor:
  compaction:
    block_retention: 168h
    compacted_block_retention: 1h
    compaction_cycle: 30s
    compaction_window: 1h
    max_block_bytes: 1073741824
    max_compaction_objects: 600000
    max_time_per_tenant: 5m
    retention_concurrency: 10
    v2_in_buffer_bytes: 5242880
    v2_out_buffer_bytes: 20971520
    v2_prefetch_traces_count: 1000
  ring:
    kvstore:
      store: memberlist
...
storage:
  trace:
    backend: s3
    blocklist_poll: 5m
    cache: memcached
    local:
      path: /var/tempo/traces
    memcached:
      consistent_hash: true
      host: o11y-tempo-memcached
      service: memcached-client
      timeout: 500ms
    s3:
      bucket: tempo-apne2
      endpoint: s3.amazonaws.com
      region: ap-northeast-2
    wal:
      path: /var/tempo/wal

Expected behavior

The S3 object count and bucket size should decrease as old blocks are deleted by retention.

Environment:

  • Infrastructure: EKS
  • Deployment tool: helm tempo-distributed v1.6.1

Additional Context

@joe-elliott (Member) commented:

Based on your metrics it does seem like Tempo is performing retention, but the bucket size is still growing. If an ingester or compactor exits unexpectedly it will sometimes write a partial block that will then be "invisible" to Tempo.

We recommend setting bucket policies to remove all objects a day or so after your Tempo retention to clean up these objects. I'd recommend a similar policy for multipart uploads which s3 also likes to keep around.
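For illustration, here is a minimal sketch of such a lifecycle configuration applied with boto3. It is not taken from the Tempo docs; it assumes the tempo-apne2 bucket and the 168h (7-day) block_retention from the config above, and the day counts are placeholders to adjust for your own retention:

import boto3

# Applies two lifecycle rules to the whole bucket:
#   1. expire any leftover objects (e.g. partial blocks from crashed
#      ingesters/compactors) one day after Tempo's 7-day block_retention
#   2. abort incomplete multipart uploads after one day
s3 = boto3.client("s3", region_name="ap-northeast-2")

s3.put_bucket_lifecycle_configuration(
    Bucket="tempo-apne2",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-orphaned-tempo-objects",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},   # whole bucket
                "Expiration": {"Days": 8},  # block_retention (7d) + 1 day
            },
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1},
            },
        ]
    },
)

Because Tempo deletes blocks itself once they pass block_retention, the 8-day expiration should only ever remove objects Tempo can no longer see (partial blocks and other orphans).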

The docs on this are not great. We mention the multipart upload here:

https://grafana.com/docs/tempo/latest/configuration/hosted-storage/s3/#lifecycle-policy

but no real mention of the partial blocks. If this solves your issue, I'd like to turn this into a docs issue to add these details.

github-actions bot commented:

This issue has been automatically marked as stale because it has not had any activity in the past 60 days.
The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity.
Please apply the keepalive label to exempt this issue.

@github-actions github-actions bot added the stale label May 28, 2024
@github-actions github-actions bot closed this as not planned Jun 13, 2024