
s3 metrics always increasing #3529

Closed
keyolk opened this issue Mar 28, 2024 · 2 comments
Labels
stale

Comments

keyolk commented Mar 28, 2024

Describe the bug

The number of S3 objects and the bucket size keep growing and never go down.


I can also see about 5 GB of Parquet data in each block directory.

In the compactor pods' logs I see many lines like the one below:

level=warn ts=2024-03-28T01:49:49.036459838Z caller=compactor.go:248 msg="max size of trace exceeded" tenant=mesg traceId=eddc0f76f1d19e6e898d1f2b60b9c431 discarded_span_count=19697

along with some related metrics.

To Reproduce
Steps to reproduce the behavior:

  1. Start Tempo (SHA or version)
 /tempo -version
tempo, version 2.2.0 (branch: HEAD, revision: cce8df1b6)
  build user:
  build date:
  go version:       go1.20.4
  platform:         linux/arm64
  tags:             unknown

  2. Run Tempo with the following compactor and storage configuration:
compactor:
  compaction:
    block_retention: 168h
    compacted_block_retention: 1h
    compaction_cycle: 30s
    compaction_window: 1h
    max_block_bytes: 1073741824
    max_compaction_objects: 600000
    max_time_per_tenant: 5m
    retention_concurrency: 10
    v2_in_buffer_bytes: 5242880
    v2_out_buffer_bytes: 20971520
    v2_prefetch_traces_count: 1000
  ring:
    kvstore:
      store: memberlist
...
storage:
  trace:
    backend: s3
    blocklist_poll: 5m
    cache: memcached
    local:
      path: /var/tempo/traces
    memcached:
      consistent_hash: true
      host: o11y-tempo-memcached
      service: memcached-client
      timeout: 500ms
    s3:
      bucket: tempo-apne2
      endpoint: s3.amazonaws.com
      region: ap-northeast-2
    wal:
      path: /var/tempo/wal

Expected behavior

The S3 object count and bucket size should decrease as old blocks are deleted by retention.

Environment:

  • Infrastructure: EKS
  • Deployment tool: helm tempo-distributed v1.6.1

Additional Context

@joe-elliott (Member) commented:

Based on your metrics it does seem like Tempo is performing retention, but the bucket size is still growing. If an ingester or compactor exits unexpectedly it will sometimes write a partial block that will then be "invisible" to Tempo.

We recommend setting bucket policies to remove all objects a day or so after your Tempo retention to clean up these objects. I'd recommend a similar policy for multipart uploads which s3 also likes to keep around.
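For illustration, here is a minimal sketch of such a lifecycle configuration applied with boto3. It is not taken from the Tempo docs; it assumes the tempo-apne2 bucket and the 168h (7-day) block_retention from the config above, and the day counts are placeholders to adjust for your own retention:

import boto3

# Applies two lifecycle rules to the whole bucket:
#   1. expire any leftover objects (e.g. partial blocks from crashed
#      ingesters/compactors) one day after Tempo's 7-day block_retention
#   2. abort incomplete multipart uploads after one day
s3 = boto3.client("s3", region_name="ap-northeast-2")

s3.put_bucket_lifecycle_configuration(
    Bucket="tempo-apne2",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-orphaned-tempo-objects",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},   # whole bucket
                "Expiration": {"Days": 8},  # block_retention (7d) + 1 day
            },
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1},
            },
        ]
    },
)

Because Tempo deletes blocks itself once they pass block_retention, the 8-day expiration should only ever remove objects Tempo can no longer see (partial blocks and other orphans).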

The docs on this are not great. We mention the multipart upload here:

https://grafana.com/docs/tempo/latest/configuration/hosted-storage/s3/#lifecycle-policy

but no real mention of the partial blocks. If this solves your issue, I'd like to turn this into a docs issue to add these details.

github-actions bot commented:

This issue has been automatically marked as stale because it has not had any activity in the past 60 days.
The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity.
Please apply the keepalive label to exempt this issue.

@github-actions github-actions bot added the stale label May 28, 2024
@github-actions github-actions bot closed this as not planned Jun 13, 2024