
Agent v3.7.0 gradually "loses" log files #441

Open
andrew-barnett opened this issue Nov 3, 2022 · 2 comments

@andrew-barnett

We are running logdna-agent v3.7.0 on a k8s cluster with 3 nodes. As the agent process runs, it gradually loses track of log files and eventually stops exporting them.

For example, we have a specific service on our cluster that runs on a specific node as a specific container. The logdna-agent log entries that mention this container, as of almost 24 hours later (so 2022-11-03T14:00:00Z), are as follows:

logdna-agent-qvshv.log:[2022-11-02T14:54:10Z INFO  fs::cache] watching "/var/log/containers/nats-streaming-ft-0_default_nats-streaming-7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256.log"
logdna-agent-qvshv.log:[2022-11-02T14:54:10Z INFO  fs::cache] watching "/var/lib/docker/containers/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256"
logdna-agent-qvshv.log:[2022-11-02T14:54:10Z INFO  fs::cache] watching "/var/lib/docker/containers/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256-json.log"
logdna-agent-qvshv.log:[2022-11-02T14:54:10Z WARN  fs::cache] watch descriptor for /var/log/containers/nats-streaming-ft-0_default_nats-streaming-7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256.log already exists...
logdna-agent-qvshv.log:[2022-11-02T14:54:10Z INFO  fs::tail] initialize event for symlink /var/log/containers/nats-streaming-ft-0_default_nats-streaming-7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256.log, final target /var/lib/docker/containers/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256-json.log
logdna-agent-qvshv.log:[2022-11-02T14:54:10Z INFO  fs::tail] initialized symlink "/var/log/containers/nats-streaming-ft-0_default_nats-streaming-7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256.log" as DefaultKey(52v1)
logdna-agent-qvshv.log:[2022-11-02T14:54:10Z INFO  fs::tail] initialize event for symlink /var/log/pods/default_nats-streaming-ft-0_bb7ab5d1-f9c9-45d6-a71c-0f1ad973db76/nats-streaming/0.log, final target /var/lib/docker/containers/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256-json.log
logdna-agent-qvshv.log:[2022-11-02T14:54:10Z INFO  fs::tail] initialize event for file /var/lib/docker/containers/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256-json.log
logdna-agent-qvshv.log:[2022-11-02T14:55:29Z INFO  fs::cache] unwatching "/var/lib/docker/containers/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256"
logdna-agent-qvshv.log:[2022-11-02T14:55:29Z INFO  fs::cache] unwatching "/var/lib/docker/containers/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256-json.log"
logdna-agent-qvshv.log:[2022-11-02T14:55:29Z INFO  fs::cache] ignoring "/var/lib/docker/containers/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256-json.log.1"
logdna-agent-qvshv.log:[2022-11-02T14:55:29Z INFO  fs::cache] watching "/var/lib/docker/containers/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256-json.log"
logdna-agent-qvshv.log:[2022-11-02T14:55:29Z INFO  fs::tail] added "/var/log/containers/nats-streaming-ft-0_default_nats-streaming-7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256.log"
logdna-agent-qvshv.log:[2022-11-02T15:12:12Z INFO  fs::cache] unwatching "/var/lib/docker/containers/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256-json.log"
logdna-agent-qvshv.log:[2022-11-02T15:12:12Z INFO  fs::cache] watching "/var/lib/docker/containers/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256-json.log"
logdna-agent-qvshv.log:[2022-11-02T15:12:12Z INFO  fs::tail] added "/var/log/containers/nats-streaming-ft-0_default_nats-streaming-7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256.log"

..."unwatching => watching => added" cycle repeats 6 more times, about every 3 to 5 minutes...

logdna-agent-qvshv.log:[2022-11-02T16:00:45Z INFO  fs::cache] watching "/var/lib/docker/containers/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256-json.log"
logdna-agent-qvshv.log:[2022-11-02T16:00:45Z INFO  fs::tail] added "/var/log/containers/nats-streaming-ft-0_default_nats-streaming-7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256.log"
logdna-agent-qvshv.log:[2022-11-02T16:04:17Z INFO  fs::cache] unwatching "/var/lib/docker/containers/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256-json.log"

So logdna-agent stopped watching this file as of 2022-11-02T16:04:17Z and never picked it back up. I've logged in to the node and confirmed that this log file still exists and has lines in it from after that time, almost 24 hours' worth.
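Roughly the checks I ran on the node to confirm the file is still live (the container ID is the one from the log excerpt above; the lsof check is optional and only shows whether any process still holds the file open):

LOG=/var/lib/docker/containers/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256-json.log

# size and mtime keep advancing if the container is still logging
stat -c 'size=%s mtime=%y inode=%i' "$LOG"

# the most recent entries carry their own timestamps
tail -n 2 "$LOG"

# does any process (e.g. the agent) still have the file open?
sudo lsof -- "$LOG"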

The instability appears to have started with the log rotation that happened at 2022-11-02T14:55:29Z, though the agent did seem to recover from that first rotation.
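For anyone trying to reproduce this, one way to see what the agent's inotify watches are exposed to during a rotation is to watch the container's log directory directly. A minimal sketch, assuming inotify-tools is installed on the node and using the directory from the logs above:

DIR=/var/lib/docker/containers/7c1f911ad110114df6d7e74bbe94816e24aeca7d28ea62d7f200bf059d791256

# print every create/move/delete/modify event in the log directory;
# a json-file rotation typically renames foo-json.log to foo-json.log.1
# and creates a fresh foo-json.log
inotifywait -m -e create,delete,moved_from,moved_to,modify "$DIR"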

The k8s node is running Ubuntu 20.04.2 LTS (GNU/Linux 5.8.0-1041-aws x86_64). Disk, memory, and CPU usage are all well below their limits.

@gjanco (Contributor) commented Nov 3, 2022

..."unwatching => watching => added" cycle repeats 6 more times, about every 3 to 5 minutes...

Are your logs rotating this frequently?
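For reference, with the json-file logging driver the rotation cadence is driven by log volume against the configured size cap. A rough way to check the settings and the actual rotation times on the node (the container-id placeholder is illustrative):

# rotation policy for the json-file driver, if set globally
cat /etc/docker/daemon.json          # look for "max-size" / "max-file"

# per-container override, if any
docker inspect --format '{{.HostConfig.LogConfig}}' <container-id>

# actual rotation times can be read off the rotated files themselves
ls -l --time-style=full-iso /var/lib/docker/containers/<container-id>/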

@andrew-barnett (Author) commented Feb 23, 2023

Per Mezmo support, they found an issue with handling symlinked files in 3.6 and 3.7 and asked us to downgrade to 3.5. That worked for us. They recently released 3.8, which is supposed to fix this issue; we're testing 3.8 in our environment to confirm.
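In case it helps anyone else, a sketch of how the downgrade can be done by pinning the agent DaemonSet image back to a 3.5.x tag. The namespace, DaemonSet name, container name, and exact patch tag below are assumptions and will depend on how the agent was deployed:

# replace <tag> with the 3.5.x release recommended by support
kubectl -n logdna-agent set image daemonset/logdna-agent \
  logdna-agent=logdna/logdna-agent:<tag>
kubectl -n logdna-agent rollout status daemonset/logdna-agent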
