Releases: netdata/netdata

v1.44.0

06 Dec 18:15

Right on schedule, this is another great Netdata release!

Important

Stay informed about upcoming changes and potential deprecations by reviewing the deprecation notice sections. This will help you plan for any necessary adjustments to ensure a smooth transition.

Netdata Growth

  • 66k+ GitHub Stars ⭐
    Since October 2023, Netdata has been leading the observability category in the CNCF landscape, surpassing Elasticsearch. Thank you for your love ❤️! Give Netdata a ⭐ on GitHub too!

  • 600M+ docker hub pulls
    Netdata gets about 200k Docker Hub downloads per day. Since June 2023 we have been a Verified Publisher, so Netdata pulls don't count against Docker Hub pull limits, allowing all our users to integrate Netdata into their CI/CD toolchains.

Release Summary

  • Netdata beats Prometheus in all aspects: this version of Netdata includes significant improvements that make Netdata far more performant than Prometheus at scale. A full performance analysis is included.
  • Netdata Journal Logs: Netdata can now handle huge systemd-journal databases, and can access the host's logs when Netdata runs in a container.
  • First beta version of Netdata's log2journal: a utility to extract, convert, transform and send to systemd-journal any kind of structured logs (including JSON and logfmt logs), similar to what promtail does for Loki.
  • More Netdata Functions: monitor containers and VMs, network interfaces, mount points, block devices, systemd units, systemd services, and more!
  • Netdata now logs to the systemd journal instead of log files, and the results are amazing!

Release Highlights

Netdata beats Prometheus in all aspects

We tested Netdata and Prometheus at scale, both ingesting 2.7 million metrics per second. On the same workload, compared to Prometheus, Netdata needs:

  • 35% less CPU
  • 49% less RAM
  • 12% less bandwidth
  • 75% less disk space
  • 98% less disk I/O

Read the full performance comparison between Netdata and Prometheus.

To achieve these astonishing results, we made the following changes to Netdata since the previous release:

New SLOTS streaming protocol

A new streaming protocol allows Netdata children and parents to share a common index of the streamed metrics, so that parents can receive metrics without consulting hashtables. This reduces the overall overhead on parents by about 30%, without increasing the overhead on children (the children just number each metric).

The new protocol, called SLOTS, is automatically selected when both the child and the parent support it.
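To illustrate why a shared index removes the parent's hashtable lookups, here is a minimal Python sketch. It is not Netdata's implementation or wire format: the Metric, Child and Parent classes and the DEFINE/SET message tuples are hypothetical. The child numbers each metric once, the first time it is sent, and from then on only the slot number travels; the parent stores metrics in a plain list indexed by that number.

```python
class Metric:
    """A metric on the child; slot stays None until its first transmission."""
    def __init__(self, name):
        self.name = name
        self.slot = None


class Child:
    def __init__(self):
        self.next_slot = 0

    def messages(self, metric, value):
        """Return the (hypothetical) protocol messages for one sample."""
        msgs = []
        if metric.slot is None:
            metric.slot = self.next_slot      # number the metric once
            self.next_slot += 1
            msgs.append(("DEFINE", metric.slot, metric.name))
        msgs.append(("SET", metric.slot, value))
        return msgs


class Parent:
    def __init__(self):
        self.by_slot = []   # index == slot number shared with the child

    def receive(self, msg):
        kind, slot, payload = msg
        if kind == "DEFINE":
            while len(self.by_slot) <= slot:
                self.by_slot.append(None)
            self.by_slot[slot] = {"name": payload, "last": None}
        elif kind == "SET":
            # direct list indexing: no hashing or lookup of the metric name
            self.by_slot[slot]["last"] = payload
```

The metric name crosses the wire only once per metric; every later sample costs the parent a single list index instead of a string hash and table lookup.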

Streaming compression algorithms

Streaming now supports multiple compression algorithms. Previous Netdata releases supported only LZ4, which is known for its speed and average compression ratio. This release adds support for ZSTD, GZIP, and BROTLI.

ZSTD provides the best balance between compression ratio and CPU consumption, and therefore it is now the default.

The selection order of the compression algorithms can be configured on parents, in stream.conf, in the [API] section (the section of each API key), by setting compression algorithms order = zstd lz4 brotli gzip.

To save the most bandwidth at the expense of CPU utilization, set this so that brotli or gzip appear first in the list, before zstd and lz4.

This also means that parents can now have a different compression order for each API key, allowing the use of different API keys based on the location of the child (e.g. children on billable egress bandwidth can use an API key that prefers the best compression, like brotli and gzip, while children on non-billable egress bandwidth can use an API key that prefers the lowest CPU utilization, like zstd or lz4).
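A minimal sketch of how a per-API-key preference list could be matched against what a child supports, assuming the parent picks the first algorithm in its configured order that the child also supports. The helper names and the API key labels are hypothetical; the actual negotiation inside Netdata may differ in detail.

```python
def parse_order(setting):
    """Parse a value such as 'zstd lz4 brotli gzip' into an ordered list."""
    return [algo.lower() for algo in setting.split()]


def negotiate(parent_order, child_supported):
    """Pick the first algorithm in the parent's order that the child supports.

    Returns None when there is no common algorithm (stream uncompressed).
    """
    for algo in parent_order:
        if algo in child_supported:
            return algo
    return None


# Hypothetical per-API-key preferences on a parent.
preferences = {
    "billable-egress-key": parse_order("brotli gzip zstd lz4"),
    "lan-key": parse_order("zstd lz4 brotli gzip"),
}
```

With this model, a child that only speaks LZ4 (as in previous releases) still streams compressed data, because lz4 appears somewhere in every list.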

Gorilla compression beta

Gorilla compression is a time series data compression technique, developed by Facebook for their time series database, Gorilla. It's particularly efficient for compressing data that changes incrementally over time, which is a common characteristic of time series data.

This release of Netdata includes an adaptation of Gorilla compression which, once enabled, provides an additional 30% memory reduction.
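For readers unfamiliar with the technique, the following Python sketch implements the XOR-based value encoding described in the original Gorilla paper, for 64-bit floats. It is not Netdata's adaptation (which works on Netdata's own storage format); the function names are illustrative. Each value is XORed with the previous one: an unchanged value costs 1 bit, and a change is stored as only its meaningful bits, reusing the previous leading/trailing-zero window when the new bits fit inside it.

```python
import struct


def f2u(x):
    """Reinterpret a float64 as its 64-bit unsigned integer bit pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def u2f(u):
    """Inverse of f2u."""
    return struct.unpack(">d", struct.pack(">Q", u))[0]


class BitWriter:
    def __init__(self):
        self.bits = []  # one int (0/1) per bit: readable, not fast

    def write(self, value, nbits):
        for i in range(nbits - 1, -1, -1):
            self.bits.append((value >> i) & 1)


class BitReader:
    def __init__(self, bits):
        self.bits, self.pos = bits, 0

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            value = (value << 1) | self.bits[self.pos]
            self.pos += 1
        return value


def gorilla_encode(values):
    if not values:
        return []
    w = BitWriter()
    prev = f2u(values[0])
    w.write(prev, 64)                        # first value stored verbatim
    win_lead, win_trail = -1, -1             # no meaningful-bits window yet
    for x in values[1:]:
        cur = f2u(x)
        xor = cur ^ prev
        if xor == 0:
            w.write(0, 1)                    # unchanged value: a single bit
        else:
            w.write(1, 1)
            lead = min(64 - xor.bit_length(), 31)  # must fit in 5 bits
            trail = (xor & -xor).bit_length() - 1
            if win_lead >= 0 and lead >= win_lead and trail >= win_trail:
                w.write(0, 1)                # reuse the previous window
                w.write(xor >> win_trail, 64 - win_lead - win_trail)
            else:
                sig = 64 - lead - trail
                w.write(1, 1)                # new window: 5 + 6 header bits
                w.write(lead, 5)
                w.write(sig & 63, 6)         # a length of 64 is stored as 0
                w.write(xor >> trail, sig)
                win_lead, win_trail = lead, trail
        prev = cur
    return w.bits


def gorilla_decode(bits, count):
    if count == 0:
        return []
    r = BitReader(bits)
    prev = r.read(64)
    out = [u2f(prev)]
    lead = trail = 0
    for _ in range(count - 1):
        if r.read(1) == 1:                   # value changed
            if r.read(1) == 1:               # a new window follows
                lead = r.read(5)
                sig = r.read(6) or 64
                trail = 64 - lead - sig
            prev ^= r.read(64 - lead - trail) << trail
        out.append(u2f(prev))
    return out
```

For slowly changing samples such as [12.0, 12.0, 12.5, 12.5, 13.0, 12.75], the encoded stream is well under the 384 bits that raw float64 storage would need, which is the effect Gorilla exploits in time series.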

This was not ready when we compared Netdata and Prometheus, so the benefits of Gorilla compression weren't accounted for in the comparison. With Gorilla compression enabled, Netdata uses 70%+ less memory than Prometheus.

To try Gorilla compression, edit netdata.conf and, in the [db] section, set dbengine page type = gorilla.

Keep in mind that enabling Gorilla compression changes the dbengine file format to Gorilla-compressed metrics. This version of Netdata can read Gorilla-compressed data from dbengine even if Gorilla compression is not enabled, but previous versions of Netdata cannot read it. So enable Gorilla only if you don't plan to switch back to a previous version of Netdata.

Our plan is to enable Gorilla compression by default in the next release of Netdata.

systemd-journal logs

Our systemd-journal.plugin was already much faster (10x) than journalctl, but it was still slow when the journal databases were huge (e.g. at log centralization points where hundreds or thousands of nodes push their logs).

In this release, we introduce several changes to allow the plugin to work promptly in such environments.

Sampling and estimations

The biggest performance issue with systemd-journal logs is the query performance when dealing with huge logs databases.

To overcome this performance issue and provide prompt responses to queries, Netdata now uses the following strategy:

  1. The latest 500k log entries read from journal files work like before: we read all of them and all the values for all their fields, so that we can have accurate histograms and counters per field value at the filters.
  2. Once we hit the 500k log entries limit on a single query, we turn on sampling and estimations.
  3. Sampling distributes 500k more log entries to all the journal files to be read, so that the total log entries queried for their field values will be 1M. This means that if we have to read 100 files, 10k log entries per file will be sampled and 10k log entries more will be unsampled. Since files are usually spread over time, this provides a good sample across time.
  4. When the sampling threshold is hit, Netdata continues reading more log entries without querying the values of their fields. These log entries appear as [unsampled] in the histogram. We know these log entries are there, but the value counters on the field filters do not include them.
  5. When the [unsampled] threshold is hit, and we have read more than 1% of each file, Netdata estimates the number of entries that will be read from the file and skips the rest of it. This estimation appears as [estimated] in the histogram.
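The five steps above can be sketched as a per-file classification. This Python sketch is a simplified model, not Netdata's code: it walks each file's entries in reading order and assigns each one to one of four categories. The 500k limits come from the text; unsampled_limit is a hypothetical per-file threshold, because the release notes do not give the value of the [unsampled] threshold, and "more than 1% of each file" is approximated as size // 100 entries read.

```python
from dataclasses import dataclass


@dataclass
class FilePlan:
    full: int = 0       # read with all field values (exact counters)
    sampled: int = 0    # read with field values, from the sampling budget
    unsampled: int = 0  # counted in the histogram only: [unsampled]
    estimated: int = 0  # skipped and extrapolated: [estimated]


def plan_query(file_sizes, full_limit=500_000, sample_budget=500_000,
               unsampled_limit=10_000):
    """Classify the entries of each file, in reading order.

    unsampled_limit is a hypothetical per-file threshold.
    """
    per_file_samples = sample_budget // max(len(file_sizes), 1)
    full_used = 0
    plans = []
    for size in file_sizes:
        plan = FilePlan()
        for read in range(size):
            if full_used < full_limit:                 # step 1: exact reads
                plan.full += 1
                full_used += 1
            elif plan.sampled < per_file_samples:      # steps 2-3: sampling
                plan.sampled += 1
            elif plan.unsampled < unsampled_limit or read < size // 100:
                plan.unsampled += 1                    # step 4: count only
            else:                                      # step 5: estimate rest
                plan.estimated = size - read
                break
        plans.append(plan)
    return plans
```

The loop visits entries one by one only for clarity; the point is the order in which the budgets are consumed, not the speed of the model.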

The above process allows Netdata to provide a histogram of the logs in a timely manner, even when the visible timeframe contains several tens of millions of log entries.

A similar process is commonly used by log management systems, including Grafana Loki and Elasticsearch. However, Netdata takes a much bigger sample of the data (other systems usually sample only a few thousand log entries, while Netdata usually samples more than a million), and the histogram shows exactly where sampling and estimations were applied.

Image showing [unsampled] and [estimated] on a systemd journal system that collects about 10k nginx log entries per second:

Read more about journals query performance.

journals scan

On busy log centralization servers, the number of journal files in /var/log/journal/remote can grow significantly, slowing down directory listing (even ls -l is very slow on them).

To overcome this issue, Netdata now uses inotify events and sorts the files to be scanned from the latest to the oldest.
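A minimal sketch of the idea in Python: instead of listing /var/log/journal/remote on every query, keep an in-memory index that filesystem events update, already ordered from newest to oldest. The JournalFileIndex class is hypothetical, and wiring on_created/on_deleted to real inotify events (e.g. IN_CREATE, IN_DELETE) is assumed rather than shown, since inotify is not part of the Python standard library.

```python
import bisect


class JournalFileIndex:
    """Hypothetical in-memory index of journal files, newest first."""

    def __init__(self):
        self._mtime = {}   # path -> modification time
        self._order = []   # sorted (-mtime, path): newest file comes first

    def on_created(self, path, mtime):
        """Call on a create/modify event (e.g. inotify IN_CREATE)."""
        self.on_deleted(path)               # drop a stale entry, if any
        self._mtime[path] = mtime
        bisect.insort(self._order, (-mtime, path))

    def on_deleted(self, path):
        """Call on a delete event (e.g. inotify IN_DELETE)."""
        mtime = self._mtime.pop(path, None)
        if mtime is not None:
            i = bisect.bisect_left(self._order, (-mtime, path))
            del self._order[i]

    def newest_first(self):
        """Files to scan, latest to oldest, without listing the directory."""
        return [path for _, path in self._order]
```

Sorting on the negated modification time keeps the list in scan order at all times, so a query can start from the most recent files immediately.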

These change...

v1.43.2

30 Oct 15:49

Netdata v1.43.2 is a patch release to address issues discovered since v1.43.1.

This patch release provides the following bug fixes and updates:

  • Fix rrdlabels type (1676de2, @stelfrag).
  • Fix label copy to allow new keys with different values (6179213, @stelfrag).
  • Fix internal label source propagation when streaming metrics (60cd86d, @ktsaou).
  • Speed up queries when sending alerts to Cloud on parents with a large number of alerts per child (f80f0fc, @MrZammler).
  • Fix filtering when selecting multiple fields in systemd-journal plugin (750ca8e, @stelfrag).
  • Fix an issue where parents were missing chart labels of child instances (240f9e7, @ktsaou).
  • Fix an issue where updated labels were not propagated to parents (644d432, @stelfrag).

Support options

As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:

  • Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
  • GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
  • GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
  • Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
  • Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 1700 engineers are already using it!

v1.43.1

26 Oct 14:17

Netdata v1.43.1 is a patch release to address issues discovered since v1.43.0.

This patch release provides the following bug fixes and updates:

  • Prevented incorrect optimization in the armv7l static build (#16274, @stelfrag).
  • Fixed pattern matching in Functions Search (#16264, @ktsaou).
  • Fixed an issue where the query planner was using the wrong dbengine tier that had no data for the selected time period (#16263, @ktsaou).
  • Fixed invalid payload in Discord notifications (#16257, @luchaos).
  • Fixed possible deadlock on discovery thread shutdown in cgroups plugin (#16246, @stelfrag).
  • Fixed duplicate chart labels (#16249, @stelfrag).
  • Fixed dimension HETEROGENEOUS check (#16234, @stelfrag).
  • Updated go.d plugin version to v0.56.3 (#16228, @ilyam8).
  • Fixed calculation of dbengine statistics on 32bit systems (#16222, @stelfrag).
  • Improved handling of duplicate labels (#16172, @stelfrag).
  • Improved cleanup on shutdown of collectors (#16023, @ktsaou).

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise
that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a
remarkable product.

  • @luchaos for fixing Discord notifications.

v1.43.0

16 Oct 21:00

Groundbreaking: systemd-journal logs release!

Right on schedule, this is another great Netdata release!

Netdata Growth

  • 65.5 k GitHub Stars ⭐
    Since October 2023, Netdata has been leading the observability category in the CNCF landscape, surpassing Elasticsearch. Thank you for your love ❤️! Give Netdata a ⭐ on GitHub too!

  • 595 M docker hub pulls
    Netdata gets about 200k Docker Hub downloads per day. Since June 2023 we have been a Verified Publisher, so Netdata pulls don't count against Docker Hub pull limits, allowing all our users to integrate Netdata into their CI/CD toolchains.

Release Summary

This release is the most robust and reliable Netdata we have ever built.

These are the main areas Netdata has improved since the last release:

  1. Logs
    Today we release an almost entirely rewritten version of the systemd-journal plugin, to improve its performance and visualization capabilities. systemd-journal holds critical system and security information, and given the lack of systemd-journal visualization tools, we focused first on filling this gap. At the same time, we are standardizing the way logs are handled within Netdata, enabling us to support more log management engines, like Loki and Elasticsearch.

  2. Instances Slice and Dice
    Given the capabilities of the new Netdata Agent UI (v2), we are changing the way some of our collectors collect and expose metrics, to allow easier slicing and dicing of the data and to be more compatible with the OpenTelemetry specifications. So, in this release we changed the way apps.plugin exposes charts in the Applications section of the dashboard. Following the NIDL framework, each application group is now an instance, allowing better aggregation of process utilization across nodes. Similarly, our systemd unit charts have been updated to have an instance for each systemd unit. For the same reasons, disk charts now have additional labels (id, model and serial) to help identify disks from the charts. Unfortunately, such changes tend to make the older dashboards (v1, v0) less usable, especially on servers with many hundreds of instances.

  3. Stock Alerts
    A number of changes have been implemented in the Netdata Health engine, to allow better integration with the new dashboard. More changes in this area are coming in the next release: a) allow multi-node alerts on parents, b) allow evaluating and configuring alerts from the UI.

  4. Alerts Accuracy
    By default, Netdata has 3 tiers of metrics, each with a different resolution. The Netdata query planner automatically picks the right tier to satisfy a query, based on the number of points requested in the response. For alerts, this had a side effect: since alerts request only 1 point of data in the response, the query planner was picking the "easiest" tier to query, which is of course the one with the lowest resolution. Now alerts always run on tier 0, the highest-resolution one.

  5. Lower Resources Utilization
    Several changes have been implemented for Netdata to better take care of itself. These include lower memory usage, a lower disk footprint, self-vacuuming of SQLite databases, and more. Probably the most notable change is that Netdata now needs only 1 pointer (8 bytes on 64-bit, 4 bytes on 32-bit) for each use of a label name-value combination. This drastically reduces Netdata's memory requirements in setups like busy k8s clusters, where containers come and go all the time, significantly increasing label cardinality.

  6. 32bit Netdata on 64bit IoT machines
    A common request when Netdata is installed on 64-bit IoT devices is to run a 32-bit Netdata there. Before this release, this was not possible. Now a 32-bit Netdata runs nicely on a 64-bit operating system.

  7. Netdata Cloud on prem
    Netdata Cloud is now available to be installed on-prem! Several companies have already deployed it and are currently testing it. If you want to join them, submit this form.
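The alert side effect described under Alerts Accuracy can be reproduced with a simplified planner. In this Python sketch the tier resolutions (1 s, 60 s and 3600 s per point) are assumed defaults, and pick_tier is a hypothetical stand-in for the real query planner, which also considers retention and other factors. It returns the coarsest tier that still yields the requested number of points, so a 1-point query lands on the lowest-resolution tier unless it comes from an alert.

```python
# Assumed default resolutions in seconds per point; real settings may differ.
TIER_SECONDS = [1, 60, 3600]


def pick_tier(after, before, points, for_alert=False):
    """Simplified planner for the window [after, before).

    Returns the coarsest tier that can still deliver `points` points.
    Alerts always use tier 0, the highest-resolution tier.
    """
    if for_alert:
        return 0
    duration = before - after
    for tier in reversed(range(len(TIER_SECONDS))):
        if duration // TIER_SECONDS[tier] >= points:
            return tier
    return 0
```

For example, a 1-point query over the last hour resolves to tier 2 (hourly data) unless for_alert is set, which is exactly why alerts are now pinned to tier 0.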

Release Highlights

systemd-journal

systemd-journal was first included in Netdata v1.42.0. Immediately after release, we recognized the wider need for this feature, so we've rewritten the plugin almost entirely, to provide the best possible experience. This work is also fundamental for supporting more log monitoring integrations - stay tuned!

The major improvements to the systemd-journal logs function are:

  • a histogram of log entries over time, with a breakdown per field value, for any field and any timeframe
  • PLAY mode, which provides the same experience as journalctl -f, showing new log entries immediately after they are received
  • filtering on any journal field or field value, for any timeframe
  • coloring of log entries, the same way journalctl does
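To show what "a breakdown per field value" means for the histogram, here is a small Python sketch that buckets entries by time and counts each value of one field per bucket. It assumes entries are dicts with a hypothetical timestamp key in seconds (real journal entries carry __REALTIME_TIMESTAMP in microseconds), and the "[none]" label for entries lacking the field is also an illustrative choice.

```python
from collections import defaultdict


def field_histogram(entries, field, after, before, buckets):
    """Count entries per time bucket in [after, before), broken down by the
    value of `field`. Entries use a hypothetical 'timestamp' key (seconds)."""
    step = (before - after) / buckets
    hist = [defaultdict(int) for _ in range(buckets)]
    for entry in entries:
        ts = entry["timestamp"]
        if after <= ts < before:
            slot = min(int((ts - after) / step), buckets - 1)
            hist[slot][entry.get(field, "[none]")] += 1
    return [dict(bucket) for bucket in hist]
```

Choosing a different field (for example PRIORITY or _SYSTEMD_UNIT) only changes the key of the inner counters, which is what lets the same histogram be broken down by any field.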

If you want a full presentation of the systemd-journal plugin, how it works, how you can take full advantage of it, and even instructions for configuring a logs centralization server, check the documentation for the plugin.

You can experience the power of systemd-journal logs function in one of our Netdata demo rooms here
or check our latest YouTube video on it.

Want to know why you should tap into the full potential of systemd-journal logs? Check out the blog post on it by Netdata's founder, Costa Tsaousis (@ktsaou), here.

Virtual Machine monitoring (VMware vSphere)

Following increased feedback and requests about our VMware vCenter Server collectors, we have:

  • Reviewed our out-of-the-box charts
  • Added labels to the charts, e.g. host, datacenter, cluster, vm
  • Reviewed the metadata on alerts
  • Added summary charts section

It is with this feedback from the Community that we can keep working on improving Netdata to ensure it meets
your needs!

What is coming next

We are currently working on the following areas, which we hope to release next month:

  1. Logs Explorer for Loki and Elasticsearch
    Similar to systemd-journal, allow Netdata to explore, query and visualize logs from Loki and Elasticsearch.

  2. Collectors Configuration from the UI
    In the last release we presented the Integrations Marketplace. Since then, we have been working to make all integrations configurable via the dashboard. This will allow all of us to configure our Netdata servers directly from the UI, without touching configuration files, significantly improving Netdata's usability and ease of use.

  3. Alerts Configuration from the UI
    Similarly, we are working to allow configuring alerts directly from the UI, without text file configurations, so that all of us can create powerful alerts on the spot.

  4. Netdata Mobile App
    We are at the final stage of releasing our Netdata Mobile App (iOS and Android) for receiving mobile push notifications and exploring alert statuses.

  5. Scalability
    Given the wide adoption of Netdata, we are committed to making Netdata scale better in larger environments. Especially when it comes to Netdata parents, we aim to provide the best scalability possible. We are currently finalizing the necessary changes to allow Netdata to achieve:

    • 1 CPU core per 1 million metrics/s for data collection
    • 1 CPU core per 1 million metrics/s for ML and health (alerts)
    • 1 CPU core per 1 million metrics/s for re-streaming (pushing metrics to another parent)

    Of course, the numbers depend on the CPU and its clock, but they shouldn't vary significantly on modern systems.

    A...

v1.42.4

18 Sep 15:11

Netdata v1.42.4 is a patch release to address issues discovered since v1.42.3.

This patch release provides the following bug fixes and updates:

  • Fixed alarm variables not being created for all chart dimensions (#15984, @MrZammler).

v1.42.3

11 Sep 16:35

Netdata v1.42.3 is a patch release to address issues discovered since v1.42.2.

This patch release provides the following bug fixes and updates:

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise
that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a
remarkable product.

  • @moonbreon for improving handling of closed connections in streaming.

v1.42.2

28 Aug 15:57

Netdata v1.42.2 is a patch release to address issues discovered since v1.42.1.

This patch release provides the following bug fixes and updates:

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise
that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a
remarkable product.

  • @kevin-fwu for adding an option to avoid duplicate labels when exporting in Prometheus format.
  • @k0ste for fixing permission attributes for conf.d dirs for RPM.

Support options

As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:

  • Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
  • GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
  • GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
  • Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
  • Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 1600 engineers are already using it!

v1.42.1

16 Aug 16:49

Netdata v1.42.1 is a patch release to address issues discovered since v1.42.0.

This patch release provides the following bug fixes and updates:

  • Fixed issue with missing entries for Systemd-journal and Processes functions (#15814, @ktsaou)
  • Fixed linking health.log to stdout in Docker (#15813, @ilyam8)
  • Updated UI version to v6.28.0 (#15810, @ilyam8)
  • Fixed 401 when behind a proxy with Basic auth and signed in (#15808, @ktsaou)
  • Fixed Health Management API (#15806, @underhood)
  • Fixed build deps in DEB packages for systemd-journal.plugin (#15805, @Ferroin)
  • Cleaned up python deps for RPM packages (#15804, @Ferroin)
  • Added proper SUID fallback for DEB plugin packages (#15803, @Ferroin)
  • Fixed an issue where the nd_journal_process column was not populated for the Systemd-journal function (#15798, @ktsaou)
  • Fixed negative retention when database is empty in /api/v2/info (#15796, @ktsaou)
  • Fixed handling of unassigned drives for python.d/hpssa (#15793, @ilyam8)
  • Fixed an issue that prevented systemd-journal.plugin from restarting (#15787, @ktsaou)
  • Fixed publishing of openSUSE 15.5 packages (#15781, @tkatsoulas)
  • Updated OpenSSL version of static builds to 1.1.1v (#15779, @tkatsoulas)

v1.42.0

09 Aug 17:29

Right on schedule, this is another great Netdata release!

Netdata Growth

  • 64.5 k GitHub Stars ⭐

    After the last release, Netdata made it to the top of the trending repos on GitHub. ❤️ Thank you for your love! 🚀 You rock!

    Give Netdata a ⭐ on GitHub too!

  • 580+ M docker hub pulls, running at 200+ k per day.

    Netdata is a verified publisher on Docker Hub, and our users enjoy free unlimited Docker Hub pulls!

Release Highlights

Integrations Marketplace

A beta version of the Netdata Marketplace is included in this release:

More than 800 integrations are available, directly from the dashboard. For each integration, all the information required to get it up and running is included:

Integrations are still in beta. We improve them every day, but we think they are already quite useful.

systemd Journal

A new Netdata Function has been added to query the systemd journal logs:

The function respects the current date-time picker, so it can query any possible timeframe the systemd journal has data for.

IMPORTANT

Netdata Functions are available only when you are signed in to Netdata and your Netdata Agent is claimed.
This has been done to protect your privacy. Netdata Cloud checks that the users of the Agent dashboard are allowed to view this information.

IMPORTANT

The systemd-journal function is currently available only on Netdata Agents that have been installed from source, or with native packages of the Linux distribution (RPM, DEB). For users running static builds of Netdata or running Netdata in a Docker container, we are working to bring systemd-journal to them too. Stay tuned...

Claiming via the UI

You can now connect your agents to Netdata Cloud via the dashboard:

The UI verifies that you are the owner of the Netdata Agent by asking you to provide a random key that is saved to a file on disk. Once you provide the right key, Netdata is automatically claimed to your space in Netdata Cloud.

Easily Spot Anomalies

The UI has an AR (anomaly rate) button above the menu. When you press it, the dashboard queries the Netdata Metrics Scoring Engine to find the anomaly rates of the metrics included in the dashboard for the visible timeframe. Then it adds a badge next to each category and subcategory, showing its anomaly rate.

This way, you can quickly spot what is anomalous on the current view of the dashboard.
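As a rough illustration of what the badges summarize, the Python sketch below aggregates per-metric anomaly flags into a per-category percentage. It assumes each collected sample carries an anomaly flag (0 or 1) and that a category's rate is simply anomalous samples divided by total samples; the function name is hypothetical, and the real Metrics Scoring Engine may weight or aggregate differently.

```python
from collections import defaultdict


def category_anomaly_rates(metrics):
    """metrics: iterable of (category, flags), where flags holds one 0/1
    anomaly flag per collected sample in the visible timeframe.

    Returns {category: anomaly rate in percent}; empty categories are skipped.
    """
    anomalous = defaultdict(int)
    total = defaultdict(int)
    for category, flags in metrics:
        anomalous[category] += sum(flags)
        total[category] += len(flags)
    return {c: 100.0 * anomalous[c] / total[c] for c in total if total[c]}
```

Pooling samples per category, rather than averaging per-metric rates, lets a category with many quiet metrics and one very anomalous metric still show a non-zero badge proportional to the anomalous samples.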

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.

  • @Leny1996 for fixing Docker bind-mount stock files creation.
  • @fhriley for adding Linux power cap Intel RAPL metrics collector.
  • @icy17 for fixing potential crash in the h2o server.
  • @kiela for fixing typos and images placement in the Deployment Strategies doc.
  • @zeylos for fixing non-interactive options for apt-get and zypper.

Contributions

Collectors

New

  • Add AMD GPU collector (proc.plugin) (#15515, @Dim-P)
  • Add PCI Advanced Error Reporting metrics collector (proc.plugin) (#15488, @ktsaou)
  • Add Linux power cap Intel RAPL metrics collector (proc.plugin) (#15364, @fhriley)
  • Add systemd-journal plugin (systemd-journal.plugin) (#15363, @ktsaou)

Improvements

  • Collect EDAC metrics per-memory controller (MC) and DIMM (proc.plugin) (#15473, @ktsaou)

Bug fixes

Other

  • Change restart message to info (freeipmi.plugin) (#15664, @ilyam8)
  • Filter out systemd-udevd.service/udevd cgroup (cgroups.plugin) (#15571, @ilyam8)
  • Improve FD limit issue tracing (apps.plugin) (#15504, @ktsaou)
  • Add hash table charts for internal monitoring (ebpf.plugin) (#15323, @thiagoftsm)

Documentation

Packaging / Installation

v1.41.0

19 Jul 21:37

Check out the v1.41 release meetup recording, or read on to learn more about the new UI and other features in this release.

Right on schedule, this is another great Netdata release!

Netdata Growth

  • 64 k GitHub Stars ⭐
  • 1.7 M monitored nodes
  • 570+ M docker hub pulls

Give Netdata a ⭐ too, on GitHub!

❤️ Thank you for your love! 🚀 You rock!

Release Highlights

New Agent Dashboard

Netdata Agents and Parents now have a new UI!

New CHARTS 🟢 New SUMMARIES 🟢 MACHINE-LEARNING FIRST 🟢 INFRASTRUCTURE LEVEL DASHBOARDS 🟢 FILTER, SLICE, and DICE any dataset 🟢 ANOMALY ADVISOR 🟢 METRICS CORRELATIONS 🟢 NETDATA FUNCTIONS 🟢 EVENTS FEED 🟢 HEATMAPS 🟢

Netdata Agent

In the last few months, we have ported and open-sourced all Netdata Cloud APIs to the Netdata Agent, allowing Netdata Parents to drive the same multi-node / infrastructure level dashboards Netdata Cloud provides!

So, as of today, Netdata Agents and Parents present the same UI as Netdata Cloud: exactly the same dashboard, charts and features!

Single Node Dashboard Changes

Apart from the entirely new look, single-node dashboards now group similar charts together. So, all disk drives, network interfaces, cgroups (containers and VMs), are now a single set of charts.

This allows Netdata to aggregate a vast number of datasets in a single chart, making even almost 20k containers manageable.

To make it easier for you to navigate, filter, slice, and dice the data, the menus above each chart give you easy access to all the data of the chart:

Multi Node Dashboards

When Netdata Agents are configured as Parents (multiple other agents stream metrics to them), they now present multi-node and multi-instance charts. At the top right corner of the dashboard, there is the global nodes filter, from which you can slice the entire dashboard for one or a few of your nodes.

Want to know more?

Get a firsthand walkthrough from Costa Tsaousis, Netdata's Founder, on the rationale for this change and the path Netdata is taking, by watching the video from Netdata Office Hours on YouTube.

The old dashboards are still accessible

You can still access all versions of the dashboards, as follows:

  • http://your.server:19999/
    The default dashboard is now a live version of the new UI. The dashboard static files are served by Cloudflare and are automatically updated when we release a new version of the UI, so that your Netdata agent is always up to date.

  • http://your.server:19999/v2/
    A local copy of the latest dashboard, as it was at the time the agent was released. This is distributed with Netdata under the Netdata Cloud UI License v1.0. The local copy is automatically used if for any reason the web browser cannot download the live version of it.

  • http://your.server:19999/v1/
    The previous single-node version of the Netdata Agent dashboard.

  • http://your.server:19999/v0/
    The now ancient, original version of the Netdata Agent dashboard.

Netdata Assistant

Netdata Assistant: Your AI-Powered Troubleshooting Sidekick

The Netdata Assistant is an AI-powered tool that uses large language models and our community's knowledge to guide you during troubleshooting and help you get to the root cause sooner.

The goal of the Netdata Assistant is straightforward: to make your troubleshooting process easier. It's here to save you from the hassle of sifting through tons of information so you can focus on solving the problem at hand.

It will give you the lowdown on the alert, why it's happening, and why you should care. It'll also guide you on how to troubleshoot it and even offer some handy web links for more info if you're interested.

[Image: the Netdata Assistant explaining an alert]

Read more about it on the Netdata blog.

New FreeIPMI collector for monitoring enterprise hardware

Netdata has a new FreeIPMI collector. It collects IPMI sensor data at a much higher collection rate, and it is more reliable and robust than the previous one.

We have also categorized all sensors based on the component they monitor:

[Image: IPMI sensors categorized by the component they monitor]

We also expose, as a label, the exact sensor name each metric refers to:

[Image: sensor names exposed as chart labels]

Netdata Detects FD Leaks

"FD" stands for "file descriptor". A file descriptor is an integer that the operating system assigns to an open file to track it. This includes regular data files, directories, network sockets, pipes, and other types of I/O streams.

In Linux, almost everything is treated as a file, including hardware devices, directories, and sockets. Each open file is assigned a file descriptor. When a file is closed, its file descriptor is freed up for reuse. However, if an application doesn't close a file when it's done with it, that's called a "file descriptor leak".

File descriptor leaks can cause several problems:

  1. Resource exhaustion: Each process has a limit on the number of file descriptors it can open. If a process keeps leaking file descriptors, it will eventually hit this limit; from then on, calls that need a new descriptor (opening files, creating sockets) fail with EMFILE ("Too many open files"), which often causes the process to misbehave or crash.

  2. Unexpected behavior: Open file descriptors hold resources, like network sockets, that might be expected to be available for other uses. If these resources are tied up due to a leak, it can cause unexpected behavior.

  3. Security issues: File descriptors can sometimes be used to gain unauthorized access to data if they're not properly managed.
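To make the resource-exhaustion case concrete, here is a minimal, Linux-only Python sketch (not Netdata code). It deliberately leaks descriptors by opening `/dev/null` without closing it until the kernel refuses with EMFILE. The function name `leak_until_exhausted` and the choice to temporarily lower the soft `RLIMIT_NOFILE` to 64 (so exhaustion happens quickly) are illustrative assumptions; the sketch cleans up and restores the original limit afterwards.

```python
import errno
import os
import resource


def leak_until_exhausted(soft_limit: int = 64) -> int:
    """Hypothetical demo: leak FDs until EMFILE, then clean up.

    Returns how many descriptors were leaked before the limit was hit.
    Assumes soft_limit does not exceed the process's hard limit.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    leaked = []
    try:
        # Lower the soft limit so the leak exhausts it quickly.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
        while True:
            try:
                # Open and never close: this is the leak.
                leaked.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as e:
                if e.errno == errno.EMFILE:  # "Too many open files"
                    return len(leaked)
                raise
    finally:
        # Undo the leak and restore the original soft limit.
        for fd in leaked:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


if __name__ == "__main__":
    print(f"leaked {leak_until_exhausted()} descriptors before EMFILE")
```

Note that the limit applies to descriptor numbers, so descriptors the process already holds (stdin, stdout, stderr, and so on) count against it; that is why the leaked count comes out slightly below 64.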

apps.plugin is now able to track the usage of FDs against the limits set for each application. We have added an fds category in the Applications section of the dashboard. The first chart shows the percentage of FDs used by each application against its limits:

[Image: percentage of FDs used by each application against its limits]
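The underlying idea can be sketched for a single process in Python. This is an illustration under assumptions, not apps.plugin's implementation (which aggregates per application group and is written differently): it counts the entries in `/proc/<pid>/fd` and divides by the soft "Max open files" value from `/proc/<pid>/limits`. The function name `fd_usage_percent` is hypothetical.

```python
import os


def fd_usage_percent(pid: int) -> float:
    """Hypothetical helper: open FDs of `pid` as a % of its soft FD limit.

    Linux-only sketch: reads /proc/<pid>/fd and /proc/<pid>/limits.
    """
    # Each entry in /proc/<pid>/fd is one open descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))

    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # e.g. "Max open files   1024   524288   files"
                # -> fields: Max, open, files, <soft>, <hard>, files
                soft = line.split()[3]
                break
        else:
            raise RuntimeError("'Max open files' not found in limits")

    if soft == "unlimited":
        return 0.0
    return 100.0 * open_fds / int(soft)


if __name__ == "__main__":
    pct = fd_usage_percent(os.getpid())
    print(f"{pct:.2f}% of the soft FD limit in use")
```

Tracking this percentage over time is what makes a leak visible: a healthy process hovers around a stable value, while a leaking one climbs steadily toward 100%.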

Acknowledgements

We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.

  • @k0ste for improving the Prometheus exporting documentation.
  • @carlocab for replacing the info macro with a less generic name.
  • @MYanello for updating the pfSense package installation instructions.

Contributions

Collectors

Improvements

  • Improve fds monitoring (apps.plugin) (#15437, @ktsaou)
  • Add application groups file descriptor limit monitoring (apps.plugin) (#15417, @ktsaou)
  • Re-create sdr cache on start (freeipmi.plugin) (#15361, @ktsaou)
  • Add sensor state chart, create a per-sensor chart instead of a per-sensor dimension (freeipmi.plugin) (#15327, @ktsaou)
  • Expose CmdLine in apps function (apps.plugin) (#15275, @ilyam8)
  • Remove pod_uid and container_id labels in k8s (cgroups.plugin) (#15216, ...