Problem
Currently, the dbengine only allows setting disk usage limits to control data storage. This is useful in some scenarios, but runs into a couple of usability issues in others:
Users who don’t care about disk usage but need a specific retention limit have to either do some non-trivial math themselves or rely on our online space requirements calculator (a rough sketch of that math follows this list).
Retention targets depend not only on the disk usage limits but also on the total number of metrics. However, the total number of metrics cannot be determined reliably ahead of time, so users have to install Netdata, configure all their plugins, and only then figure out how much space to give Netdata based on their data retention requirements.
A number of deployment scenarios involve a non-constant number of metrics, and being able to configure only disk usage limits instead of retention time limits makes the actual retention time unpredictable on such deployments.
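To illustrate the kind of math users currently have to do by hand, here is a minimal sketch of converting a disk usage limit into an expected retention period. The per-sample on-disk size and the collection interval are assumptions (actual tier 0 storage varies with compression and metric behaviour), so the numbers are only illustrative:

```python
# Rough sketch of converting a dbengine disk usage limit into an expected
# retention period. BYTES_PER_SAMPLE is an assumed ballpark for compressed
# tier 0 storage; the real figure depends on compression and metric behaviour.

BYTES_PER_SAMPLE = 1.0   # assumption: roughly 1 byte per collected sample on disk
UPDATE_EVERY = 1         # assumption: one sample per metric per second

def expected_retention_days(disk_limit_mb: float, total_metrics: int) -> float:
    """Estimate how many days of data fit into the given disk limit."""
    total_samples = disk_limit_mb * 1024 * 1024 / BYTES_PER_SAMPLE
    samples_per_second = total_metrics / UPDATE_EVERY
    return total_samples / samples_per_second / 86400

# Example: 2000 metrics collected every second into a 1 GiB dbengine.
print(f"{expected_retention_days(1024, 2000):.1f} days")  # ~6.2 days with these assumptions
```

Note that the result depends directly on `total_metrics`, which is exactly the quantity that cannot be pinned down ahead of time in the scenarios described below.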
As an example of a scenario where these usability issues are relevant: On my home server, I have a huge amount of excess storage space that Netdata could utilize without a significant impact on the rest of the system, so disk usage limits are not particularly relevant to me except as insurance against the agent misbehaving. However, I also have two things set up on the system which result in the total number of metrics varying unpredictably over time:
User sessions each end up in their own cgroup, resulting in each new SSH or console login creating (currently) 44 new metrics via the cgroups plugin. Terminal servers using either systemd-logind or elogind would suffer from the same issue.
The total number of running VMs on the system is non-constant, and each cold restart of a VM results in a new set of metrics because of how libvirt handles the associated cgroups. Build farms, systems used for cloud hosting, and similar cases of systems running large numbers of potentially ephemeral VMs would suffer from the same issue.
Description
To support deployment scenarios where the usability issues mentioned above matter, I recommend adding a new dbengine configuration option that controls how many data points get stored for each metric in the dbengine. I envision it working alongside the existing disk space usage limits, taking effect only while the dbengine's space usage stays within those limits. A typical deployment would then set the desired retention limit and use the disk usage limit as a hard cap for worst-case scenarios, so that Netdata does not fill the whole disk if things go wrong.
I also recommend that, when this option is specified, we add a log message during dbengine startup indicating the worst-case disk usage and the expected disk usage per dimension, so that users can fine-tune the new setting without needing an online calculator.
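Purely as an illustration of what such a startup message might contain (the wording and numbers here are hypothetical, not an existing Netdata log):

```
DBENGINE: retention target is 2160 points per metric; with 2000 currently
collected metrics this is expected to use about 4 MiB on disk (~2 KiB per
dimension), with a worst case of 1024 MiB (the configured disk usage limit).
```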
Importance
nice to have
Value proposition
Users who are not practically affected by disk space can set exact retention periods without needing to look online or understand how the dbengine works.
Users who have deployments with variable numbers of metrics can set specific retention periods without having to determine the worst-case scenario ahead of time and plan based on that.
Users migrating from other monitoring solutions that only provide time-based retention limits, rather than disk usage limits, will find it easier to learn how to configure Netdata.
Proposed implementation
Suggested name for the new configuration option: dbengine max points per metric
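For concreteness, here is a minimal sketch of how this could look in netdata.conf. The new key uses the name suggested above and is purely hypothetical; the section name and the existing disk space key shown next to it may differ between Netdata versions:

```ini
[db]
    mode = dbengine
    # existing behaviour: hard cap on disk usage (key name may vary by version)
    dbengine multihost disk space MB = 1024
    # proposed option: retention target in points stored per metric,
    # applied only while disk usage stays under the limit above
    dbengine max points per metric = 2160
```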
This would also be a useful way to configure retention for me, as I'm primarily concerned with how much data, time-wise, we can store, i.e. I want to keep up to 3 years, but disk space is not that much of an issue for me. Taking a value of x days or years per tier, Netdata would then ideally auto-tune the engine settings accordingly.