Replies: 5 comments 3 replies
-
InfluxDB IOx is doing some really interesting work in this space that might be somewhat useful or related: https://www.influxdata.com/blog/announcing-influxdb-iox/
-
I started working on this idea. Here is a first draft proposal for the new storage engine API:
-
Hello guys, I liked your idea, and I have a few suggestions:
If you have any doubts, please let me know.
-
We're still working on this, and are making good progress.
Current status
We are building on the preliminary API proposal discussed above, taking comments into consideration.
Metadata
For metadata, our plan is to go with suggestion (a) and store metadata with metrics data. That means current engines would keep using the embedded SQLite for metadata, while other engines like MongoDB would have to implement their own metadata storage mechanism.
-
Hi, the new API call abstracts away the engine-specific logic from rrd2rrdr (query.c). The latest version of the API is here: Feedback and comments are welcome :)
-
The Idea
The goal of this discussion is to explore how the Netdata agent can store metrics in a distributed storage backend, for efficient scaling of Netdata-based cloud services. The proposed solution should meet high-performance, production-grade requirements, and its design should allow the code changes to be fully integrated upstream into the current Netdata agent codebase.
The current status
Netdata has a distributed architecture: it currently supports exporting metrics to a variety of storage backends, streaming metrics to other Netdata agents, and loading metrics from remote agents. However, a Netdata agent can currently only store persistent metrics locally, using Netdata’s internal format. This design is limiting because multiple Netdata agents cannot share a single storage backend, making the storage of large amounts of customer data inefficient or impractical.
The proposed changes
This discussion was mainly initiated while exploring the changes required to integrate a remote MongoDB database for metrics storage, which would let a Netdata agent load and store metrics data in real time from the remote database in an efficient way.
Metadata storage shall also be distributed, but synchronizing it across Netdata agents in (soft) real time is optional for now.
1. Modular storage engines
The current Netdata agent codebase is strongly based on condition checks against the memory mode. This dependency increases the complexity of the codebase and limits the ability to extend the Netdata agent with other forms of memory mode. This change proposes the creation of an abstraction layer between the Netdata agent and the storage mechanism:
• volatile (ram, none, etc.)
• non-volatile (save, dbengine)
• SQL
• NoSQL
This layer would absorb the current memory modes and provide scalability and flexibility with other storage mechanisms (e.g. MongoDB).
• Add a new C structure allowing each storage engine to expose a set of properties, and its API implementation.
• Implement the new structure for existing engines
• Build a list of available storage engine properties at compile time (including existing engines)
• Use this list to instantiate and use the configured storage engine.
• Across Netdata’s codebase, replace any hardcoded reference to specific storage engines with access to engine properties or calls to the new engine APIs.
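The steps above can be sketched in C. This is a minimal, hypothetical illustration of the idea, not the actual Netdata API: all structure, field, and function names here are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: each storage engine exposes its properties and
 * its API implementation through a single structure. */
typedef struct storage_engine_api {
    int (*init)(void);
    int (*store_metric)(const char *chart_id, long long timestamp, double value);
    int (*shutdown)(void);
} STORAGE_ENGINE_API;

typedef struct storage_engine {
    const char *name;       /* e.g. "ram", "dbengine", "mongodb" */
    int id;                 /* would map to the current memory-mode enum */
    STORAGE_ENGINE_API api;
} STORAGE_ENGINE;

/* Dummy in-memory implementation, standing in for one entry in the
 * compile-time list of available engines. */
static double last_value;
static int ram_init(void) { return 0; }
static int ram_store(const char *chart_id, long long t, double v) {
    (void)chart_id; (void)t;
    last_value = v;
    return 0;
}
static int ram_shutdown(void) { return 0; }

static STORAGE_ENGINE engines[] = {
    { .name = "ram", .id = 0, .api = { ram_init, ram_store, ram_shutdown } },
};

/* Look up the configured engine by name, replacing hardcoded
 * memory-mode condition checks scattered through the codebase. */
static STORAGE_ENGINE *storage_engine_find(const char *name) {
    for (size_t i = 0; i < sizeof(engines) / sizeof(engines[0]); i++)
        if (strcmp(engines[i].name, name) == 0)
            return &engines[i];
    return NULL;
}
```

With this shape, callers hold a `STORAGE_ENGINE *` and go through its `api` function pointers, so adding a new engine means adding one array entry rather than touching every call site.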
2. MongoDB storage engine
Use MongoDB as a proof of concept for integrating other storage mechanisms through the storage engine API [1. Modular storage engines].
• Add the MongoDB storage engine skeleton.
• Add new MongoDB-specific configuration: server URL, timeout, database, etc.
• Implement read/write metrics operations to the database.
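The MongoDB-specific configuration could follow Netdata's existing INI-style `netdata.conf` sections. This is only a sketch of what such a section might look like; the section and key names are assumptions, not a defined configuration schema:

```ini
[db]
    # hypothetical: select the MongoDB storage engine by name
    mode = mongodb

[mongodb]
    # hypothetical keys mirroring the bullet list above
    uri = mongodb://localhost:27017
    database = netdata
    timeout ms = 5000
```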
3. Metadata
There are two suggestions for saving the Netdata metadata for new storage mechanisms:
a) Metadata storage would become fully storage-engine dependent. SQLite would be used for existing engines, and MongoDB would be used to store metadata when using the MongoDB storage engine.
b) Metadata storage would be configured independently from the storage engine. Current metadata logic would be adapted to work with other storage engines (as part of step 1). The metadata storage engine API would mostly be a SQL client interface, to allow replacing the embedded SQLite client.
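Suggestion (b) can be sketched as a thin SQL-client interface that the embedded SQLite client would implement, so another SQL backend could be swapped in. All names below are illustrative assumptions, and the dummy backend only counts statements:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical metadata storage engine API: mostly a SQL client
 * interface, per suggestion (b). */
typedef struct metadata_engine {
    const char *name;                 /* e.g. "sqlite" */
    int (*connect)(const char *uri);
    int (*exec)(const char *sql);     /* DDL and writes */
    void (*disconnect)(void);
} METADATA_ENGINE;

/* Dummy backend standing in for the embedded SQLite client:
 * it accepts every statement and counts how many were executed. */
static int stmts_executed;
static int dummy_connect(const char *uri) { (void)uri; return 0; }
static int dummy_exec(const char *sql) { (void)sql; stmts_executed++; return 0; }
static void dummy_disconnect(void) {}

static METADATA_ENGINE sqlite_like = {
    "sqlite", dummy_connect, dummy_exec, dummy_disconnect
};
```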
4. Storage engine data streaming
In this step, the storage engine is modified to allow acting like a provider of live metrics data. This would allow two Netdata nodes to stream data to each other, through the database instead of using the Netdata protocol. This implies the use of a database supporting “listening” for data changes, such as MongoDB, Redis with pub/sub etc.
Since data streaming already exists in Netdata (between Netdata agents), internal APIs used for data streaming might be reused or adapted. Incoming live metrics data from the storage engine could be handled like incoming data from a child node.
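The subscription idea can be sketched as a callback hook on the storage engine: incoming changes from the database are delivered to a callback that could feed the same code path as data arriving from a child node. Everything here is a hypothetical illustration; the toy engine delivers one synthetic change synchronously, standing in for e.g. a MongoDB change stream or a Redis pub/sub loop:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type for live metric changes pushed by the
 * storage backend. */
typedef void (*metric_change_cb)(const char *chart_id,
                                 long long timestamp, double value,
                                 void *user_data);

/* A streaming-capable engine would expose a subscribe hook in addition
 * to the regular read/write API. */
typedef struct streaming_engine {
    int (*subscribe)(metric_change_cb cb, void *user_data);
} STREAMING_ENGINE;

/* Toy engine: immediately simulates one remote write arriving through
 * the database, instead of blocking on a real change stream. */
static int toy_subscribe(metric_change_cb cb, void *user_data) {
    cb("system.cpu", 1700000000LL, 12.5, user_data);
    return 0;
}

static STREAMING_ENGINE toy_engine = { toy_subscribe };

/* Example consumer: handle the event the way incoming data from a
 * child node would be handled. */
static double received_value;
static void on_change(const char *chart_id, long long t, double v, void *d) {
    (void)chart_id; (void)t; (void)d;
    received_value = v;
}
```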