Replies: 5 comments 3 replies
-
InfluxDB IOx is doing some really interesting work in this space that might be somewhat useful or related: https://www.influxdata.com/blog/announcing-influxdb-iox/
-
I started working on this idea. Here is a first draft proposal for the new storage engine API:
-
Hello guys, I liked your idea, and I have a few suggestions:
If you have any doubts, please let me know.
-
We're still working on this, and are making good progress.
Current status
We are building on the preliminary API proposal discussed above, taking comments into consideration.
Metadata
For metadata, our plan is to go with suggestion (a) and store metadata with metrics data. That means current engines would keep using the embedded SQLite for metadata, while other engines like MongoDB would have to implement their own metadata storage mechanism.
-
Hi, the new API call abstracts away the engine-specific logic from rrd2rrdr (query.c). The latest version of the API is here: Feedback and comments are welcome :)
-
The Idea
The goal of this discussion is to explore how the Netdata agent can store metrics in a distributed storage backend, for efficient scaling of Netdata-based cloud services. The proposed solution should meet high-performance, production-grade requirements, and its design should allow the code changes to be fully integrated upstream into the current Netdata agent codebase.
The current status
Netdata has a distributed architecture: it currently supports exporting metrics to a variety of storage backends, streaming metrics to other Netdata agents, and loading metrics from remote agents. However, a Netdata agent can currently only store persistent metrics locally, using Netdata’s internal format. This design is limiting because multiple Netdata agents cannot share a single storage backend, making the storage of large amounts of customer data inefficient or impractical.
The proposed changes
This discussion was mainly initiated while exploring the changes required to integrate a remote MongoDB database for metrics storage, which would let a Netdata agent load and store metrics data in real time from the remote database in an efficient way.
Metadata storage shall also be distributed, but synchronizing it across Netdata agents in (soft) real time is optional for now.
1. Modular storage engines
The current Netdata agent codebase is strongly based on condition checks against the memory mode. This dependency increases the complexity of the codebase and limits the ability to extend the Netdata agent with other forms of memory mode. This change proposes the creation of an abstraction layer between the Netdata agent and the storage mechanism:
• volatile (ram, none, etc.)
• non-volatile (save, dbengine)
• SQL
• NoSQL
This layer would absorb the current memory modes and provide scalability and flexibility with other storage mechanisms (e.g. MongoDB).
• Add a new C structure allowing each storage engine to expose a set of properties, and its API implementation.
• Implement the new structure for existing engines
• Build a list of available storage engine properties at compile time (including existing engines)
• Use this list to instantiate and use the configured storage engine.
• Across Netdata’s codebase, replace any hardcoded reference to specific storage engines with access to engine properties or calls to the new engine APIs.
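The steps above can be sketched in C. This is a minimal, hypothetical illustration of the idea, not the actual Netdata API: all structure, field, and function names here are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: each storage engine exposes its properties and
 * its API implementation through a single structure. */
typedef struct storage_engine_api {
    int (*init)(void);
    int (*store_metric)(const char *chart_id, long long timestamp, double value);
    int (*shutdown)(void);
} STORAGE_ENGINE_API;

typedef struct storage_engine {
    const char *name;       /* e.g. "ram", "dbengine", "mongodb" */
    int id;                 /* would map to the current memory-mode enum */
    STORAGE_ENGINE_API api;
} STORAGE_ENGINE;

/* Dummy in-memory implementation, standing in for one entry in the
 * compile-time list of available engines. */
static double last_value;
static int ram_init(void) { return 0; }
static int ram_store(const char *chart_id, long long t, double v) {
    (void)chart_id; (void)t;
    last_value = v;
    return 0;
}
static int ram_shutdown(void) { return 0; }

static STORAGE_ENGINE engines[] = {
    { .name = "ram", .id = 0, .api = { ram_init, ram_store, ram_shutdown } },
};

/* Look up the configured engine by name, replacing hardcoded
 * memory-mode condition checks scattered through the codebase. */
static STORAGE_ENGINE *storage_engine_find(const char *name) {
    for (size_t i = 0; i < sizeof(engines) / sizeof(engines[0]); i++)
        if (strcmp(engines[i].name, name) == 0)
            return &engines[i];
    return NULL;
}
```

With this shape, callers hold a `STORAGE_ENGINE *` and go through its `api` function pointers, so adding a new engine means adding one array entry rather than touching every call site.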
2. MongoDB storage engine
Use MongoDB as a proof of concept for integrating other storage mechanisms through the storage engine API [1. Modular storage engines].
• Add the MongoDB storage engine skeleton.
• Add new MongoDB-specific configuration: server URL, timeout, database, etc.
• Implement read/write metrics operations to the database.
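The MongoDB-specific configuration could follow Netdata's existing INI-style `netdata.conf` sections. This is only a sketch of what such a section might look like; the section and key names are assumptions, not a defined configuration schema:

```ini
[db]
    # hypothetical: select the MongoDB storage engine by name
    mode = mongodb

[mongodb]
    # hypothetical keys mirroring the bullet list above
    uri = mongodb://localhost:27017
    database = netdata
    timeout ms = 5000
```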
3. Metadata
There are two suggestions for saving the Netdata metadata for new storage mechanisms:
a) Metadata storage would become fully storage-engine dependent. SQLite would be used for existing engines, and MongoDB would be used to store metadata when using the MongoDB storage engine.
b) Metadata storage would be configured independently from the storage engine. Current metadata logic would be adapted to work with other storage engines (as part of step 1). The metadata storage engine API would mostly be a SQL client interface, to allow replacing the embedded SQLite client.
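Suggestion (b) can be sketched as a thin SQL-client interface that the embedded SQLite client would implement, so another SQL backend could be swapped in. All names below are illustrative assumptions, and the dummy backend only counts statements:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical metadata storage engine API: mostly a SQL client
 * interface, per suggestion (b). */
typedef struct metadata_engine {
    const char *name;                 /* e.g. "sqlite" */
    int (*connect)(const char *uri);
    int (*exec)(const char *sql);     /* DDL and writes */
    void (*disconnect)(void);
} METADATA_ENGINE;

/* Dummy backend standing in for the embedded SQLite client:
 * it accepts every statement and counts how many were executed. */
static int stmts_executed;
static int dummy_connect(const char *uri) { (void)uri; return 0; }
static int dummy_exec(const char *sql) { (void)sql; stmts_executed++; return 0; }
static void dummy_disconnect(void) {}

static METADATA_ENGINE sqlite_like = {
    "sqlite", dummy_connect, dummy_exec, dummy_disconnect
};
```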
4. Storage engine data streaming
In this step, the storage engine is modified to allow acting like a provider of live metrics data. This would allow two Netdata nodes to stream data to each other, through the database instead of using the Netdata protocol. This implies the use of a database supporting “listening” for data changes, such as MongoDB, Redis with pub/sub etc.
Since data streaming already exists in Netdata (between Netdata agents), internal APIs used for data streaming might be reused or adapted. Incoming live metrics data from the storage engine could be handled like incoming data from a child node.
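The subscription idea can be sketched as a callback hook on the storage engine: incoming changes from the database are delivered to a callback that could feed the same code path as data arriving from a child node. Everything here is a hypothetical illustration; the toy engine delivers one synthetic change synchronously, standing in for e.g. a MongoDB change stream or a Redis pub/sub loop:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type for live metric changes pushed by the
 * storage backend. */
typedef void (*metric_change_cb)(const char *chart_id,
                                 long long timestamp, double value,
                                 void *user_data);

/* A streaming-capable engine would expose a subscribe hook in addition
 * to the regular read/write API. */
typedef struct streaming_engine {
    int (*subscribe)(metric_change_cb cb, void *user_data);
} STREAMING_ENGINE;

/* Toy engine: immediately simulates one remote write arriving through
 * the database, instead of blocking on a real change stream. */
static int toy_subscribe(metric_change_cb cb, void *user_data) {
    cb("system.cpu", 1700000000LL, 12.5, user_data);
    return 0;
}

static STREAMING_ENGINE toy_engine = { toy_subscribe };

/* Example consumer: handle the event the way incoming data from a
 * child node would be handled. */
static double received_value;
static void on_change(const char *chart_id, long long t, double v, void *d) {
    (void)chart_id; (void)t; (void)d;
    received_value = v;
}
```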