Should routing of /data requests be restricted to Space or Room-level nodes? (Netdata Cloud) #13554
Replies: 7 comments 1 reply
-
@juacker this is a follow-up to the conversation on this bug netdata/netdata-cloud#525. @amalkov @ktsaou @papazach @ralphm @stelfrag it would be good to get your input, since we are also experiencing this on our current Netdata Production Space, e.g. on
-
The routing logic is such that all node instances for a given node are candidates for querying, as long as those node instances are represented by Agents that have been claimed to the given Space. This means that if you don't want a certain Agent responding for a given node, you should not claim that Agent to that Space. Unfortunately, because nodes can only exist in one Space on a (Cloud) Hub, in practice this also means you should not claim the Agent to the same Hub. So, if you want to have experimental parent nodes, you should claim them to a different Hub (e.g. our internal staging or testing environments).

Rooms have nothing to do with this logic, and that makes sense: requiring a parent node to be a member of a particular Room so that the nodes it represents can be queried seems undesirable to me, since the parent typically has nothing to do with the workload of the monitored nodes. Maybe we could find a way for instances of a given node to be in different Spaces, but I don't think our current data model supports that.
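To make the candidate rule above concrete, here is a minimal hypothetical sketch (names like `NodeInstance` and `candidate_instances` are illustrative, not the actual Netdata Cloud implementation): an instance is a candidate only if its hosting Agent is claimed to the Space, and Room membership plays no part.

```python
# Hypothetical sketch of the routing rule described above.
# Assumption: we know, per Space, the set of Agent IDs claimed to it.
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeInstance:
    agent_id: str   # the Agent (e.g. a parent) hosting this instance
    node_id: str    # the node this instance represents

def candidate_instances(node_id, instances, claimed_agent_ids):
    """Return all instances of `node_id` whose Agent is claimed to the Space.

    Rooms are deliberately absent: they do not influence routing.
    """
    return [
        inst for inst in instances
        if inst.node_id == node_id and inst.agent_id in claimed_agent_ids
    ]
```

This also illustrates why the only way to exclude an experimental parent today is to not claim it to the Space (or Hub) at all: there is no finer filter in the rule.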
-
I think we can see the current authorization model as having two layers.
If the user decides to stream data from one Agent to an experimental Agent in a different Space, should Cloud block that? With the current authorization model, I think the user has complete control over which users can access each piece of data and where they can access it from, so in terms of data access and resource utilization they have full control and can pick the approach that best suits their needs. For the scenarios mentioned above, some users may want to access the information regardless of whether it lives on an experimental Agent, and some may not, so a one-size-fits-all rule that suits everyone and every purpose will be difficult to find.

I think, however, that Cloud does not learn from bad routing decisions, and improving in that direction could help us solve this kind of issue. Do we want to give the user finer-grained control over routing? Do we want to add intelligence to Cloud so it learns automatically when a routing request fails repeatedly? Both approaches have pros and cons, but maybe we can move in one of these directions.
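The "learning from bad routing decisions" idea could be as simple as tracking consecutive failures per candidate and preferring the healthiest one. A purely hypothetical sketch, not existing Cloud behaviour (`FailureAwareRouter` and its methods are invented names):

```python
# Hypothetical sketch: penalize instances whose recent /data requests
# failed, preferring the candidate with the fewest consecutive failures.
from collections import defaultdict

class FailureAwareRouter:
    def __init__(self):
        # consecutive failure count per candidate instance
        self.consecutive_failures = defaultdict(int)

    def pick(self, candidates):
        # Prefer the candidate with the fewest consecutive failures;
        # ties keep the original candidate order.
        return min(candidates, key=lambda c: self.consecutive_failures[c])

    def record(self, candidate, ok):
        # A success resets the counter; a failure increments it.
        if ok:
            self.consecutive_failures[candidate] = 0
        else:
            self.consecutive_failures[candidate] += 1
```

A real implementation would also need decay/expiry so a parent that was briefly unreachable isn't penalized forever, which is part of the pros-and-cons trade-off mentioned above.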
-
I agree with your inputs @ralphm, but then our routing is "broken", since, as @juacker mentioned:
If nodes are in different spaces there could be cross-space requests flowing. @juacker, if this isn't too much effort we could start by fixing this - I will open a bug if all agree. The smart/learning-routing ideas are really interesting, but maybe we could start smaller. I remember it being discussed in the past that we should provide a way for users to define a routing priority, e.g. experimental nodes could be given the lowest priority; this doesn't invalidate the more sophisticated approach of learning from missed requests.
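The user-defined priority idea could look roughly like this hypothetical sketch (the `priority` map and `reachable` callback are assumptions for illustration): `/data` goes to the reachable candidate with the highest priority, so experimental parents with the lowest priority are only used as a last resort.

```python
# Hypothetical sketch of user-defined routing priorities.
# Assumption: the user assigns a numeric priority per parent/instance;
# experimental parents get the lowest values.
def pick_by_priority(candidates, priority, reachable):
    """Pick the reachable candidate with the highest priority value.

    `priority` maps candidate -> int (default 0);
    `reachable` is a predicate telling whether a candidate can serve now.
    """
    live = [c for c in candidates if reachable(c)]
    if not live:
        return None  # no candidate can serve the request
    return max(live, key=lambda c: priority.get(c, 0))
```

This keeps experimental nodes available as a fallback when every production parent is down, rather than excluding them entirely.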
-
@ralphm with your last comment I'm not clear whether you consider this as something that needs to be fixed or not.
-
Isn't the point that right now nodes are tied to Spaces, and node instances to nodes? I.e. a node instance doesn't have a direct connection to the Space. If you want to have certain instances in different Spaces, we'd have to change that model. I'm not sure the Agent really cares.
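The model described above can be summarized in a few lines; this is an illustrative sketch of the relationships (type and field names are invented), showing that an instance's Space is only reachable through its Node, which is why instances of one node cannot live in different Spaces today.

```python
# Illustrative sketch of the data model as described:
# a Node belongs to exactly one Space; a NodeInstance points at a Node,
# never directly at a Space.
from dataclasses import dataclass

@dataclass
class Node:
    node_id: str
    space_id: str   # a node lives in exactly one Space

@dataclass
class NodeInstance:
    instance_id: str
    node_id: str    # instance -> node; no direct space link

def space_of(instance, nodes_by_id):
    """The instance's Space is always derived from its Node."""
    return nodes_by_id[instance.node_id].space_id
```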
-
Correct, these are the two layers that @juacker mentioned above.
Why is the Agent relevant here for routing? What @juacker mentioned is:
As I understand it, these nodes and node instances are kept in our Cloud BE, which should be the representation of what is on the Agents. I just wanted to clarify whether we want to fix this last part, but I'm totally OK to close this discussion and park it, since there are other priorities and this isn't an issue reported by any user.
-
Intro
When a `/data` request needs to be sent to a Node which has multiple Node Instances available (on parent nodes), routing needs to be applied to identify the proper Node Instance to satisfy that request. Currently, Cloud looks across all the available Node Instances for a given Node, independent of the Space or Room the instances are in.
Why restrict routing?
Ideal scenario (?)
If you have a Production Space or Room that has the nodes running with the configurations you want, on the Agent versions you want, you ensure that all the `/data` requests are sent to at least one Node with the setup you want.
Current scenario
On that Space but in a different Room, or in another Space, you could have some experimental Nodes receiving data from Production nodes that you are using to run some experiments. With the current approach, it isn't easy to ensure the experimental nodes won't be targeted for `/data` requests on data for some child node, causing some unwanted behaviour.
Note: this change is only intended for the Cloud routing logic; it shouldn't affect current Agent streaming in any way.
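The proposal above boils down to adding a Space filter to candidate selection. A hypothetical before/after sketch (function and field names are invented for illustration, not Cloud's actual code):

```python
# Hypothetical contrast of current vs proposed routing behaviour.
def route_current(instances, node_id):
    # Today: every instance of the node is a candidate,
    # regardless of the Space/Room its parent sits in.
    return [i for i in instances if i["node_id"] == node_id]

def route_space_restricted(instances, node_id, space_id):
    # Proposed: only instances whose parent is in the requesting Space
    # are candidates, so experimental parents elsewhere are never picked.
    return [i for i in instances
            if i["node_id"] == node_id and i["parent_space_id"] == space_id]
```

Note this only narrows which instances Cloud may query; the Agents still stream to whatever parents they are configured for, consistent with the note above.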