Describe the bug
When monitoring Java applications using OpenTelemetry Java Auto-Instrumentation, the trace data incorrectly shows service A calling service B (A -> B), even though there is no actual call between A and B.
In keeping with microservice design, A and B only produce to and consume from a Kafka topic, so they remain loosely coupled. The issue is visible in the `traces_service_graph_request_total` metric and in the Zipkin trace data, both of which suggest a relationship that does not exist.
If I enable the following two options in the OpenTelemetry instrumentation, Service A changes to "user," but the data is still identified.
To Reproduce
Notice that the trace data incorrectly indicates that service A is making requests to service B.
Check A -> B in Grafana using the service graph or the `traces_service_graph_request_total` data.
A more accurate reproduction would be to create a topic that both services produce to and consume from, and test against that.
Expected behavior
The trace data and metrics should accurately reflect the interactions between services. Specifically, no traces or metrics should suggest a direct interaction between service A and service B when there is none.
Environment:
Infrastructure: AWS EKS, AWS MSK
Deployment tool: Using Kubernetes manifests with gitops repo & ArgoCD
Additional Context
All services are deployed as pods in EKS.
The issue persists even after verifying that there are no overlapping or contaminated headers and that Trace IDs are unique and correctly configured.
The environment configuration for OpenTelemetry instrumentation includes settings for exporting to Prometheus and Zipkin, and for capturing content-type headers on HTTP requests and responses.
Service A
JDK: Amazon Corretto 17
Spring: 2.7.1
OS: Amazon Linux (EKS)
Service B
JDK: Amazon Corretto 17
Spring: 3.0.5
OS: Amazon Linux (EKS)
Hi! Service graphs have a number of ways of identifying communication between services; for Tempo, they're described in the docs. Connections do not necessarily represent HTTP requests.
* A request across a messaging system, where the outgoing and the incoming span must have `span.kind` set to `producer` and `consumer`, respectively.
This is what's identifying a connection between the two services.
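For anyone reading along, the matching rule above can be sketched roughly as follows. This is only an illustration of the idea, not Tempo's actual implementation: the `Span` class, the `service_graph_edges` helper, and the `"direct"` label are made up for this sketch, and only `messaging_system` is a label value mentioned in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified span record used only for this sketch."""
    span_id: str
    parent_id: Optional[str]
    service: str
    kind: str  # "client", "server", "producer", "consumer", "internal"


# (parent kind, child kind) -> connection type.
# "direct" is a placeholder chosen for this sketch, not a real label value.
EDGE_KINDS = {
    ("client", "server"): "direct",
    ("producer", "consumer"): "messaging_system",
}


def service_graph_edges(spans):
    """Return (caller, callee, connection_type) for each matching span pair."""
    by_id = {s.span_id: s for s in spans}
    edges = []
    for child in spans:
        parent = by_id.get(child.parent_id) if child.parent_id else None
        if parent is None:
            continue  # root span, or parent not present in this batch
        connection_type = EDGE_KINDS.get((parent.kind, child.kind))
        if connection_type is None:
            continue  # e.g. server -> producer inside one service: no edge
        edges.append((parent.service, child.service, connection_type))
    return edges


# Service A publishes to Kafka; service B consumes the same message.
spans = [
    Span("1", None, "service-a", "server"),
    Span("2", "1", "service-a", "producer"),
    Span("3", "2", "service-b", "consumer"),
]

edges = service_graph_edges(spans)
print(edges)

# Dropping messaging edges leaves only request/response style calls.
direct_only = [e for e in edges if e[2] != "messaging_system"]
print(direct_only)
```

In this sketch the A -> B edge exists only because the consumer span in B has the producer span in A as its parent, which is why the edge shows up even though A never calls B over HTTP.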
Hey @mapno Your answer was fantastic. I have perfectly removed the problematic parts from the dashboard and various graphs using Tempo as a data source. I blame myself for not carefully reading the docs.
May I ask one more question?
When I specify span_kind, no data is returned for any value (span_kind_consumer, producer, server, client, and unspecified). Is any additional configuration needed? Simply setting connection_type=messaging_system shows all servers communicating through MSK.
I am using auto-instrumentation because I cannot enforce spans on all technical teams, which makes it difficult for me to directly control headers, span kinds, IDs, and so on.
Hey! span_kind is not a label of service graph metrics (it is set on span-metrics, though). I'm not sure it would make sense to add it in the first place, since it is implied by the connection type: i.e., if connection_type is messaging_system, the spans must have had kind consumer and producer.
I initially raised this issue with the OpenTelemetry team, but their response suggested raising the issue with Grafana and Tempo instead.
Ref. open-telemetry/opentelemetry-java-instrumentation#11348