
Monitor Dashboard Metrics

This document supplements the traffic monitoring section of a mesh instance by explaining the metrics on each monitoring dashboard, so that users can look up what each metric means and how it is calculated.

Global Monitor

Figure: Global Monitoring

Category	Parameters	Description	Calculation Method
General	Global Request Volume	The total request volume across the entire service mesh, in ops/s (operations, that is, requests, executed per second). In Istio, this metric is typically used to assess overall traffic in the mesh.	ops/s = Total Operations / Time Period (seconds)
	Global Success Rate (non-5xx responses)	The percentage of all requests in the mesh that did not receive a 5xx response.	Global Success Rate (non-5xx responses) = (Total Requests - 5xx Responses) / Total Requests * 100
	4xxs	4xx responses are client errors, indicating a problem with the request itself (e.g., 404 Not Found).	ops/s = Number of 4xx responses received per second
	5xxs	5xx responses are server errors, indicating that the server failed to handle a valid request correctly (e.g., 500 Internal Server Error).	ops/s = Number of 5xx responses received per second
	Virtual Services	Measures the net growth of virtual services in the service mesh over a specific time range. Comparing the number of add and delete events shows how the configuration is changing, which helps identify potential issues.	max(Virtual Services && add) - max(Virtual Services && delete)
	Destination Rules	Measures the net growth of destination rules in the service mesh over a specific time range. Comparing the number of add and delete events shows how the configuration is changing, which helps identify potential issues.	max(Destination Rule && add) - max(Destination Rule && delete)
	Gateways	Measures the net growth of gateways in the service mesh over a specific time range. Comparing the number of add and delete events shows how the configuration is changing, which helps identify potential issues.	max(Gateway && add) - max(Gateway && delete)
	Workload Entries	Measures the net growth of WorkloadEntry resources over a specific time range. Observing these changes shows quickly how services are changing, which helps identify potential issues.	max(WorkloadEntry && add) - max(WorkloadEntry && delete)
	Service Entries	Measures the net growth of ServiceEntry resources over a specific time range. Observing these changes shows quickly how services are changing, which helps identify potential issues.	max(Service Entries && add) - max(Service Entries && delete)
	PeerAuthentication Policies	Measures the net growth of PeerAuthentication policies over a specific time range. Observing these changes shows quickly how services are changing, which helps identify potential issues.	max(PeerAuthentication Policies && add) - max(PeerAuthentication Policies && delete)
	RequestAuthentication Policies	Measures the net growth of RequestAuthentication policies over a specific time range. Observing these changes shows quickly how services are changing, which helps identify potential issues.	max(RequestAuthentication Policies && add) - max(RequestAuthentication Policies && delete)
	Authorization Policies	Measures the net growth of AuthorizationPolicy resources over a specific time range. Observing these changes shows quickly how services are changing, which helps identify potential issues.	max(Authorization Policies && add) - max(Authorization Policies && delete)
HTTP/gRPC Workloads	Service	The service name, combining the namespace the service resides in with the Kubernetes Service access information.
	Requests	The request rate (requests per second), computed over a 1-minute window.	sum(rate(istio_requests_total{...}[1m]))
	P50 Latency	The median latency: 50% of requests complete within this time.	histogram_quantile(0.50, sum(rate(...)))
	P90 Latency	90% of requests complete within this time.	histogram_quantile(0.90, sum(rate(...)))
	P99 Latency	99% of requests complete within this time.	histogram_quantile(0.99, sum(rate(...)))
	Success Rate	The percentage of successful responses (response code other than 5xx) within the query time range.	sum(rate(... response_code!~"5.*")) / sum(rate(...))
TCP Workloads	Service	The service name, taken from the destination_service label of the Istio TCP metrics.	destination_service
	Workload	The workload name, taken from the destination_workload label of the Istio TCP metrics.	destination_workload
	Bytes Sent	The number of bytes sent per second.	rate() applied to the istio_tcp_sent_bytes_total counter, which accumulates the TCP bytes sent.
	Bytes Received	The number of bytes received per second.	rate() applied to the istio_tcp_received_bytes_total counter, which accumulates the TCP bytes received.
Istio Component-Based Version		Visualization of Istio component build versions, showing the distribution of versions for each component and their deployment in different clusters. This is useful for understanding the health and consistency of Istio deployments.	sum(istio_build{mesh_id="$mesh"}) by (component, tag, mesh_cluster)
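The success-rate, request-rate, and net-change calculations in the Global Monitor table are plain arithmetic once the counts are known. The Python sketch below reproduces them from hypothetical raw counts; the function names and sample values are illustrative assumptions, not part of the dashboard, which evaluates equivalent PromQL in Prometheus.

```python
from typing import Optional


def global_success_rate(total_requests: int, responses_5xx: int) -> Optional[float]:
    """Percentage of requests that did not receive a 5xx response.

    Mirrors: (Total Requests - 5xx Responses) / Total Requests * 100
    """
    if total_requests == 0:
        # No traffic: the ratio is undefined, so return no value.
        return None
    return (total_requests - responses_5xx) / total_requests * 100


def ops_per_second(total_operations: int, period_seconds: float) -> float:
    """Average operations (requests) per second over a time period."""
    return total_operations / period_seconds


def net_config_change(max_add_events: int, max_delete_events: int) -> int:
    """Net growth of a config resource such as VirtualService.

    Mirrors: max(<resource> && add) - max(<resource> && delete),
    where each term is the largest observed value of the add/delete event counter.
    """
    return max_add_events - max_delete_events


# Hypothetical numbers for illustration only.
print(global_success_rate(total_requests=2000, responses_5xx=30))  # 98.5
print(ops_per_second(total_operations=6000, period_seconds=60))    # 100.0
print(net_config_change(max_add_events=12, max_delete_events=4))   # 8
```

A negative net change simply means more resources were deleted than added in the selected range.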

Performance Monitor

Figure: Performance Monitoring

Category	Parameters	Description	Calculation Method
vCPU Usage	vCPU / 1k rps	Displays the vCPUs consumed by istio-proxy sidecars per 1,000 requests per second (1k rps), based on istio-proxy CPU usage outside the istio-system namespace and on istio-ingressgateway traffic. The value is only meaningful when istio-ingressgateway receives more than 10 requests per second: the ">bool 10" term evaluates to 1 above that rate and to 0 otherwise, so low-traffic periods produce no usable value.	(sum(irate(container_cpu_usage_seconds_total{namespace!="istio-system",container="istio-proxy"}[1m]))/ (round(sum(irate(istio_requests_total[1m])), 0.001)/1000))/ (sum(irate(istio_requests_total{source_workload="istio-ingressgateway"}[1m])) >bool 10)
	vCPU	Displays the overall usage of virtual CPUs (vCPUs) in Istio.
Memory and Data	Memory Usage	Displays the memory usage of Istio system components, in bytes. The query shown averages the working set bytes of istio-ingressgateway over its containers (excluding the POD pause container).	sum(container_memory_working_set_bytes{pod=~"istio-ingressgateway-.*"}) / count(container_memory_working_set_bytes{pod=~"istio-ingressgateway-.*",container!="POD"})
	Bytes transferred / sec	Displays the number of bytes transmitted per second by Istio system components, in Bps. The query shown sums the response bytes of istio-ingressgateway.	sum(irate(istio_response_bytes_sum{source_workload="istio-ingressgateway", reporter=~"$reporter",destination_mesh_id="$mesh"}[1m]))
Istio Component-Based Version		Visualization of Istio component build versions, showing the distribution of versions for each component and their deployment in different clusters. This is useful for understanding the health and consistency of Istio deployments.	sum(istio_build{mesh_id="$mesh"}) by (component, tag, mesh_cluster)
Sidecar Resource Usage	Memory	Displays the total working set memory, in bytes, of all istio-proxy sidecar containers, used to monitor the memory usage of Istio proxies.	sum(container_memory_working_set_bytes{container="istio-proxy"})
	vCPU	Displays the vCPU usage of the sidecar proxies: the per-second rate of CPU seconds consumed by each istio-proxy container over a 1-minute window, summed across containers.	sum(rate(container_cpu_usage_seconds_total{container="istio-proxy"}[1m]))
	Disk	Displays the disk usage of the sidecar proxies: the total file system usage, in bytes, of all istio-proxy containers.	sum(container_fs_usage_bytes{container="istio-proxy"})
Istiod Resource Usage	Memory	Displays the memory usage of the Istiod service. Total: the total memory usage of the Istiod service in Kubernetes. Container Memory: the memory usage of each container in the Istiod service in Kubernetes, including different types of memory usage such as virtual, resident, heap, and stack memory.	Total (k8s): sum(container_memory_working_set_bytes{container=~"discovery|istio-proxy", pod=~"istiod-.*"}); Container ({{ container }}) (k8s): container_memory_working_set_bytes{container=~"discovery|istio-proxy", pod=~"istiod-.*"}
	vCPU	Displays the vCPU usage of the Istiod service. Total: the overall CPU usage of the Istiod service in Kubernetes. Container CPU Usage: the CPU usage of each container in the Istiod service in Kubernetes. Pilot: the CPU usage of the pilot component in Istiod.	Total (k8s): sum(rate(container_cpu_usage_seconds_total{container=~"discovery|istio-proxy", pod=~"istiod-.*"}[1m])); Container ({{ container }}) (k8s): sum(rate(container_cpu_usage_seconds_total{container=~"discovery|istio-proxy", pod=~"istiod-.*"}[1m])) by (container); Pilot: irate(process_cpu_seconds_total{app="istiod"}[1m])
	Disk	Displays disk-related usage of istiod in each cluster: the number of open file descriptors per cluster, and the file system usage of the discovery and istio-proxy containers.	sum(process_open_fds{mesh_id="$mesh",app="istiod"}) by (mesh_cluster); container_fs_usage_bytes{container=~"discovery|istio-proxy", pod=~"istiod-.*"}
	Goroutines	Displays the trend in the number of goroutines of istiod in each cluster.	sum(go_goroutines{mesh_id="$mesh", app="istiod"}) by (mesh_cluster)
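The vCPU / 1k rps query is the least obvious formula in the Performance Monitor table. The following Python sketch mirrors its arithmetic under stated assumptions: the three inputs stand for the already-evaluated sub-queries, promql_round follows PromQL's round() (nearest multiple, ties rounded up), and a zero divisor, which PromQL would turn into Inf or NaN, is reported as None. The function names are hypothetical.

```python
import math
from typing import Optional


def promql_round(value: float, to_nearest: float = 1.0) -> float:
    """Round to the nearest multiple of to_nearest, ties rounded up, like PromQL round()."""
    inverse = 1.0 / to_nearest
    return math.floor(value * inverse + 0.5) / inverse


def vcpu_per_1k_rps(proxy_cpu_cores: float, mesh_rps: float, ingress_rps: float) -> Optional[float]:
    """vCPU used by istio-proxy sidecars per 1,000 requests per second.

    proxy_cpu_cores: sum(irate(container_cpu_usage_seconds_total{namespace!="istio-system",container="istio-proxy"}[1m]))
    mesh_rps:        sum(irate(istio_requests_total[1m]))
    ingress_rps:     sum(irate(istio_requests_total{source_workload="istio-ingressgateway"}[1m]))
    """
    gate = 1.0 if ingress_rps > 10 else 0.0               # "... >bool 10"
    thousand_rps = promql_round(mesh_rps, 0.001) / 1000   # "round(..., 0.001)/1000"
    if gate == 0.0 or thousand_rps == 0.0:
        # PromQL would divide by zero here (Inf or NaN); this sketch reports "no value".
        return None
    return (proxy_cpu_cores / thousand_rps) / gate


# Hypothetical inputs: 1.2 cores of sidecar CPU while the mesh serves 2,400 rps.
print(vcpu_per_1k_rps(proxy_cpu_cores=1.2, mesh_rps=2400, ingress_rps=800))  # 0.5
print(vcpu_per_1k_rps(proxy_cpu_cores=1.2, mesh_rps=2400, ingress_rps=5))    # None
```

Dividing by the 0/1 gate, instead of filtering, is what lets a single PromQL expression suppress the panel during low-traffic periods.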

Service Monitor

Figure: Service Monitoring

Category	Parameters	Description	Calculation Method
General	Client Request Volume	Displays the client request rate for the current service, in ops/s, using a 5-minute range window. If the result is empty, N/A is displayed; when the value exceeds the 80% threshold, it is displayed in red.	round(sum(irate(istio_requests_total{reporter=~"$reporter",destination_mesh_id="$mesh",destination_service=~"$service"}[5m])), 0.001)
	Client Success Rate (non-5xx responses)	Displays the success rate (non-5xx responses) of client requests for the current service, using a 5-minute range window. It provides a quick visual way to spot potential issues or trends.	sum(irate(istio_requests_total{reporter=~"$reporter",destination_mesh_id="$mesh",destination_service=~"$service",response_code!~"5.*"}[5m])) / sum(irate(istio_requests_total{reporter=~"$reporter",destination_mesh_id="$mesh",destination_service=~"$service"}[5m]))
	Client Request Duration	Displays the duration of client requests to the current service. Three queries calculate the P50, P90, and P99 durations (the 50th, 90th, and 99th percentiles). Each query uses the milliseconds histogram, divided by 1000 to convert to seconds, and falls back to the seconds histogram.	Example (P50): (histogram_quantile(0.50, sum(irate(istio_request_duration_milliseconds_bucket{reporter=~"$reporter",destination_mesh_id="$mesh",destination_service=~"$service"}[1m])) by (le)) / 1000) or histogram_quantile(0.50, sum(irate(istio_request_duration_seconds_bucket{reporter=~"$reporter",destination_mesh_id="$mesh",destination_service=~"$service"}[1m])) by (le))
	TCP Received Bytes	Displays the instantaneous rate of TCP bytes received by the current service, using a 1-minute range window. If the value is null, N/A is displayed. Unit: Bps	sum(irate(istio_tcp_received_bytes_total{reporter=~"$reporter",destination_mesh_id="$mesh", destination_service=~"$service"}[1m]))
	Server Request Volume	Displays the request volume for the current service as reported by the destination (server) side, and shows its trend over time. If the value is null, N/A is displayed. Unit: ops/s	round(sum(irate(istio_requests_total{reporter="destination", destination_mesh_id="$mesh",destination_service=~"$service"}[5m])), 0.001)
	Server Success Rate (non-5xx responses)	Displays the success rate (non-5xx responses) for the current service. Success rates below 95% are shown in red, below 99% in orange, and 100% in green (percentage with 2 decimal places).	sum(irate(istio_requests_total{reporter="destination", destination_mesh_id="$mesh",destination_service=~"$service",response_code!~"5.*"}[5m])) / sum(irate(istio_requests_total{reporter="destination", destination_mesh_id="$mesh",destination_service=~"$service"}[5m]))
	Server Request Duration	Displays the duration of server requests to the current service. The three queries calculate the median (P50), 90th, and 99th percentile request durations, giving a performance overview from typical to high-end latency.	histogram_quantile over istio_request_duration_milliseconds_bucket or istio_request_duration_seconds_bucket
	TCP Sent Bytes	Displays the instantaneous rate of TCP bytes sent by the current service, using a 1-minute range window. If the value is null, N/A is displayed. Unit: Bps
Client Workload	Incoming Requests By Source And Response Code	Displays incoming requests grouped by source workload and response code, visualizing the interactions between workloads and their response codes. This helps identify issues or abnormal patterns in client workloads.	sum(irate(istio_requests_total{reporter=~"$reporter",destination_mesh_id="$mesh",destination_service=~"$service"}[5m])) by (source_workload, response_code)
	Outgoing Requests By Destination And Response Code	Displays outgoing requests grouped by destination workload and response code. It shows how the current service interacts with the services it calls.	sum(irate(istio_requests_total{reporter="destination", source_mesh_id="$mesh",source_service=~"$service"}[5m])) by (destination_workload, response_code)
	Incoming Request Volume By Source	Displays the volume of incoming requests grouped by source workload, showing how requests to the current service are distributed across sources.	sum(irate(istio_requests_total{reporter=~"$reporter",destination_mesh_id="$mesh",destination_service=~"$service"}[5m])) by (source_workload)
	Outgoing Request Volume By Destination	Displays the volume of outgoing requests grouped by destination workload, showing how requests from the current service are distributed across destinations.	sum(irate(istio_requests_total{reporter="destination", source_mesh_id="$mesh",source_service=~"$service"}[5m])) by (destination_workload)
Application	Backend Errors	Displays the rate of 5xx backend errors for the current service. It helps identify issues with backend systems or dependencies.	sum(irate(istio_requests_total{reporter="destination", destination_mesh_id="$mesh",destination_service=~"$service",response_code=~"5.*"}[5m]))
	Backend Error Rate	Displays the proportion of requests to the current service that return 5xx responses. It indicates the stability and reliability of backend systems or dependencies.	sum(irate(istio_requests_total{reporter="destination", destination_mesh_id="$mesh",destination_service=~"$service",response_code=~"5.*"}[5m])) / sum(irate(istio_requests_total{reporter="destination", destination_mesh_id="$mesh",destination_service=~"$service"}[5m]))
	Latency	Displays the P50, P90, and P99 request durations of the current service. It helps analyze overall performance and identify potential bottlenecks.	histogram_quantile over istio_request_duration_milliseconds_bucket or istio_request_duration_seconds_bucket
	Request Volume	Displays the overall request volume for the current service. It helps understand the load and traffic patterns on the service.	round(sum(irate(istio_requests_total{reporter="destination", destination_mesh_id="$mesh",destination_service=~"$service"}[5m])), 0.001)
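Every duration panel in the Service Monitor table relies on histogram_quantile over cumulative latency buckets. As an illustration, the Python sketch below re-implements the linear interpolation within a bucket that Prometheus documents for classic histograms, plus the milliseconds-or-seconds fallback used in the duration queries. It is a simplified model with hypothetical function names and made-up bucket counts, not Prometheus's own code.

```python
import math
from typing import List, Optional, Tuple

# (upper bound "le", cumulative count); the bucket list must end with +Inf.
Bucket = Tuple[float, float]


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile of cumulative histogram buckets by linear interpolation."""
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    buckets = sorted(buckets, key=lambda bucket: bucket[0])
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan  # needs at least one finite bucket plus the +Inf bucket
    observations = buckets[-1][1]
    if observations == 0:
        return math.nan
    rank = q * observations
    # First finite bucket whose cumulative count reaches the rank, else the +Inf bucket.
    b = next((i for i, (_, count) in enumerate(buckets[:-1]) if count >= rank), len(buckets) - 1)
    if b == len(buckets) - 1:
        return buckets[-2][0]  # rank lies in the +Inf bucket: use the largest finite bound
    bucket_end, count = buckets[b]
    if b == 0 and bucket_end <= 0:
        return bucket_end
    bucket_start = 0.0
    if b > 0:
        bucket_start = buckets[b - 1][0]
        count -= buckets[b - 1][1]
        rank -= buckets[b - 1][1]
    if count == 0:
        return math.nan  # empty bucket: 0/0
    return bucket_start + (bucket_end - bucket_start) * (rank / count)


def request_duration_seconds(q: float,
                             ms_buckets: Optional[List[Bucket]] = None,
                             s_buckets: Optional[List[Bucket]] = None) -> Optional[float]:
    """Models "(quantile(ms buckets) / 1000) or quantile(s buckets)": prefer ms, else s."""
    if ms_buckets:
        return histogram_quantile(q, ms_buckets) / 1000
    if s_buckets:
        return histogram_quantile(q, s_buckets)
    return None


# Hypothetical latency histogram in milliseconds (cumulative counts).
MS_BUCKETS = [(10, 50), (50, 90), (100, 99), (math.inf, 100)]
for quantile in (0.50, 0.90, 0.99):
    print(quantile, request_duration_seconds(quantile, ms_buckets=MS_BUCKETS))
```

Because the estimate assumes requests are spread evenly inside each bucket, P99 can only be as precise as the bucket boundaries around it.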

Workload Monitor

Figure: Workload Monitoring

Category	Parameters	Description	Calculation Method
General	Incoming Request Volume	Displays the incoming request volume for the current workload, in ops/s. If the value is null, N/A is displayed.	Calculated from incoming requests over the past 5 minutes, filtered by destination workload, namespace, and cluster.
	Incoming Success Rate (non-5xx responses)	Displays the success rate (excluding 5xx responses) of incoming requests for the current workload, calculated over a 5-minute window, giving a quick visual way to spot potential issues or trends. Success rates below 95% are displayed in red and below 99% in orange.	Divides two queries to obtain the percentage of non-5xx responses: the numerator counts the non-5xx responses associated with a specific service, and the denominator counts all requests associated with that service.
	Request Duration	Displays the P50, P90, and P99 request durations for the current workload. Use it to observe system performance and quickly spot potential bottlenecks or latency issues.	Example (P50): (histogram_quantile(0.50, sum(irate(istio_request_duration_milliseconds_bucket{reporter=~"$reporter",destination_mesh_id="$mesh",destination_workload=~"$workload", destination_cluster=~"$dstcluster", destination_workload_namespace=~"$namespace"}[1m])) by (le)) / 1000) or histogram_quantile(0.50, sum(irate(istio_request_duration_seconds_bucket{reporter=~"$reporter",destination_mesh_id="$mesh",destination_workload=~"$workload", destination_cluster=~"$dstcluster", destination_workload_namespace=~"$namespace"}[1m])) by (le))
	TCP Server Traffic	Displays TCP traffic for the current workload acting as the destination, filtered by workload, namespace, and cluster, as the total bytes sent and received per second (Bps). This helps understand network load and spot potential bottlenecks.	Label filters: destination_workload_namespace=~"$namespace", destination_workload=~"$workload", destination_cluster=~"$dstcluster"
	TCP Client Traffic	Displays TCP traffic for the current workload acting as the source, filtered by workload, namespace, and cluster, as the total bytes sent and received per second (Bps). This helps understand network load and spot potential bottlenecks.	Label filters: source_workload_namespace=~"$namespace", source_workload=~"$workload"
Inbound Workload	Incoming Requests By Source And Response Code	Displays incoming requests for the current workload grouped by source workload and response code. It calculates the incoming request rate separately for connections with and without mutual TLS, broken down by source workload, source namespace, and response code.	Example (without mutual TLS): round(sum(irate(istio_requests_total{connection_security_policy!="mutual_tls", destination_workload_namespace=~"$namespace", destination_workload=~"$workload", destination_cluster=~"$dstcluster", reporter=~"$reporter",destination_mesh_id="$mesh", source_workload=~"$srcwl", source_workload_namespace=~"$srcns"}[5m])) by (source_workload, source_workload_namespace, response_code), 0.001)
	Incoming Success Rate (non-5xx responses) By Source	Displays the ratio of successful incoming requests (non-5xx responses) grouped by source workload and source namespace, calculated separately for traffic with and without mutual TLS.	Example (mutual TLS): sum(irate(istio_requests_total{reporter=~"$reporter",destination_mesh_id="$mesh", connection_security_policy="mutual_tls", destination_workload_namespace=~"$namespace", destination_workload=~"$workload", destination_cluster=~"$dstcluster",response_code!~"5.*", source_workload=~"$srcwl", source_workload_namespace=~"$srcns"}[5m])) by (source_workload, source_workload_namespace) / sum(irate(istio_requests_total{reporter=~"$reporter",destination_mesh_id="$mesh", connection_security_policy="mutual_tls", destination_workload_namespace=~"$namespace", destination_workload=~"$workload", destination_cluster=~"$dstcluster", source_workload=~"$srcwl", source_workload_namespace=~"$srcns"}[5m])) by (source_workload, source_workload_namespace)
	Request Duration By Source	Displays the request duration for incoming requests grouped by source workload and source namespace. This helps identify potential performance issues or latency differences among different sources.	(histogram_quantile(0.50, sum(irate(istio_request_duration_milliseconds_bucket{reporter=~"$reporter",destination_mesh_id="$mesh",destination_workload_namespace=~"$namespace",destination_workload=~"$workload", destination_cluster=~"$dstcluster", source_workload=~"$srcwl", source_workload_namespace=~"$srcns"}[1m])) by (le)) / 1000) or histogram_quantile(0.50, sum(irate(istio_request_duration_seconds_bucket{reporter=~"$reporter",destination_mesh_id="$mesh",destination_workload_namespace=~"$namespace",destination_workload=~"$workload", destination_cluster=~"$dstcluster", source_workload=~"$srcwl", source_workload_namespace=~"$srcns"}[1m])) by (le))
Outbound Workload	Outgoing Requests By Destination And Response Code	Displays outgoing requests classified based on the destination workload and response code for the current workload. It calculates the outgoing request rate with and without mutual TLS connections, and categorizes them by destination workload, destination namespace, and response code.	round(sum(irate(istio_requests_total{connection_security_policy!="mutual_tls", source_workload_namespace=~"$namespace", source_workload=~"$workload", reporter=~"$reporter",source_mesh_id="$mesh", destination_workload_namespace=~"$dstns", destination_workload=~"$dstwl", destination_cluster=~"$dstcluster"}[5m])) by (destination_workload, destination_workload_namespace, response_code), 0.001)
	Outgoing Success Rate (non-5xx responses) By Destination	Displays the ratio of successful outgoing requests (non-5xx responses) grouped by destination workload and destination namespace, calculated separately for traffic with and without mutual TLS.	Example (mutual TLS): sum(irate(istio_requests_total{reporter=~"$reporter",source_mesh_id="$mesh", connection_security_policy="mutual_tls", source_workload_namespace=~"$namespace", source_workload=~"$workload", destination_cluster=~"$dstcluster",response_code!~"5.*", destination_workload_namespace=~"$dstns", destination_workload=~"$dstwl"}[5m])) by (destination_workload, destination_workload_namespace) / sum(irate(istio_requests_total{reporter=~"$reporter",source_mesh_id="$mesh", connection_security_policy="mutual_tls", destination_cluster=~"$dstcluster", source_workload_namespace=~"$namespace", source_workload=~"$workload", destination_workload_namespace=~"$dstns", destination_workload=~"$dstwl"}[5m])) by (destination_workload, destination_workload_namespace)
	Request Duration By Destination	Displays the request duration for outgoing requests grouped by destination workload and destination namespace. This helps identify potential performance issues or latency differences among different destinations.	(histogram_quantile(0.50, sum(irate(istio_request_duration_milliseconds_bucket{reporter=~"$reporter",source_workload_namespace=~"$namespace",source_workload=~"$workload", destination_cluster=~"$dstcluster", destination_workload_namespace=~"$dstns", destination_workload=~"$dstwl"}[1m])) by (le)) / 1000) or histogram_quantile(0.50, sum(irate(istio_request_duration_seconds_bucket{reporter=~"$reporter",source_workload_namespace=~"$namespace",source_workload=~"$workload", destination_cluster=~"$dstcluster", destination_workload_namespace=~"$dstns", destination_workload=~"$dstwl"}[1m])) by (le))
Service Dependencies	Incoming Requests By Destination Service And Response Code	Displays incoming requests from different source services based on the destination service and response code for the current workload. It calculates the incoming request rate with and without mutual TLS connections, and categorizes them by destination service and response code.	round(sum(irate(istio_requests_total{connection_security_policy!="mutual_tls", destination_service=~"$destsvc", reporter=~"$reporter",destination_workload_namespace=~"$namespace", destination_workload=~"$workload", destination_cluster=~"$dstcluster",destination_mesh_id="$mesh", source_service=~"$srcsvc", source_workload_namespace=~"$srcns", source_workload=~"$srcwl"}[5m])) by (destination_service, response_code), 0.001)
	Incoming Success Rate (non-5xx responses) By Destination Service	Displays the ratio of successful incoming requests (non-5xx responses) grouped by destination service, calculated separately for traffic with and without mutual TLS.	Example (mutual TLS): sum(irate(istio_requests_total{reporter=~"$reporter",destination_mesh_id="$mesh", connection_security_policy="mutual_tls", destination_workload_namespace=~"$namespace", destination_workload=~"$workload", destination_cluster=~"$dstcluster",response_code!~"5.*", source_service=~"$srcsvc", source_workload_namespace=~"$srcns", source_workload=~"$srcwl"}[5m])) by (destination_service) / sum(irate(istio_requests_total{reporter=~"$reporter",destination_mesh_id="$mesh", connection_security_policy="mutual_tls", destination_cluster=~"$dstcluster", destination_workload_namespace=~"$namespace", destination_workload=~"$workload", source_service=~"$srcsvc", source_workload_namespace=~"$srcns", source_workload=~"$srcwl"}[5m])) by (destination_service)
	Request Duration By Destination Service	Displays the request duration for incoming requests grouped by destination service. This helps identify potential performance issues or latency differences among different destination services.	(histogram_quantile(0.50, sum(irate(istio_request_duration_milliseconds_bucket{reporter=~"$reporter",destination_mesh_id="$mesh",destination_workload_namespace=~"$namespace",destination_workload=~"$workload", destination_cluster=~"$dstcluster", source_service=~"$srcsvc", source_workload_namespace=~"$srcns", source_workload=~"$srcwl"}[1m])) by (le)) / 1000) or histogram_quantile(0.50, sum(irate(istio_request_duration_seconds_bucket{reporter=~"$reporter",destination_mesh_id="$mesh",destination_workload_namespace=~"$namespace",destination_workload=~"$workload", destination_cluster=~"$dstcluster", source_service=~"$srcsvc", source_workload_namespace=~"$srcns", source_workload=~"$srcwl"}[1m])) by (le))
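The "By Source" success-rate panels in the Workload Monitor table group a non-5xx ratio by source labels and split traffic by connection_security_policy. A minimal Python sketch of that grouping follows, assuming each input sample is a label dictionary paired with the per-second rate irate() would return; the sample workloads, namespaces, and the "none" policy value are hypothetical. In the sketch, the numerator and denominator are restricted to the same TLS class.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed sample shape: (label set, per-second rate as irate() would return it).
Sample = Tuple[Dict[str, str], float]


def success_rate_by_source(samples: Iterable[Sample], mutual_tls: bool) -> Dict[Tuple[str, str], float]:
    """Non-5xx ratio per (source_workload, source_workload_namespace).

    Models sum(...{response_code!~"5.*"}) by (...) / sum(...) by (...),
    with both sides limited to mutual-TLS or to non-mutual-TLS traffic.
    """
    ok: Dict[Tuple[str, str], float] = defaultdict(float)
    total: Dict[Tuple[str, str], float] = defaultdict(float)
    for labels, rate in samples:
        # connection_security_policy="mutual_tls" versus any other value.
        if (labels.get("connection_security_policy") == "mutual_tls") != mutual_tls:
            continue
        key = (labels["source_workload"], labels["source_workload_namespace"])
        total[key] += rate
        # PromQL regex matchers are fully anchored, so "5.*" means "starts with 5".
        if not re.fullmatch(r"5.*", labels.get("response_code", "")):
            ok[key] += rate
    # Skip zero-traffic groups to avoid dividing by zero (PromQL would yield NaN there).
    return {key: ok[key] / total[key] for key in total if total[key] > 0}


# Hypothetical series for illustration only.
SAMPLE_SERIES: List[Sample] = [
    ({"connection_security_policy": "mutual_tls", "source_workload": "frontend",
      "source_workload_namespace": "shop", "response_code": "200"}, 9.0),
    ({"connection_security_policy": "mutual_tls", "source_workload": "frontend",
      "source_workload_namespace": "shop", "response_code": "503"}, 1.0),
    ({"connection_security_policy": "none", "source_workload": "batch",
      "source_workload_namespace": "jobs", "response_code": "200"}, 2.0),
]

print(success_rate_by_source(SAMPLE_SERIES, mutual_tls=True))   # {('frontend', 'shop'): 0.9}
print(success_rate_by_source(SAMPLE_SERIES, mutual_tls=False))  # {('batch', 'jobs'): 1.0}
```

The same grouping applies to the "By Destination" and "By Destination Service" panels by swapping the label keys used to build the group.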
