What Is Service Telemetry?
Service telemetry provides network latency measurement and visualization services based on NoF+. It measures, segment by segment, the latency contributed by the network, storage, and compute nodes in a storage input/output (I/O) operation, so that the network can be monitored and problems demarcated.
Why Do We Need Service Telemetry?
As we enter the intelligence era, more and more services have massive data storage and read/write requirements. NoF+ services pose the following O&M challenges:
- The network cannot proactively detect service performance deterioration or fluctuation caused by problems such as congestion. Instead, network faults are usually reported by the service department.
- When the storage I/O latency or input/output operations per second (IOPS) deteriorates, it is difficult to locate the fault.
To address these challenges, Huawei has launched service telemetry. Conventional network monitoring technologies such as telemetry cannot detect I/O latency. Service telemetry overcomes this limitation: it provides network latency measurement and visualization services based on NoF+ and sends statistics to iMaster NCE-FabricInsight through telemetry for visualized display. It can be flexibly deployed to accurately monitor and analyze I/O latency and to quickly detect storage service performance deterioration. This facilitates network problem identification and network quality optimization, which in turn supports the development of intelligent lossless networks.
How Does Service Telemetry Work?
Service Architecture
The following figure shows the layers involved in the service process of service telemetry.
Working process of service telemetry
- Analysis presentation layer (iMaster NCE-FabricInsight): Displays I/O-based performance indicators of service traffic and delivers configurations to devices through NETCONF interfaces.
- Device measurement layer (switches):
  - Compute-side port: Service packets enter or leave a measurement device through the compute-side port. The measurement device identifies specified packets, performs I/O latency measurement and breakdown, and reports the measurement result to the analyzer.
  - Storage-side port: Service packets enter or leave a measurement device through the storage-side port. The measurement device identifies specified packets, performs I/O latency measurement and breakdown, and reports the measurement result to the analyzer.
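The measurement step described above (identify specified packets, then measure the I/O latency) can be sketched in a few lines. This is only an illustrative model, not the switch implementation: the `PacketEvent` fields, the `"command"`/`"completion"` labels, and the flow names are all hypothetical, and a real device performs this matching in the forwarding plane.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketEvent:
    flow: str          # hypothetical flow key, e.g. "host1->array1"
    io_tag: int        # hypothetical identifier pairing a command with its completion
    kind: str          # "command" (transmit direction) or "completion" (return direction)
    timestamp_us: int  # time the packet was seen at the port, in microseconds


def measure_io_latency(events):
    """Pair each command with its completion; return per-I/O latency in microseconds."""
    pending = {}  # (flow, io_tag) -> timestamp of the command packet
    results = {}  # (flow, io_tag) -> measured latency
    for ev in sorted(events, key=lambda e: e.timestamp_us):
        key = (ev.flow, ev.io_tag)
        if ev.kind == "command":
            pending[key] = ev.timestamp_us
        elif ev.kind == "completion" and key in pending:
            results[key] = ev.timestamp_us - pending.pop(key)
    return results


# Two I/Os of one hypothetical flow, as seen at a single port.
events = [
    PacketEvent("host1->array1", 1, "command", 100),
    PacketEvent("host1->array1", 2, "command", 120),
    PacketEvent("host1->array1", 1, "completion", 350),
    PacketEvent("host1->array1", 2, "completion", 420),
]
latencies = measure_io_latency(events)
```

Completions that have no matching command are ignored here. What a real device does in that case is not stated in this article.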
Latency Breakdown Solution
Based on the I/O interaction process, service telemetry can be used to match specified packets in transmit and return directions, define I/O latency breakdown objects, and measure the I/O latency. The following figure shows the latency breakdown solution.
Packet interaction in read and write I/Os
- Data access latency (DAL): Used to locate problems on the storage side. DALs in read and write operations are measured separately.
- Data preparation latency (DPL): Used to locate problems on the compute side. The DPL is only involved in the write operation.
- I/O latency (IOL): Total latency on the compute/storage side.
- Network round-trip time (RTT): The RTT differs between read and write operations. iMaster NCE-FabricInsight calculates the network RTT using the following formula: RTT = IOL1 - IOL2.
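As a worked illustration of the RTT formula, the sketch below assumes that IOL1 is the I/O latency measured at the compute-side port and IOL2 is the I/O latency measured at the storage-side port for the same I/O. The text above does not define the two terms, so this reading is an assumption. The function name and the sample values are also hypothetical.

```python
def network_rtt_us(iol1_us, iol2_us):
    """Return RTT = IOL1 - IOL2 in microseconds.

    Assumption: iol1_us is measured at the compute-side port and iol2_us
    at the storage-side port, so iol1_us should not be smaller than iol2_us.
    """
    if iol2_us > iol1_us:
        raise ValueError("IOL2 is larger than IOL1; check which port each value came from")
    return iol1_us - iol2_us


# Hypothetical sample: 250 us seen on the compute side, 180 us on the storage side.
rtt = network_rtt_us(250, 180)
```

Under this assumption, the difference is the time the I/O spends crossing the network in both directions, which is why the calculation needs measurements from both ports.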
Typical Application Scenario of Service Telemetry
The following figure shows the typical application scenario of service telemetry. The service telemetry function can be enabled on switch ports. This function is deployed on the ports connecting to compute-side and storage-side servers and does not need to be deployed on the interconnection ports between switches.
Typical application scenario of service telemetry
The following table compares the two modes commonly used in service applications.
| | Routine Monitoring Mode | Maintenance or Key Assurance Mode |
|---|---|---|
| Deployment position | Single-point measurement (compute-side port) | Multi-point coordinated measurement (compute-side and storage-side ports) |
| Solution | Single-point measurement + port-based polling. The port-based polling solution is used to limit the number of packets sent to the CPU. | Multi-point measurement + interesting flow. The number of flows is reduced to limit the number of packets sent to the CPU. |
| Service indicator | | |
| Applicable scenario | Full-flow monitoring (time division multiplexing by interface group, full-flow instead of full-packet) | Full-process monitoring of interesting flows (full packets of interesting flows) |
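The port-based polling idea in routine monitoring mode (time division multiplexing by interface group) can be pictured as a round-robin schedule. In this minimal sketch only one group of ports is measured per time slot. The group layout, the port names, and the slot model are hypothetical, and the article does not describe the device's actual scheduling.

```python
from itertools import cycle, islice


def polling_schedule(port_groups, slots):
    """Return the port group measured in each time slot, in round-robin order."""
    return list(islice(cycle(port_groups), slots))


# Three hypothetical interface groups polled over five time slots.
groups = [["port1", "port2"], ["port3", "port4"], ["port5"]]
schedule = polling_schedule(groups, 5)
```

Because only one group is active in each slot, every flow is eventually sampled while the number of packets sent to the CPU at any moment stays bounded.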
- Author: Qian Jinchen, Yin Rongrong
- Updated on: 2024-06-14