What Is Service Telemetry?

Service telemetry provides network latency measurement and visualization services based on NoF+. It measures, segment by segment, the latency contributed by the network, storage, and compute nodes in a storage input/output (I/O) operation, so that the network can be monitored and problems can be demarcated.

Why Do We Need Service Telemetry?

As we enter the intelligence era, a growing number of services have massive data storage and read/write requirements. NoF+ services pose the following O&M challenges:

  1. The network cannot proactively detect service performance deterioration or fluctuation caused by problems such as congestion. Instead, network faults are usually reported by the service department.
  2. When the storage I/O latency or input/output operations per second (IOPS) deteriorates, it is difficult to locate the fault.

To address these challenges, Huawei has launched service telemetry. It overcomes a limitation of conventional network monitoring technologies such as telemetry, which cannot detect I/O latency. Service telemetry provides network latency measurement and visualization services based on NoF+, and sends statistics to iMaster NCE-FabricInsight through telemetry for visualized display. It can be flexibly deployed to accurately monitor and analyze I/O latency and to quickly detect storage service performance deterioration. This facilitates network problem identification and network quality optimization, which in turn spurs the development of intelligent lossless networks.

How Does Service Telemetry Work?

Service Architecture

The following figure shows the layers involved in the service process of service telemetry.

Working process of service telemetry
  1. Analysis presentation layer (iMaster NCE-FabricInsight): Displays I/O-based performance indicators of service traffic and delivers configurations to devices through NETCONF interfaces.
  2. Device measurement layer (switches):
    • Compute-side port: Service packets enter or leave a measurement device through the compute-side port. The measurement device identifies specified packets, performs I/O latency measurement and breakdown, and reports the measurement result to the analyzer.
    • Storage-side port: Service packets enter or leave a measurement device through the storage-side port. The measurement device identifies specified packets, performs I/O latency measurement and breakdown, and reports the measurement result to the analyzer.
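The per-port measurement described above can be illustrated with a minimal sketch. It assumes that a device can recognize the request and the response of one I/O through a (flow, I/O identifier) key and that packets carry a timestamp in microseconds. The class and field names (`PortMeasurer`, `LatencyRecord`, `io_id`) are hypothetical and do not describe the actual switch implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical key identifying one I/O: (flow identifier, I/O identifier).
IoKey = Tuple[str, int]


@dataclass
class LatencyRecord:
    """One measurement result that would be reported to the analyzer."""
    port: str
    flow: str
    io_id: int
    latency_us: float


class PortMeasurer:
    """Pairs the request and the response of an I/O seen on one port."""

    def __init__(self, port: str) -> None:
        self.port = port
        self._start_us: Dict[IoKey, float] = {}

    def on_request(self, flow: str, io_id: int, ts_us: float) -> None:
        # Remember when the I/O request passed this port.
        self._start_us[(flow, io_id)] = ts_us

    def on_response(self, flow: str, io_id: int, ts_us: float) -> Optional[LatencyRecord]:
        # Match the response against the stored request; unmatched responses are ignored.
        start = self._start_us.pop((flow, io_id), None)
        if start is None:
            return None
        return LatencyRecord(self.port, flow, io_id, ts_us - start)
```

For example, calling `on_request("flow-a", 1, 100.0)` followed by `on_response("flow-a", 1, 350.0)` on the same `PortMeasurer` yields a record with a latency of 250.0 microseconds.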

Latency Breakdown Solution

Based on the I/O interaction process, service telemetry can be used to match specified packets in transmit and return directions, define I/O latency breakdown objects, and measure the I/O latency. The following figure shows the latency breakdown solution.

Packet interaction in read and write I/Os
In the preceding figure:
  • Data access latency (DAL): Used to locate problems on the storage side. DALs in read and write operations are measured separately.
  • Data preparation latency (DPL): Used to locate problems on the compute side. The DPL is involved only in write operations.
  • I/O latency (IOL): Total I/O latency observed on the compute side or the storage side.
  • Network round-trip time (RTT): The RTT differs between read and write operations. iMaster NCE-FabricInsight calculates the network RTT using the following formula: RTT = IOL1 - IOL2.
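The RTT formula above can be written as a short helper. This is a sketch under one explicit assumption: IOL1 is taken to be the IOL measured at the compute-side port and IOL2 the IOL measured at the storage-side port for the same I/O. The text does not spell out this mapping, and the function name `network_rtt` is hypothetical.

```python
def network_rtt(iol1_us: float, iol2_us: float) -> float:
    """Return RTT = IOL1 - IOL2 for one I/O, in microseconds.

    Assumption: iol1_us is the compute-side IOL and iol2_us is the
    storage-side IOL of the same I/O.
    """
    rtt = iol1_us - iol2_us
    if rtt < 0:
        # A negative value suggests that the two inputs do not belong to the same I/O.
        raise ValueError("IOL2 exceeds IOL1; check that both values belong to the same I/O")
    return rtt
```

Because each IOL is a duration measured on a single device, this subtraction does not by itself require the clocks of the two devices to be synchronized.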

Typical Application Scenario of Service Telemetry

The following figure shows the typical application scenario of service telemetry. The service telemetry function can be enabled on switch ports. This function is deployed on the ports connecting to compute-side and storage-side servers and does not need to be deployed on the interconnection ports between switches.

Typical application scenario of service telemetry

The following table compares the two commonly used deployment modes.

  

| Item | Routine Monitoring Mode | Maintenance or Key Assurance Mode |
| --- | --- | --- |
| Deployment position | Single-point measurement (compute-side port) | Multi-point coordinated measurement (compute-side and storage-side ports) |
| Solution | Single-point measurement + port-based polling. The port-based polling solution is used to limit the number of packets sent to the CPU. | Multi-point measurement + interesting flow. The number of flows is reduced to limit the number of packets sent to the CPU. |
| Service indicator | Network RTT measurement: not supported. IOL, DPL, and DAL measurement on the compute leaf node: The DAL refers to the processing latency on the storage side and the network latency. If the DAL is faulty, it is suspected that an error occurs on the storage side. | Network RTT measurement: supported. IOL and DPL measurement on the compute leaf node and DAL measurement on the storage leaf node: The measurement result is more accurate. |
| Applicable scenario | Full-flow monitoring (time division multiplexing by interface group, full-flow instead of full-packet) | Full-process monitoring of interesting flows (full packets of interesting flows) |
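Port-based polling in the routine monitoring mode (time division multiplexing by interface group) can be pictured as a round-robin schedule in which measurement is active on one interface group per time slot. The following is a rough sketch of that idea only. The function name `polling_schedule` and the port names are hypothetical, and the actual slot length and grouping on the device are not described here.

```python
from itertools import cycle
from typing import Iterator, Sequence


def polling_schedule(port_groups: Sequence[Sequence[str]]) -> Iterator[Sequence[str]]:
    """Yield, slot after slot, the one port group on which measurement is active."""
    if not port_groups:
        raise ValueError("at least one port group is required")
    # Round-robin over the groups: every group gets a slot before any group repeats.
    return cycle(port_groups)
```

With three groups, the first four slots of the schedule visit group 1, group 2, group 3, and then group 1 again. At any moment only one group's packets are measured, which is how polling keeps the number of packets sent to the CPU bounded while still covering every flow over time.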

About This Topic
  • Author: Qian Jinchen, Yin Rongrong
  • Updated on: 2024-06-14