What Is IFIT?
In-situ Flow Information Telemetry (IFIT) is an Internet Engineering Task Force (IETF) standard measurement protocol proposed by Huawei. It marks real service packets by inserting IFIT headers, thereby directly measuring network performance indicators, such as delay, packet loss rate, and jitter. IFIT uses telemetry technology to report measurement data in real time and displays the results on the graphical user interface (GUI) of iMaster NCE-IP. IFIT is the industry's first complete in-band flow quality detection and fault demarcation solution.
In contrast with traditional network O&M technologies, IFIT features high precision, real-time performance, and visualization. It can flexibly adapt to multiple service scenarios and promotes intelligent O&M by working with the big data platform and intelligent algorithms.
Why Do We Need IFIT?
In the 5G and cloud era, the services and architecture of IP networks have changed tremendously. For one thing, the development of 5G has given rise to new services, such as HD video, virtual reality (VR), and Internet of Vehicles (IoV). For another, cloudification of network devices and services has become a common choice for facilitating unified management and reducing O&M costs. New services and architecture pose many challenges to bearer networks, including ultra-bandwidth, hyperconnectivity, low delay, and high reliability.
New challenges posed by new services and architecture
Traditional network O&M methods cannot offer the levels of reliability required by new services and architecture. The major problems are passive detection of service faults and inefficient fault demarcation and locating.
- Passive detection of service faults: In most cases, O&M personnel can determine the fault scope only from complaints received from users or work orders dispatched by related service departments. This means that O&M personnel cannot perceive faults quickly and can only handle them passively, which increases troubleshooting pressure and can result in poor user experience.
- Inefficient fault demarcation and locating: Multiple teams often need to collaborate in order to demarcate and locate faults, and the lack of a clear demarcation mechanism between teams means that their responsibilities are not well defined. Troubleshooting is inefficient because devices must be manually checked one by one to identify the faulty device, which then needs to be restarted or have its traffic switched to another device. In addition, traditional Operations, Administration, and Maintenance (OAM) technologies simulate service flows by using test packets and therefore cannot precisely reproduce performance deterioration or fault scenarios from the live network.
Against this backdrop, Huawei proposed the IFIT protocol. IFIT is an in-band measurement technology: it measures real service flows directly by inserting IFIT headers, which carry the marking and measurement information, into real service packets. Unlike out-of-band measurement technologies (such as TWAMP), which measure network performance indirectly by periodically sending simulated service packets, IFIT can reflect the actual network performance indicators such as delay, packet loss, and jitter in real time and proactively detect service faults. Moreover, IFIT outperforms existing in-band measurement technologies (such as IP FPM and IOAM) in areas including service deployment complexity, forwarding plane efficiency, and protocol extensibility.
Comparison between different in-band measurement technologies
Furthermore, IFIT can work with big data analytics and intelligent algorithms to build an intelligent O&M system, promoting intelligent O&M in the IPv6+ era. In addition, IFIT empowers networks with predictive analysis and self-healing capabilities, providing assurance for network automation and intelligence.
What Are the Advantages of IFIT?
The following describes the advantages of IFIT in terms of measurement data, service scenarios, GUI, and intelligent O&M.
High-Precision and Multi-Dimensional Quality Measurement of Real Services
The forwarding paths of test packets in traditional OAM technologies may differ from those of real service packets. IFIT provides in-band flow measurement capabilities using real service packets. The measurement data can precisely reflect the quality of real services from multiple dimensions. The details are as follows:
- IFIT can restore the actual forwarding path of packets and work with telemetry, which supports data collection within seconds, to implement real-time monitoring of network SLAs. The precision of packet loss measurement can reach 10⁻⁶, and that of delay measurement can reach microsecond-level. IFIT can detect all silent faults and locate them within seconds. Silent faults are faults that affect service experience but do not reach the alarm triggering threshold and cannot be effectively located. Such faults cause significant damage on live networks, accounting for just 15% of all faults yet more than 80% of the time spent on O&M. IFIT can identify minor exceptions on the network and detect the loss of even one packet. Such precise packet loss detection meets the requirements of "zero-packet-loss" services such as financial final accounting, telemedicine, industrial control, and power differential protection, ensuring high reliability for such services.
- In addition to accurately measuring the delay and packet loss rate of each service, IFIT can use extension headers to collect various performance data such as per-packet and out-of-order statistics. In this case, users can monitor the network running quality from multiple dimensions, thereby gaining actionable insight into the overall network status.
IFIT measurement based on real service flows
Flexible Adaptation to Large-Scale and Multi-Type Service Scenarios
Network development is usually a lengthy process, meaning that multiple types of devices may coexist on a network and carry various types of services as requirements continue to grow and evolve. IFIT is easy to deploy and can flexibly adapt to large-scale and multi-type service scenarios, specifically:
- IFIT supports one-click delivery of network-wide configuration. E2E and hop-by-hop measurement only need to be configured on the ingress as required, with IFIT simply enabled on transit and egress nodes. In this way, IFIT is applicable to large networks with many devices.
- IFIT flows include static flows (manually configured) and dynamic flows (automatically learned or triggered by traffic with IFIT headers). IFIT flows can be specific flows created based on unique information (such as 5-tuple), tunnel-level aggregation flows, or VPN-level aggregation flows. This allows IFIT to measure both specific service flows and E2E private line traffic at different granularities.
- IFIT has good compatibility with existing networks and can be applied to networks with various types of devices. Devices that do not support IFIT can transparently transmit IFIT flows, avoiding the potential problems involved in interconnection with third-party devices.
- IFIT can automatically learn actual forwarding paths without needing to detect the paths in advance. This eliminates the need to configure the forwarding path in advance on each NE along the path, thereby reducing the planning and configuration workload.
- IFIT applies to various types of networks, such as Layer 2 and Layer 3 networks, and multiple types of tunnels, meeting diverse requirements on the networks.
IFIT adapting to multiple application scenarios
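The flow granularities in the list above can be pictured as different keys used to group packets into one measured flow. The following is a minimal sketch of that idea only. The `Packet` fields, the `tunnel_id` and `vpn_name` names, and the sample addresses are all assumptions made for illustration, not an IFIT data model or a device classifier.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Packet(NamedTuple):
    # All field names are illustrative assumptions, not an IFIT data model.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    tunnel_id: str
    vpn_name: str


def flow_key(pkt: Packet, granularity: str) -> tuple:
    """Return the key that groups packets into one measured flow."""
    if granularity == "5-tuple":
        return (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)
    if granularity == "tunnel":
        return (pkt.tunnel_id,)
    if granularity == "vpn":
        return (pkt.vpn_name,)
    raise ValueError(f"unknown granularity: {granularity}")


def count_flows(packets: Iterable[Packet], granularity: str) -> Counter:
    """Count packets per flow at the chosen granularity."""
    return Counter(flow_key(p, granularity) for p in packets)


# Hypothetical traffic: three distinct 5-tuples, two tunnels, one VPN.
packets = [
    Packet("10.0.0.1", "10.0.1.1", 40000, 443, "tcp", "tunnel-a", "vpn-blue"),
    Packet("10.0.0.1", "10.0.1.1", 40000, 443, "tcp", "tunnel-a", "vpn-blue"),
    Packet("10.0.0.2", "10.0.1.9", 40001, 80, "tcp", "tunnel-a", "vpn-blue"),
    Packet("10.0.0.3", "10.0.2.5", 40002, 53, "udp", "tunnel-b", "vpn-blue"),
]
```

Calling `count_flows(packets, "5-tuple")` yields one counter per specific flow, while `"tunnel"` and `"vpn"` collapse the same traffic into progressively coarser aggregation flows.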
Visualized O&M GUI
Without visualized O&M, network O&M personnel need to manually configure devices one by one, and then multiple departments need to cooperate to check each item. This leads to low O&M efficiency. Such efficiency can be significantly enhanced through visualized O&M. It not only provides centralized management and control capabilities, but also supports online service planning and one-click deployment. Furthermore, it supports quick fault demarcation and locating through SLA visualization. IFIT provides visualized O&M capabilities. Users can deliver different IFIT monitoring policies as required through the GUI of iMaster NCE-IP to implement routine O&M and quick troubleshooting. The details are as follows:
- Routine O&M: O&M personnel can routinely monitor base station status statistics, network fault statistics, abnormal base station statistics, and the top 5 faults affecting base stations at the network and area level. Through performance reports, this enables O&M personnel to promptly learn about the top faults on the entire network and in key areas, as well as the statistics on base station service status. In VPN scenarios, detailed information about E2E service flows is provided to help O&M personnel proactively identify and locate faults and ensure the overall SLA of private line services.
- Quick troubleshooting: Upon receiving a fault report, O&M personnel can search for the base station name or IP address to view the service topology and IFIT hop-by-hop flow indicators, and rectify the fault based on the fault location, possible causes, and rectification suggestions. In addition, O&M personnel can view information about topology paths and historical fault locating information collected over the past seven days.
iMaster NCE-IP GUI
As shown in the figure, the IFIT monitoring results can be visually and graphically displayed on the GUI of iMaster NCE-IP. This helps users learn about the network status and quickly detect and rectify faults, achieving superior O&M experience.
Closed-Loop Intelligent O&M System
The evolution of the network architecture and services poses new challenges to bearer networks. In particular, traditional O&M methods need to be improved in order to achieve E2E high-quality network experience. To this end, passive O&M needs to be replaced with proactive O&M, and an intelligent O&M system needs to be built. Among other things, such a system can proactively detect exceptions in real services, automatically demarcate faults, and implement fast fault locating and self-healing. This helps create automated processes that can adapt to complex and changing network environments. IFIT can work with telemetry, big data analytics, and intelligent algorithms to build an intelligent O&M system.
Building a closed-loop intelligent O&M system based on IFIT
As shown in the figure above, IFIT can automatically switch between E2E and hop-by-hop (trace) measurement according to the network quality. The measurement results sent by IFIT are data sources for the big data platform and intelligent algorithm analysis. Based on these results, the intelligent O&M system is able to implement precise fault demarcation and locating as well as fast fault self-healing. IFIT provides in-band flow measurement and telemetry provides high-speed statistics collection, while the big data platform enables queries to be performed within seconds and massive IFIT data to be efficiently processed. The platform also keeps data analysis and conversion efficient and reliable if a single node fails, because such a failure does not cause data loss. The intelligent algorithm can cluster poor-QoE events into mass network faults. That is, the algorithm calculates the path similarity of poor-QoE service flows in the same period and attributes flows whose similarity reaches the algorithm threshold to the same fault. This enables the common failure point to be located with an accuracy of more than 90%, improving the O&M efficiency and reducing the service interruption time. The combination of these four technologies (IFIT, telemetry, the big data platform, and intelligent algorithms) makes the intelligent O&M system a closed-loop one. Furthermore, it promotes the optimization of intelligent O&M solutions, enabling them to adapt to future network evolution.
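The clustering step described above can be sketched as follows. The article does not say which similarity measure or clustering method the algorithm uses, so this example assumes Jaccard similarity over the links of each path and a simple greedy grouping. The function names, the threshold value, and the sample paths are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def path_links(path: List[str]) -> Set[Link]:
    """Turn a node sequence into the set of directed links it traverses."""
    return set(zip(path, path[1:]))


def jaccard(a: Set[Link], b: Set[Link]) -> float:
    """Assumed similarity measure: shared links divided by all links on either path."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_flows(paths: Dict[str, List[str]], threshold: float) -> List[List[str]]:
    """Greedy grouping: a flow joins the first cluster whose seed path is similar enough."""
    clusters: List[List[str]] = []
    for flow_id, path in paths.items():
        links = path_links(path)
        for cluster in clusters:
            seed_links = path_links(paths[cluster[0]])
            if jaccard(links, seed_links) >= threshold:
                cluster.append(flow_id)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append([flow_id])
    return clusters


def common_links(paths: Dict[str, List[str]], cluster: List[str]) -> Set[Link]:
    """Links shared by every flow in a cluster: candidate common failure points."""
    return set.intersection(*(path_links(paths[f]) for f in cluster))


# Hypothetical poor-QoE flows observed in the same period.
poor_qoe_paths = {
    "flow-1": ["A", "B", "C", "D"],
    "flow-2": ["A", "B", "C", "E"],
    "flow-3": ["X", "Y", "Z"],
}
clusters = cluster_flows(poor_qoe_paths, threshold=0.5)
suspects = common_links(poor_qoe_paths, clusters[0])
```

Here `flow-1` and `flow-2` share two of four distinct links, so they fall into one cluster, and the links they have in common (`A` to `B` and `B` to `C`) become the candidates for the shared failure point.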
How Does IFIT Work?
The following describes the fundamentals of IFIT to illustrate how the preceding advantages are implemented.
How Does IFIT Accurately Locate Faults?
The following uses the IFIT over SRv6 scenario as an example. An IFIT header is encapsulated into a Segment Routing Header (SRH) and consists of three parts: the Flow Instruction Indicator (FII), which identifies the beginning of an IFIT header and defines its overall length; the Flow Instruction Header (FIH), which uniquely identifies a service flow; and the Flow Instruction Extension Header (FIEH), which defines extended functions.
Structure of the IFIT header
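To make the role of the FIH more concrete, the following sketch packs a flow identifier and the two coloring bits into a single 32-bit word. The bit layout here (a 20-bit flow ID followed by the L and D bits and 10 unused bits) is an assumption chosen for illustration. This article does not give the field widths, so the sketch should not be read as the actual encapsulation.

```python
from dataclasses import dataclass

# Assumed layout for illustration only: bits 31..12 hold the flow ID,
# bit 11 is the L coloring bit, bit 10 is the D coloring bit, and
# bits 9..0 are left unused. Real field widths are not given in this article.
FLOW_ID_BITS = 20
FLOW_ID_SHIFT = 12
L_SHIFT = 11
D_SHIFT = 10


@dataclass(frozen=True)
class FlowInstructionHeader:
    flow_id: int      # uniquely identifies the service flow
    loss_color: int   # L bit: packet loss coloring (0 or 1)
    delay_color: int  # D bit: delay coloring (0 or 1)

    def pack(self) -> int:
        """Encode the header fields into one 32-bit word (assumed layout)."""
        if not 0 <= self.flow_id < (1 << FLOW_ID_BITS):
            raise ValueError("flow_id does not fit in the assumed 20 bits")
        if self.loss_color not in (0, 1) or self.delay_color not in (0, 1):
            raise ValueError("coloring bits must be 0 or 1")
        return (
            (self.flow_id << FLOW_ID_SHIFT)
            | (self.loss_color << L_SHIFT)
            | (self.delay_color << D_SHIFT)
        )

    @classmethod
    def unpack(cls, word: int) -> "FlowInstructionHeader":
        """Decode a 32-bit word produced by pack()."""
        return cls(
            flow_id=(word >> FLOW_ID_SHIFT) & ((1 << FLOW_ID_BITS) - 1),
            loss_color=(word >> L_SHIFT) & 1,
            delay_color=(word >> D_SHIFT) & 1,
        )


fih = FlowInstructionHeader(flow_id=0x12345, loss_color=1, delay_color=0)
word = fih.pack()
```

A node that supports IFIT would read these bits to decide which counter or timestamp to update, whereas a node that does not support IFIT simply forwards the packet unchanged.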
The L and D fields in the FIH provide the packet loss and delay measurement capabilities, respectively, based on alternating coloring. Coloring refers to marking packets for specific measurement, which IFIT does by setting the packet loss coloring bit L or delay coloring bit D to 0 or 1. By coloring real service packets and leveraging time synchronization protocols such as 1588v2, IFIT can proactively detect minor network changes and reflect the actual packet loss and delay on the network.
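The arithmetic behind alternate coloring is simple and can be sketched as follows. The color flips every measurement period, so the counters for the color that has just finished are stable and can be compared between ingress and egress. The counter values, timestamps, and function names below are hypothetical. The delay calculation assumes that both clocks are already synchronized (for example, through 1588v2).

```python
def color_for_period(period_index: int) -> int:
    """Alternate the coloring bit every measurement period: 0, 1, 0, 1, ..."""
    return period_index % 2


def packet_loss(tx_count: int, rx_count: int) -> int:
    """Packets lost in one period: sent with a color minus received with that color."""
    return tx_count - rx_count


def loss_rate(tx_count: int, rx_count: int) -> float:
    """Fraction of packets lost in one period (0.0 if nothing was sent)."""
    return packet_loss(tx_count, rx_count) / tx_count if tx_count else 0.0


def one_way_delay_us(ingress_ts_us: int, egress_ts_us: int) -> int:
    """Delay of a D-colored packet; meaningful only with synchronized clocks."""
    return egress_ts_us - ingress_ts_us


# Hypothetical per-period counters reported by the ingress and egress nodes.
ingress_counts = [1_000_000, 1_000_000, 1_000_000]
egress_counts = [1_000_000, 999_999, 1_000_000]

losses = [packet_loss(tx, rx) for tx, rx in zip(ingress_counts, egress_counts)]
rates = [loss_rate(tx, rx) for tx, rx in zip(ingress_counts, egress_counts)]
delay = one_way_delay_us(ingress_ts_us=5_000_120, egress_ts_us=5_000_870)
```

In the second period a single packet out of one million is missing, which corresponds to a loss rate of one in a million. This shows how per-color counting can surface the loss of even one packet.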
In addition, the E field in the FIEH can define two IFIT measurement modes: E2E and hop-by-hop (trace). The E2E mode applies to scenarios where E2E overall service quality monitoring is required, whereas the trace mode applies to scenarios where hop-by-hop demarcation is required for low-quality services or on-demand hop-by-hop monitoring is required for VIP services. The two modes differ in whether IFIT needs to be enabled on all IFIT-capable nodes along the service flow path: trace measurement collects data hop by hop and therefore requires it, whereas E2E measurement does not.
E2E and trace measurement modes
In most cases, E2E IFIT and trace IFIT are used together. When the E2E IFIT measurement data reaches the threshold, trace IFIT is automatically triggered. In this case, the actual service flow forwarding path can be restored, and faults can be quickly demarcated and located.
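The interplay between the two modes can be sketched as a threshold check followed by a hop-by-hop comparison. The threshold values, node names, and counters below are hypothetical, and the functions only illustrate the decision logic, not how a device or iMaster NCE-IP implements it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Thresholds:
    # Hypothetical values; real thresholds are chosen by the operator.
    max_loss_rate: float = 1e-4
    max_delay_us: int = 5_000


def next_mode(loss_rate: float, delay_us: int, thresholds: Thresholds) -> str:
    """Return "trace" when an E2E result reaches a threshold, otherwise "e2e"."""
    if loss_rate >= thresholds.max_loss_rate or delay_us >= thresholds.max_delay_us:
        return "trace"
    return "e2e"


def lossy_segments(hop_counts: List[Tuple[str, int]]) -> List[Tuple[str, str, int]]:
    """Return (upstream, downstream, lost) for each segment where the count drops."""
    segments = []
    for (up, up_count), (down, down_count) in zip(hop_counts, hop_counts[1:]):
        if down_count < up_count:
            segments.append((up, down, up_count - down_count))
    return segments


# Hypothetical E2E result that reaches the loss threshold.
mode = next_mode(loss_rate=0.01, delay_us=1_200, thresholds=Thresholds())

# Hypothetical per-hop packet counts gathered once trace measurement is active.
hops = [("PE1", 1000), ("P1", 1000), ("P2", 990), ("PE2", 990)]
faulty = lossy_segments(hops) if mode == "trace" else []
```

With these sample numbers the E2E loss rate triggers trace measurement, and the hop-by-hop counters narrow the loss down to the segment between `P1` and `P2`.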
How Does IFIT Send Data in Real Time?
In an intelligent O&M system, IFIT often uses telemetry to send measurement data to iMaster NCE-IP in real time for analysis. Telemetry is a technology that remotely and quickly collects data from physical or virtual devices. Devices periodically send information (such as interface traffic statistics, CPU usage, and memory usage) to collectors in push mode. This mode collects data faster than the conventional pull mode (question-answer interaction). Telemetry flexibly collects data by subscribing to different sampling paths. This allows IFIT to manage more devices and obtain measurement data with higher precision, providing big data to enable fast locating of network faults and the optimization of network quality.
As shown in the following figure, a user subscribes to the data source of a device through iMaster NCE-IP. The device collects measurement data based on the configuration requirements. It then encapsulates the data, such as the flow ID, flow direction, error information, and timestamp, into telemetry packets for reporting. iMaster NCE-IP receives and stores measurement data and displays analysis results on its GUI.
Reporting IFIT measurement data through telemetry
Working with telemetry's high-speed data collection technology, which can collect data within seconds, IFIT sends measurement data to iMaster NCE-IP in real time to implement efficient performance measurement.
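The push-mode reporting flow can be pictured with a small sketch: the device encodes a measurement record and the collector decodes and stores it per flow. JSON is used here only for readability, because this article does not describe the actual telemetry encoding or transport. The `lost_packets` and `delay_us` fields, the direction string, and the `Collector` class are assumptions added for illustration.

```python
import json
from collections import defaultdict
from typing import Optional


def encode_record(flow_id: int, direction: str, error_info: str,
                  timestamp_us: int, lost_packets: int, delay_us: int) -> bytes:
    """Device side: wrap one measurement result into a telemetry payload."""
    record = {
        "flow_id": flow_id,
        "direction": direction,
        "error_info": error_info,
        "timestamp_us": timestamp_us,
        "lost_packets": lost_packets,  # assumed measurement field
        "delay_us": delay_us,          # assumed measurement field
    }
    return json.dumps(record, sort_keys=True).encode("utf-8")


class Collector:
    """Collector side: receive pushed payloads and keep them per flow."""

    def __init__(self) -> None:
        self.by_flow = defaultdict(list)

    def receive(self, payload: bytes) -> None:
        record = json.loads(payload.decode("utf-8"))
        self.by_flow[record["flow_id"]].append(record)

    def latest(self, flow_id: int) -> Optional[dict]:
        """Most recent record for a flow, or None if nothing was reported."""
        records = self.by_flow.get(flow_id)
        return max(records, key=lambda r: r["timestamp_us"]) if records else None


# The device pushes one record per reporting interval; the collector never polls.
collector = Collector()
collector.receive(encode_record(101, "ingress-to-egress", "", 1_000_000, 0, 850))
collector.receive(encode_record(101, "ingress-to-egress", "", 2_000_000, 1, 910))
latest = collector.latest(101)
```

Because the device sends each record as soon as a reporting interval ends, the collector always has recent data without the request and response round trips of pull mode.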
What Are the Application Scenarios of IFIT?
This section describes the application of IFIT in the following scenarios: Internet Protocol Radio Access Network (IP RAN) mobile bearer network, intelligent cloud-network private line service, and one financial WAN, demonstrating the strong practicability of IFIT.
IP RAN Mobile Bearer Network
The IP RAN mobile bearer network is a large-scale network that has various access modes and carries various mobile bearer services (such as HD video) that pose higher requirements on link connectivity and performance indicators. For this, Huawei proposes the E2E Enhanced Stream Quality Monitoring (ESQM) + trace IFIT hybrid measurement solution. ESQM is a measurement technology that collects statistics on TCP, SCTP, or GTP packets based on 5-tuple information. In this solution, E2E ESQM is performed first. Hop-by-hop IFIT is triggered when the performance indicator of a base station flow exceeds the specified threshold. iMaster NCE-IP then summarizes the reported hop-by-hop measurement data for path restoration and fault locating.
Application of IFIT on an IP RAN mobile bearer network
This solution monitors detailed indicator data of service flows from different dimensions, such as base station flows, data flows, and signaling flows. Based on the real-time performance data of base stations across the entire network, a big data-based intelligent O&M system can be constructed to implement high-precision and service-level SLA awareness in real time and multi-dimensional visualization for base station services. The system can also analyze and evaluate potential network risks, as well as adjust and optimize network resources to implement automatic and intelligent O&M.
Intelligent Cloud-Network Private Line Service
The intelligent cloud-network private line service is an important part of the intelligent cloud-network technology. It leverages the wide coverage of the mobile bearer network to provide enterprise private line services more conveniently and improves the network deployment, operations, and O&M efficiency through E2E collaborative management. IFIT provides VPN service analysis and assurance for intelligent cloud-network private line services, including site-to-site private line, site-to-cloud private line, and cloud-network interconnection scenarios. The following uses the site-to-cloud private line as an example to describe the E2E IFIT + trace IFIT solution, in which E2E IFIT is performed first. Hop-by-hop IFIT is triggered when the performance indicator of a VPN flow exceeds the specified threshold. iMaster NCE-IP then summarizes the reported hop-by-hop measurement data for path restoration and fault locating.
Application of IFIT in the intelligent cloud-network private line service
This solution supports the query of VPN service flow performance indicators by granularity ranging from minute to year and the query of overall VPN service information based on the VPN name, VPN type, and service status. In this way, the solution implements E2E multi-dimensional exception identification, network health visualization, intelligent fault diagnosis, and fault self-healing in a closed-loop manner.
One Financial WAN
One financial WAN uses SRv6 technology to quickly and easily establish basic network connections between the cloud and various access points, ensuring efficient service provisioning. The financial industry itself has high requirements on SLA assurance, and one financial WAN faces higher requirements on O&M capabilities due to the diverse array of outlet service types brought about by the development of banking services. For example, in addition to traditional production and office services, other services such as security protection, IoT, and public cloud services are now prevalent. Against this backdrop, Huawei proposes the IFIT tunnel-level measurement solution.
Application of IFIT on one financial WAN
This solution supports IFIT tunnel-level measurement in SRv6 scenarios. The link currently in use is periodically compared with the optimal link for path selection and optimization, implementing intelligent traffic steering. In addition, one core controller is deployed to perform centralized O&M on the entire financial network and implement E2E management and scheduling.
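The periodic comparison between the link in use and the optimal link can be sketched as follows. The ranking rule (lowest loss rate first, then lowest delay) and the minimum delay gain required before switching are assumptions made for this example. The article does not describe how the controller actually ranks tunnels or when it switches.

```python
from typing import Dict, NamedTuple


class TunnelQuality(NamedTuple):
    loss_rate: float
    delay_us: int


def best_tunnel(measurements: Dict[str, TunnelQuality]) -> str:
    """Pick the tunnel with the lowest loss rate, breaking ties by delay."""
    return min(measurements, key=lambda name: measurements[name])


def select_tunnel(current: str, measurements: Dict[str, TunnelQuality],
                  min_delay_gain_us: int = 500) -> str:
    """Keep the current tunnel unless the best one is clearly better."""
    best = best_tunnel(measurements)
    cur_q, best_q = measurements[current], measurements[best]
    if best_q.loss_rate < cur_q.loss_rate:
        return best
    # Assumed hysteresis: with equal loss, switch only for a worthwhile delay gain.
    if (best_q.loss_rate == cur_q.loss_rate
            and cur_q.delay_us - best_q.delay_us >= min_delay_gain_us):
        return best
    return current


# Hypothetical tunnel-level IFIT results for one measurement period.
measurements = {
    "tunnel-1": TunnelQuality(loss_rate=0.0, delay_us=3_000),
    "tunnel-2": TunnelQuality(loss_rate=0.0, delay_us=2_000),
    "tunnel-3": TunnelQuality(loss_rate=0.001, delay_us=1_000),
}
chosen = select_tunnel("tunnel-1", measurements)
```

Requiring a minimum gain before switching is a common way to avoid flapping between tunnels whose quality is nearly identical. It is included here as a design choice of the sketch rather than a documented IFIT behavior.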
- Author: Chen Jingyi
- Updated on: 2023-03-13