What Is xFlow?
xFlow provides intelligent analysis of network-wide flows and integrated O&M of applications and networks. The Huawei xFlow intelligent full-flow analysis solution implements integrated application and network monitoring, one-click diagnosis of application faults, and proactive application experience assurance. It helps solve problems of traditional solutions, such as scattered data, high costs, and high performance overhead.
Why Do We Need xFlow?
O&M Challenges to Traditional Data Center Networks
- Cross-organization O&M and manual collaboration: Services and networks are independently operated and maintained. It takes three days on average for multiple departments to jointly locate faults.
- Scattered O&M data and siloed systems: Network data is scattered in multiple systems, which cannot meet the requirements of integrated O&M and analysis.
- Massive and diversified data: An enterprise generates about 1 PB of data every day, making it difficult to backtrack services.
Limitations on NPM-based Network Performance Analysis
Network Performance Management (NPM), as defined by Gartner, is a monitoring tool based on data sources, including network device health status and events, traffic data generated by network devices, and original network data packets. It measures, diagnoses, and optimizes the network service quality for user experience, and provides historical, real-time, and predictive views, gaining an in-depth insight into the availability and performance of networks and applications running on the networks.
In the data center monitoring field, the NPM-based data analysis solution mainly analyzes network performance and is key for locating application and network faults. It can analyze data packets to gain insights into the availability and performance of networks and applications. The NPM solution aims to solve the problems of independent and time-consuming analysis of traditional service and network data, scattered O&M data, and difficult service backtracking due to a large amount of data.
- Management dead zone in the fabric: After SDN cloudification, east-west traffic accounts for about 71.5% of the total traffic in a data center, and a large number of interactions between applications occur in the fabric. However, NPM monitors only north-south traffic and traffic of border devices. Hop-by-hop traffic forwarding within the fabric cannot be detected, and the specific node where a fault occurs cannot be located.
- Unable to quickly demarcate faults on the cloud and network: Due to frequent application faults, the traffic of all leaf nodes in a fabric needs to be mirrored to NPM for analysis. As a result, the deployment cost of the traditional NPM solution is too high, and the cloud O&M team cannot demarcate the faults.
- Lacking correlation analysis of network device impacts on services: Generally, exceptions are identified based on service flows. NPM cannot determine whether services are affected due to network device exceptions.
Huawei xFlow Intelligent Full-Flow Analysis Solution
To adapt to the new trend of SDN cloudification, Huawei iMaster NCE-FabricInsight V100R023C00 introduces the xFlow intelligent full-flow analysis feature to implement intelligent analysis of network-wide flows and integrated O&M of applications and networks. The xFlow solution has the following major advantages over the NPM solution:
1. Network monitoring: The NPM solution cannot detect and demarcate faults on cloud-based networks caused by intra-fabric interactions. In comparison, the xFlow solution supports hop-by-hop fault demarcation in the fabric, minute-level diagnosis of devices, networks, and applications, and full log aggregation and source tracing. With these capabilities, it can detect more than 90 typical network faults in one minute and locate them in three minutes.
2. Deployment cost: The xFlow solution supports on-demand full-flow analysis in the fabric without additional cabling, reducing the number of TAP networks and connections to deploy as well as the deployment costs of the leaf network.
3. Correlation analysis of network device impacts on services: The xFlow solution supports monitoring of network devices and service application flows, and provides network risk evaluation, fault analysis, and log aggregation and source tracing. In addition, the xFlow solution provides integrated analysis of services and networks to proactively ensure service experience. It can also collaborate with applications to quickly demarcate faults and handle potential risks.
xFlow Mirroring Network Topology
- Mirroring mode (how to send flows):
  - Port mirroring: Port/VLAN/Local flow mirroring can be configured for full flows. Typically, port mirroring is used.
  - ERSPAN: ERSPAN remote flow mirroring can be configured for specified flows and on-demand full flows.
- Forwarding network (how to forward flows):
  - Out-of-band network: In port/VLAN/local flow mirroring, the observing port needs to be directly connected to the TAP network, occupying extra physical ports.
  - In-band forwarding: In remote flow mirroring, switches forward all mirrored traffic to the collector, which will not occupy extra physical ports.
- Collection plane (how to receive flows):
  - xFlow probe: If the probe is directly connected to the observing port, it receives packets through the direct link. If the probe is directly connected to the TAP network, it forwards and receives packets based on TAP rules.
  - FabricInsight collector: The southbound plane IP address of the collector cluster is configured as the ERSPAN destination IP address. Static routes can be configured on the switch connected to the collector and then be advertised to the underlay network.
Key Technologies of xFlow
xFlow Full-Flow Mirroring
- Mirroring mode (how to send flows): Port/VLAN/Local flow mirroring can be configured for full flows. Typically, port mirroring is used.
- Forwarding network (how to forward flows):
  - In port/VLAN/local flow mirroring, the observing port needs to be directly connected to the xFlow probe, occupying extra physical ports. This applies to small-scale mirroring scenarios.
  - In port/VLAN/local flow mirroring, the observing port needs to be directly connected to the TAP network, occupying extra physical ports. This applies to medium- and large-scale mirroring scenarios.
- Collection plane (how to receive flows):
  - The xFlow probe receives packets through direct links. Both IP and VXLAN packets can be parsed.
  - The xFlow probe receives packets through the directly connected TAP network. Both IP and VXLAN packets can be parsed.
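To illustrate what parsing a VXLAN packet involves, the following is a minimal Python sketch that decodes the 8-byte VXLAN header defined in RFC 7348 from a UDP payload. It is not the xFlow probe's implementation; the function name `parse_vxlan_header` is hypothetical and chosen only for this example.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned VXLAN destination port (RFC 7348)


def parse_vxlan_header(udp_payload: bytes):
    """Return (vni, inner_frame) for a valid VXLAN payload, else None.

    Hypothetical helper for illustration only.
    """
    if len(udp_payload) < 8:
        return None  # too short to hold the 8-byte VXLAN header
    # Byte 0: flags; bytes 1-3: reserved; bytes 4-6: VNI; byte 7: reserved.
    flags, vni_and_reserved = struct.unpack("!B3xI", udp_payload[:8])
    if not flags & 0x08:
        return None  # the I flag must be set for the VNI to be valid
    vni = vni_and_reserved >> 8  # drop the trailing reserved byte
    return vni, udp_payload[8:]
```

The returned inner frame is the original Ethernet frame of the tenant traffic, which can then be parsed like any non-encapsulated packet.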
xFlow Specified Flow Mirroring
- Packet forwarding path information (such as IP access location, each forwarding device, and inbound and outbound interfaces)
- Start time, end time, and transmitted bytes of a TCP session
- Session exceptions: connection setup failures (such as TCP retransmission and abnormal reset), forwarding delay threshold exceeded, and TTL < 3 (suspected loop)
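The session exceptions listed above can be sketched as a simple rule check. The Python example below is illustrative only: the `SessionRecord` fields, the `classify_session` function, and the 10 ms default delay threshold are assumptions made for this sketch, not part of the xFlow product. Only the TTL < 3 rule and the exception categories come from the list above.

```python
from dataclasses import dataclass


@dataclass
class SessionRecord:
    """Hypothetical per-session summary; field names are assumptions."""
    established: bool           # did the TCP handshake complete?
    syn_retransmissions: int    # SYN packets retransmitted during setup
    reset_seen: bool            # abnormal RST observed
    forwarding_delay_ms: float  # measured forwarding delay
    min_ttl: int                # lowest TTL observed for the session


def classify_session(rec: SessionRecord, delay_threshold_ms: float = 10.0):
    """Return the list of exception labels that apply to a session."""
    issues = []
    if not rec.established and (rec.syn_retransmissions > 0 or rec.reset_seen):
        issues.append("connection setup failure")
    if rec.forwarding_delay_ms > delay_threshold_ms:  # threshold is assumed
        issues.append("forwarding delay threshold exceeded")
    if rec.min_ttl < 3:
        issues.append("suspected loop (TTL < 3)")
    return issues
```

A very low TTL is treated as a loop indicator because a packet that circulates between devices has its TTL decremented at every hop until it approaches zero.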
xFlow On-demand Flow Mirroring
- Enter the faulty IP address pair (source and destination IP addresses). The system displays the IP access location and flow path.
- Select devices for obtaining packets on demand to narrow down the fault domain. The system automatically delivers configurations and obtains packets in real time without deploying cables.
- The system analyzes packets one by one in real time and provides the fault demarcation conclusion and packet evidence.
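As a rough sketch of the selection step in this workflow, the Python example below filters mirrored packets by an IP address pair (in either direction) and by the set of devices chosen for on-demand capture. The packet dictionary layout and the function names `matches_pair` and `select_packets` are hypothetical; the actual system delivers device configurations automatically, which is not modeled here.

```python
from ipaddress import ip_address


def matches_pair(pkt_src: str, pkt_dst: str, a: str, b: str) -> bool:
    """True if the packet travels between a and b, in either direction."""
    return {ip_address(pkt_src), ip_address(pkt_dst)} == {ip_address(a), ip_address(b)}


def select_packets(packets, a: str, b: str, devices):
    """Keep packets for the IP pair that were captured on the selected devices.

    Each packet is assumed to be a dict with "device", "src", and "dst" keys.
    """
    chosen = set(devices)
    return [
        p for p in packets
        if p["device"] in chosen and matches_pair(p["src"], p["dst"], a, b)
    ]
```

Matching both directions matters because a fault such as a missing reply is only visible when request and response packets are examined together.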
Typical Application Scenarios of the xFlow Solution
xFlow can be applied in the following key scenarios.
Network-wide Performance Monitoring
A bank uses NPM to monitor 70+ core service systems. However, it cannot view the status and performance of network devices through NPM, and O&M personnel need to log in to multiple systems for device monitoring. Because the monitoring systems are separated, there is no global perspective, and O&M data is isolated, the NMS cannot clearly detect application experience and device exceptions when monitoring device metrics. In addition, NPM cannot predict network sub-health because it lacks systematic prevention methods for proactive troubleshooting.
The xFlow solution implements integrated application and network monitoring, visualizes 140+ performance metrics of applications and networks in an end-to-end manner, and proactively evaluates key application experience and network-wide quality. It can also systematically predict 40+ key network risks and build a proactive prevention system to assure optimal application experience.
Fault Diagnosis and Demarcation
- xFlow full-flow mirroring is deployed at key network locations to analyze all packets and identify session exceptions, implementing segment-based fault demarcation.
- xFlow specified flow mirroring is deployed in a POD in hop-by-hop mode to restore the flow forwarding path, implementing hop-by-hop fault demarcation.
- xFlow on-demand flow mirroring is deployed on specified leaf nodes to mirror full flows on demand.
By using mirroring locations and policies, the xFlow solution can bring the following benefits:
- Easy operation: full coverage of application connectivity and poor-QoE faults, implementing one-click fault diagnosis
- Quick troubleshooting: correlation analysis of applications and networks, implementing full-path fault demarcation in minutes
- Comprehensive evidence: network path state and original packets provided as evidence, ensuring data reliability and backtracking
Key Application Assurance
- Real-time monitoring of devices, networks, and applications, and quality profile analysis
- Application and network collaboration and quick fault demarcation, handling risks in advance
- Global overview: global monitoring of assured services on devices, networks, and applications
- Full-domain analysis: service path, network path, and device collaboration, implementing one-click fault demarcation
- Early intervention: early detection and handling of potential risks on paths, ensuring service experience
Integrated Service and Network O&M
In traditional solutions, it is difficult to determine whether a bank transaction failure is caused by an application-side fault or a network-side fault.
- One-stop: SSO and one-stop UI design for drill-down operations
- Integrated: insights into integrated O&M of services, transactions, and networks
- One-click: one-click diagnosis of application faults, showing whether the network is at fault
- Author: Yang Xuechen
- Updated on: 2023-10-28