
What Is DPCF?

DPCF is a fault recovery technology that converges within seconds. Working on the data plane, it quickly detects and recovers from service session exceptions caused by silent faults, such as link faults, forwarding entry exceptions, and forwarding component exceptions, improving network-level reliability.

Why Do We Need DPCF?

In most cases, service session exceptions caused by silent faults cannot be identified quickly. Such exceptions include link faults, forwarding entry exceptions, forwarding component exceptions, traffic forwarding failures on ports that are up, and configuration errors. Typically, an analyzer first collects traffic to detect anomalies, and then engineers manually locate and rectify each fault. Troubleshooting can take several hours, severely affecting services.

To address this issue, Huawei launched DPCF, which requires neither an analyzer nor manual intervention. DPCF automatically detects network faults, including those that typically escape automatic detection, such as routing blackholes and ARP entry issues. It then switches paths according to a preset policy, rectifying faults within seconds. In key scenarios such as finance, storage, and supercomputing, services recover within seconds, which is thousands of times more efficient than the industry average.

DPCF vs. DPFR

Both DPCF and Data Plane Fast Recovery (DPFR) are fault recovery technologies implemented on the data plane, independent of the control plane. They mainly differ in the following aspects:

  • DPFR detects faults based on ports, whereas DPCF detects faults by matching specific service flows using ACL rules.
  • DPFR can detect only hardware faults, such as a faulty optical module or an incorrectly connected transmission optical cable. DPCF identifies flows with forwarding exceptions caused by a much wider range of factors, not just hardware faults: link faults, forwarding entry exceptions, forwarding component exceptions, physical port suspension, and configuration errors.
  • DPFR supports only a single path switchover, whereas DPCF supports up to three path switchovers within 15 minutes.
  • DPFR achieves fault convergence in sub-millisecond time, whereas DPCF converges within seconds.
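The limit of three path switchovers within 15 minutes can be pictured as a per-flow rate limiter. The following Python sketch is illustrative only: the class and method names are hypothetical, and whether DPCF counts switchovers in a sliding or a fixed 15-minute window is not stated, so a sliding window is assumed here.

```python
from collections import deque

MAX_SWITCHOVERS = 3       # limit stated in the DPCF vs. DPFR comparison
WINDOW_SECONDS = 15 * 60  # 15 minutes


class SwitchoverLimiter:
    """Hypothetical per-flow limiter: at most max_switchovers path switchovers
    in any sliding window of `window` seconds (sliding window is an assumption)."""

    def __init__(self, max_switchovers=MAX_SWITCHOVERS, window=WINDOW_SECONDS):
        self.max_switchovers = max_switchovers
        self.window = window
        self._history = {}  # flow key -> deque of switchover timestamps (seconds)

    def try_switch(self, flow, now):
        """Return True and record a switchover if the flow is still under its limit."""
        stamps = self._history.setdefault(flow, deque())
        # Discard switchovers that have aged out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_switchovers:
            return False  # limit reached: keep the flow on its current path
        stamps.append(now)
        return True
```

With this model, a flow that has already switched at t = 0 s, 10 s, and 20 s is refused a fourth switchover at t = 30 s, but may switch again once the t = 0 s entry ages out at t = 900 s.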

How Does DPCF Work?

DPCF provides network fault awareness and recovery based on the data plane.

Network Fault Awareness

In the following figure, during TCP traffic transmission, the sender sends data packets to the receiver. After receiving a data packet, the receiver returns an ACK packet to the sender to confirm that the data packet was received. If the sender receives no ACK packet within a specific period of time, it retransmits the data packet.

The sender creates a flow table for TCP traffic that matches ACL rules. If the interval at which the sender retransmits a data packet exceeds the configured fault detection interval, the flow is determined to have a timeout fault. In this way, DPCF becomes aware of network faults.

TCP acknowledgment mechanism
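The detection logic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Huawei's implementation: the ACL is reduced to a single hypothetical destination-prefix rule, the flow table is keyed by the 5-tuple, and the "retransmission interval" is interpreted as the time between a segment's first transmission and a later retransmission of the same sequence number. All class and method names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AclRule:
    """Hypothetical ACL rule: match TCP traffic toward one destination prefix."""
    dst_prefix: str

    def matches(self, dst_ip: str, protocol: str) -> bool:
        return (protocol == "tcp"
                and ipaddress.ip_address(dst_ip) in ipaddress.ip_network(self.dst_prefix))


@dataclass
class FlowEntry:
    # TCP sequence number -> time the segment was first seen (seconds).
    first_seen: dict = field(default_factory=dict)


class TimeoutFaultDetector:
    """Hypothetical flow table that flags a timeout fault on slow retransmissions."""

    def __init__(self, acl: AclRule, detect_interval: float):
        self.acl = acl
        self.detect_interval = detect_interval  # configured fault detection interval (s)
        self.flow_table = {}  # 5-tuple -> FlowEntry

    def on_data_packet(self, five_tuple, seq, now):
        """Return True if this data packet reveals a timeout fault on its flow."""
        _src_ip, dst_ip, protocol, _sport, _dport = five_tuple
        if not self.acl.matches(dst_ip, protocol):
            return False  # only ACL-matched flows get a flow table entry
        entry = self.flow_table.setdefault(five_tuple, FlowEntry())
        if seq not in entry.first_seen:
            entry.first_seen[seq] = now  # first transmission of this segment
            return False
        # Same sequence number seen again: the sender is retransmitting.
        return now - entry.first_seen[seq] > self.detect_interval

    def on_ack(self, five_tuple, ack):
        """Forget segments acknowledged by the receiver (every seq below ack).

        five_tuple is the data-direction key; sequence wraparound is ignored.
        """
        entry = self.flow_table.get(five_tuple)
        if entry is not None:
            for seq in [s for s in entry.first_seen if s < ack]:
                del entry.first_seen[seq]
```

For example, with a 1-second detection interval, a segment first sent at t = 0 s and retransmitted at t = 1.5 s is flagged as a timeout fault, while a retransmission at t = 0.4 s is not.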

Network Fault Recovery

After identifying the flows affected by a network fault, DPCF performs hash-based path selection again for those flows and switches them to a new path.
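A minimal Python sketch of hash-based reselection follows. It assumes that the ECMP member is chosen by hashing the flow's 5-tuple together with a hash seed (SHA-256 stands in for the switch's hardware hash function), and that the device tries successive seeds until the flow maps to a different member. The function names and the retry strategy are assumptions for illustration.

```python
import hashlib


def select_member(five_tuple, seed, members):
    """Pick an ECMP member by hashing the flow's 5-tuple together with a hash seed."""
    key = f"{seed}|{five_tuple}".encode()
    digest = hashlib.sha256(key).digest()
    return members[int.from_bytes(digest[:4], "big") % len(members)]


def reselect(five_tuple, seed, members, max_tries=64):
    """Return (new_seed, new_member) for the first later seed that maps the flow
    to a different member, or None if none is found (e.g. only one ECMP member)."""
    current = select_member(five_tuple, seed, members)
    for new_seed in range(seed + 1, seed + 1 + max_tries):
        member = select_member(five_tuple, new_seed, members)
        if member != current:
            return new_seed, member
    return None
```

Because the seed is part of the hash input, changing it moves only the affected flow to another member; flows that keep their original seed stay on their existing paths.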

In the following figure, Leaf1 and Leaf3 function as the sender and receiver, respectively, and Spine1 and Spine2 function as transit devices. Packets from Leaf1 to Leaf3 normally follow the path Leaf1 -> Spine1 -> Leaf3. After identifying a faulty flow, Leaf1 modifies the hash seed of the flow's packets and forwards them along Leaf1 -> Spine2 -> Leaf3. Network fault recovery works as follows:

  1. Sender: Leaf1 matches packets of the faulty flow using ACL rules, modifies their hash seed, and sets the path switching flag to 1. It then forwards the packets with the new hash seed to Spine2.
  2. Transit device: Based on the new hash seed carried in each packet, Spine2 re-selects a path and forwards the packet to Leaf3.
  3. Receiver: After receiving a packet, Leaf3 restores the hash seed, resets the path switching flag to 0, and forwards the packet to the server.
Network fault recovery
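The three roles in the steps above can be sketched in Python as follows. The packet fields hash_seed and switch_flag, the default seed value of 0, and all function names are hypothetical; the source does not say how the seed and flag are encoded in the packet. The ACL match at the sender is simplified to a dictionary lookup, and SHA-256 stands in for the hardware ECMP hash.

```python
import hashlib
from dataclasses import dataclass, replace

DEFAULT_SEED = 0  # hypothetical seed used when no switchover is active


@dataclass(frozen=True)
class Packet:
    five_tuple: tuple
    hash_seed: int = DEFAULT_SEED  # hypothetical field carrying the hash seed
    switch_flag: int = 0           # path switching flag: 1 = packet is on a switched path


def ecmp_pick(pkt: Packet, members: list) -> str:
    """Hash the 5-tuple with the seed carried in the packet to choose an ECMP member."""
    key = f"{pkt.hash_seed}|{pkt.five_tuple}".encode()
    return members[int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(members)]


def sender_leaf(pkt: Packet, switched_flows: dict, uplinks: list):
    """Step 1: for a faulty flow, write the new seed, set the flag, and pick an uplink.

    switched_flows maps a faulty flow's 5-tuple to the new hash seed chosen for it
    (a simplified stand-in for ACL matching).
    """
    new_seed = switched_flows.get(pkt.five_tuple)
    if new_seed is not None:
        pkt = replace(pkt, hash_seed=new_seed, switch_flag=1)
    return pkt, ecmp_pick(pkt, uplinks)


def transit_spine(pkt: Packet, downlinks: list) -> str:
    """Step 2: re-select the next hop using the seed carried in the packet."""
    return ecmp_pick(pkt, downlinks)


def receiver_leaf(pkt: Packet) -> Packet:
    """Step 3: restore the default seed and reset the flag before forwarding to the server."""
    return replace(pkt, hash_seed=DEFAULT_SEED, switch_flag=0)
```

Only flows listed in switched_flows are rewritten at the sender, so unaffected traffic keeps its original seed and path, and the receiver hands the server a packet whose seed and flag are back to their defaults.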

Typical Application of DPCF

In the traditional Layer 3 network architecture shown in the following figure, servers are connected using independent IP addresses. Leaf switches are deployed as independent Layer 3 gateways to forward Layer 2 and Layer 3 traffic. Spine switches are deployed as independent Layer 3 devices and connect to the leaf switches to implement ECMP load balancing. This networking mode mainly applies to lossless scenarios such as finance, storage, and supercomputing.

If a silent fault occurs on such a network, services can be interrupted for a long time, severely affecting upper-layer services. For online transaction applications, continuous packet loss causes transactions to fail and can even cause the connection to the peer protocol stack to time out, significantly degrading application performance. With DPCF deployed, if an exception occurs during service flow forwarding, the device automatically detects the fault and quickly switches the affected services to other ECMP members, recovering services within seconds.

Traditional Layer 3 network architecture
About This Topic
  • Author: Yang Xiaoli
  • Updated on: 2025-01-07