What Is Warm Reboot?
Warm reboot maintains network service continuity when a device restarts. It keeps traffic interruptions under 10s so that AI training tasks are not interrupted. Currently, warm reboot is used only in AI training scenarios.
Why Do We Need Warm Reboot?
Most network faults in the industry are caused by software bugs, which can trigger device restarts and interrupt services.
When a device software bug occurs, a traditional reboot may cause frequent training interruptions and force training to roll back to the state at the last backup point, wasting the results produced since then.
With warm reboot, the device restarts in asynchronous mode. As a result, AI training is not interrupted by a device software exception, and services recover within the recovery time objective (RTO), which improves device reliability.
What Are the Differences Between Warm Reboot and Traditional Reboot?
Warm reboot can be triggered in either of the following modes:
- Proactive reboot: When you restart a device, you run the corresponding command to trigger a warm reboot.
- Passive reboot: When a fault occurs and the warm reboot conditions are met, the device automatically triggers a warm reboot.
The following table lists the differences between warm reboot and traditional reboot.
| Trigger Mode | Traditional Reboot | Warm Reboot |
|---|---|---|
| Passive reboot | Service interruptions last for more than 120s if the device restarts. | The device restarts in asynchronous mode and service interruptions last less than 10s. Training is uninterrupted, and 90%+ of problem scenarios are covered. |
| Proactive reboot | Service interruptions last for more than 120s if the device restarts. | The device restarts in asynchronous mode and service interruptions last less than 10s. Training is uninterrupted, minor exceptions are quickly rectified, and proactive maintenance is implemented. |
How Does Warm Reboot Work?
How warm reboot works depends on the trigger mode.
- Proactive reboot
  Proactive reboot requires a pre-check and preprocessing before the restart.
  The pre-check verifies whether the system allows a warm reboot.
  Preprocessing lets the system perform the necessary operations before the warm reboot.
  After both the pre-check and preprocessing succeed, the device restarts in warm reboot mode.
- Passive reboot
  The system determines the cause of the fault and triggers a warm reboot if the warm reboot requirements are met.
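The two flows above can be sketched in a few lines of Python. This is an illustrative model only, not device software or a vendor API: the `Device` class, its fields, the function names, and the set of eligible fault causes are all hypothetical. The article also does not say what happens when a pre-check or preprocessing step fails, so the sketch assumes the flow simply stops.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical device model; the fields are illustrative, not a vendor API."""
    allows_warm_reboot: bool = True   # outcome of the pre-check
    preprocessing_ok: bool = True     # outcome of the preprocessing step


def proactive_warm_reboot(device: Device) -> list[str]:
    """Return the steps taken for an operator-initiated warm reboot."""
    steps = ["pre-check"]
    if not device.allows_warm_reboot:
        return steps + ["abort"]      # assumption: a failed pre-check stops the flow
    steps.append("preprocessing")
    if not device.preprocessing_ok:
        return steps + ["abort"]      # assumption: failed preprocessing stops the flow
    return steps + ["warm reboot"]


def passive_warm_reboot(fault_cause: str, eligible_causes: set[str]) -> bool:
    """Return True if the fault cause meets the (hypothetical) warm reboot conditions."""
    return fault_cause in eligible_causes
```

For example, `proactive_warm_reboot(Device())` walks through all three steps, while `passive_warm_reboot("hardware fault", {"software exception"})` reports that a warm reboot would not be triggered under the assumed conditions.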
How Do I Select a Reboot Mode?
If a serious fault occurs in the system, such as a hardware fault, traditional reboot may be the only choice.
If a minor problem occurs in the system, for example, some services are abnormal, you can try warm reboot to solve the problem. Warm reboot is also recommended when the system interruption time needs to be minimized, because the device restarts quickly in this mode.
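The guidance above amounts to a simple decision rule, sketched below. The `Severity` enum and `select_reboot_mode` function are hypothetical names introduced for illustration. Classifying a real fault as minor or serious depends on the device and is outside the scope of this sketch.

```python
from enum import Enum


class Severity(Enum):
    """Hypothetical fault classes taken from the guidance above."""
    MINOR = "minor"       # for example, some services are abnormal
    SERIOUS = "serious"   # for example, a hardware fault


def select_reboot_mode(severity: Severity) -> str:
    """Suggest a reboot mode for a fault of the given severity."""
    if severity is Severity.SERIOUS:
        # Traditional reboot may be the only choice for serious faults.
        return "traditional"
    # Warm reboot keeps the interruption short, so it is tried for minor problems.
    return "warm"
```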
- Author: Cong Ying
- Updated on: 2024-11-22