What Is CloudFabric?

Huawei's CloudFabric is a data center network (DCN) solution based on software-defined networking (SDN). It builds on the flagship CloudEngine 16800/12800 core switches, the high-performance CloudEngine 9800/8800/7800/6800/5800 fixed switches, the DCN controller iMaster NCE-Fabric, the intelligent network analysis platform iMaster NCE-FabricInsight, and the HiSec security solution. Together, these components simplify network O&M across the full lifecycle: network planning and construction, service provisioning, O&M monitoring, and change optimization. The solution can also intelligently discover, analyze, and isolate faults, and it converges the computing network and the storage network over Ethernet, achieving zero packet loss and improving computing and storage performance.

Why Do We Need CloudFabric?

DCNs are key ICT infrastructure. Digital transformation is giving rise to many new industries and ICT technologies, which place new requirements on traditional data centers (DCs).

Services must go online quickly, which requires network pooling and automation.
Services and policies come in many types, so service deployment in traditional DCs is labor-intensive and inefficient.

Transformation in the IT field drives DCNs towards all-Ethernet.
- The IT architecture is changing from centralized to distributed, and a large number of nodes are interconnected over Ethernet.
- Computing units provide Ethernet ports and work with Remote Direct Memory Access over Converged Ethernet (RoCE) to provide CPU/GPU interconnection over Ethernet.
- Storage media are evolving from hard disk drives (HDDs) to all-flash storage. Non-Volatile Memory Express (NVMe) is used within storage nodes, and high-bandwidth RoCE is used for interconnection between storage nodes.
Transformation in the IT field driving DCNs towards all-Ethernet

DCs are evolving towards all-Ethernet. However, traditional Ethernet networks cannot meet the preceding requirements because of packet loss and high latency.

As applications are migrated dynamically and traffic increases sharply, DCs require intelligent O&M.
As DCs expand, service policies become more complex, and various virtualization and cloud technologies are used together, O&M becomes more difficult than traditional, experience-based O&M can handle.

Against this backdrop, Huawei launched the CloudFabric hyper-converged DCN solution to:

Achieve full-lifecycle automation of network services and cut time to market (TTM) by 90%.
Build a lossless Ethernet network that supports lossless high-performance computing (HPC) and lossless long-distance transmission, so that intra-city active-active storage networks can run over Ethernet.
Implement fast fault detection, intelligent analysis, and fast fault remediation, as well as proactive fault prediction in a large number of fault scenarios.

What Are the Benefits of CloudFabric?

Full-Lifecycle Automation, Implementing Network as a Service and Service Provisioning in Seconds

Many DCNs have already automated network configuration through SDN. However, service design and planning, technical review, and effect verification still have to be performed manually and involve multiple departments and roles. The entire process is time-consuming and inefficient, and it has become the bottleneck of service provisioning.

The CloudFabric solution introduces intelligent algorithms in two phases:

Design phase: The factors that affect network design are broken down into three evaluation dimensions: resource, quality, and reliability. In this way, a network solution can be generated and recommended in seconds.
Verification phase: The network topology, device configuration, and traffic information are analyzed together to verify massive configurations across the entire network within seconds.
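
The three evaluation dimensions named in the design phase can be pictured as a simple scoring problem. The following is a minimal sketch only, not Huawei's algorithm: the candidate designs, the per-dimension scores, the weights, and the names `Candidate`, `score`, and `recommend` are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights for the three dimensions named in the text.
WEIGHTS = {"resource": 0.3, "quality": 0.4, "reliability": 0.3}


@dataclass(frozen=True)
class Candidate:
    name: str
    resource: float      # 0.0 (poor) to 1.0 (good); illustrative scale
    quality: float
    reliability: float


def score(c: Candidate) -> float:
    """Weighted sum of the three dimension scores."""
    return (WEIGHTS["resource"] * c.resource
            + WEIGHTS["quality"] * c.quality
            + WEIGHTS["reliability"] * c.reliability)


def recommend(candidates: list[Candidate]) -> Candidate:
    """Return the highest-scoring candidate design."""
    return max(candidates, key=score)


# Hypothetical candidate designs.
candidates = [
    Candidate("design-a", resource=0.9, quality=0.6, reliability=0.7),
    Candidate("design-b", resource=0.6, quality=0.9, reliability=0.9),
]
best = recommend(candidates)
print(best.name, round(score(best), 2))
```

A weighted sum is only one possible way to combine the dimensions. It is used here because it makes the trade-off between resource, quality, and reliability explicit.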

With intelligent algorithms, CloudFabric can implement full-lifecycle automatic network management and control in the following four phases: DC planning and construction, service provisioning, O&M monitoring, and change optimization.

All-Ethernet DCN, Unleashing Computing Power and Improving Storage Performance

The CloudFabric solution provides an all-Ethernet HPC network for HPC scenarios. Based on Huawei's unique iLossless™ algorithm, the solution addresses the Ethernet packet loss problem that has persisted for more than 40 years and achieves zero packet loss at 100% throughput. This delivers the network performance that HPC services require and doubles computing power at the same network scale.

For storage scenarios, the CloudFabric solution provides an active-active all-Ethernet storage network. Building on the iLossless™ algorithm for short-distance transmission, the iLossless-DCI algorithm addresses packet loss in long-distance transmission scenarios. Huawei's solution for replacing Fibre Channel (FC) with all-Ethernet has been put into commercial use many times, increasing bandwidth from 32 Gbit/s to 400 Gbit/s and input/output operations per second (IOPS) by 87%.
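
As a quick check of what the quoted figures mean in relative terms, the arithmetic can be written out as below. The bandwidth and percentage values come from the text; the baseline IOPS value is a hypothetical number used only to make the 87% increase concrete.

```python
fc_gbps = 32            # FC bandwidth quoted in the text, in Gbit/s
eth_gbps = 400          # Ethernet bandwidth quoted in the text, in Gbit/s
bandwidth_ratio = eth_gbps / fc_gbps

iops_gain_pct = 87      # IOPS increase quoted in the text, in percent
baseline_iops = 100_000  # hypothetical baseline, for illustration only
new_iops = baseline_iops * (100 + iops_gain_pct) // 100

print(f"bandwidth ratio: {bandwidth_ratio}x")
print(f"IOPS: {baseline_iops} -> {new_iops}")
```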

Network-Wide Intelligent O&M, Ensuring 24/7 Service Continuity

The CloudFabric solution uses telemetry to collect multi-dimensional data from the network and uses the intelligent analysis platform to analyze network-wide O&M data. In addition to visualizing O&M data, the CloudFabric solution provides the following key O&M capabilities:

Network health evaluation: With the help of telemetry, a multi-dimensional evaluation system covering the device, network, protocol, overlay, and service dimensions integrates configuration data, entry data, log data, and KPI performance data from the network. The intelligent analysis platform detects issues and risks in each dimension in real time. The detection scope covers network working status, network capacity, component sub-health, and service traffic exchange, giving O&M personnel a view of the overall quality of the entire network.
Quick root cause locating: Based on a knowledge graph, known DCN faults can be detected within 1 minute, located within 3 minutes, and rectified within 5 minutes. Learning of unknown faults and fault inference are also supported to help O&M personnel investigate the root causes of unknown faults.
Automatic assurance for service changes: After a configuration change, network data is collected and modeled to check whether the actual forwarding behavior of the network is consistent with users' service intents. O&M personnel can use the verification result to check whether the change meets expectations or causes issues. If an intent fails verification, they can locate the cause of the failure, which greatly improves O&M efficiency in network change scenarios. In addition, important services can be verified periodically and automatically to keep them running normally and reliably.
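
The idea behind change assurance, comparing modeled forwarding behavior against service intents, can be sketched as a reachability check on a graph. This is an illustrative simplification, not the product's verification engine: the forwarding graph, the node names, and the `(source, destination, expected)` intent format are assumptions made for the example.

```python
from collections import deque


def reachable(graph: dict[str, set[str]], src: str, dst: str) -> bool:
    """Breadth-first search over the modeled forwarding graph."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def verify(graph, intents):
    """Return the intents whose expected reachability does not match the model."""
    return [i for i in intents if reachable(graph, i[0], i[1]) != i[2]]


# Hypothetical forwarding model built from data collected after a change.
graph = {
    "web": {"leaf1"},
    "leaf1": {"spine1"},
    "spine1": {"leaf2"},
    "leaf2": {"db"},
}
# Hypothetical intents: (source, destination, should_be_reachable).
intents = [
    ("web", "db", True),
    ("db", "web", False),
    ("web", "backup", True),
]
failed = verify(graph, intents)
print(failed)
```

Here the third intent fails because the model contains no path to `backup`, which is the kind of mismatch that would point O&M personnel to the failed change.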

What Are the Components of CloudFabric?

CloudFabric Solution Architecture

The following figure shows the architecture of Huawei's CloudFabric DCN solution, which consists of the server layer, fabric layer, resource management layer, and application layer.

Architecture of the CloudFabric DCN solution

Server Layer

This layer carries server resources including virtual machines (VMs), containers, and physical machines (PMs) of applications. Resources at this layer are not provided by the CloudFabric solution.

Table 1-1 Server layer description

Object	Description
VM	Physical server resources are abstracted as VMs and managed through computing virtualization technology to carry services.
Container	A container is an abstraction of the application layer that packages code and dependencies together. Multiple containers can run on the same host and share the operating system kernel with other containers. Each container runs as an independent process in the user space.
PM	A PM is a physical server.

Fabric Layer

This layer consists of network devices, such as switches, firewalls, and load balancers (LBs). It provides different network services for servers to communicate with each other in a DC and to access resources outside the DC.

Table 1-2 Fabric layer description

Object	Description
Switch	Huawei CloudEngine series switches build a Virtual Extensible LAN (VXLAN) network to receive and forward data and provide high-speed interconnection channels for the server layer. For details about CloudEngine series switches, see CloudEngine Series Switch Introduction.
Value-added service (VAS) device	Huawei firewalls and third-party firewalls provide value-added network services, such as access control lists (ACLs) and network address translation (NAT). For details about Huawei firewalls, see Firewall Introduction. LBs use load balancing to distribute network requests evenly across multiple servers, reducing the load on any single server and improving service experience and reliability.
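
To illustrate what distributing requests evenly across servers can look like, the sketch below uses round-robin rotation. Round-robin is only one common load balancing strategy and is an assumption here; the text does not state which algorithm the LBs use, and the server names are hypothetical.

```python
from collections import Counter
from itertools import cycle

servers = ["server-1", "server-2", "server-3"]   # hypothetical backend pool
next_server = cycle(servers)


def dispatch(request_id: int) -> str:
    """Assign the request to the next server in rotation."""
    return next(next_server)


# Nine requests are spread over three servers, three each.
assignments = [dispatch(i) for i in range(9)]
load = Counter(assignments)
print(load)
```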

Resource Management Layer

This layer abstracts storage, computing, and network resources in a DC and manages them in a unified manner.

Table 1-3 Resource management layer description

Object	Description
iMaster NCE-Fabric	Huawei's DC controller iMaster NCE-Fabric manages network devices in the southbound direction and interconnects with platform systems at the resource management layer in the northbound direction, implementing automatic service deployment and network resource management throughout the lifecycle. For details about the controller, see iMaster NCE-Fabric Introduction. Huawei firewalls are managed by the security controller SecoManager, which is a service on iMaster NCE-Fabric.
Multi-data center controller (MDC)	When a customer has multiple DCs, each DC is a resource domain with its own independent iMaster NCE-Fabric deployment. In this case, the MDC can be used to orchestrate and manage the network services of all DCs in a unified manner. For details about MDC, see MDC Introduction.
iMaster NCE-FabricInsight	Huawei's intelligent network analysis platform iMaster NCE-FabricInsight monitors fabric status and application behavior in real time, detects network and application issues promptly, performs health checks, and analyzes the root causes of network faults. For details about iMaster NCE-FabricInsight, see iMaster NCE-FabricInsight Introduction.
HiSec Insight	Huawei's HiSec Insight, formerly known as cybersecurity intelligence system (CIS), is a big data-based product that defends against advanced persistent threats (APTs). It can detect potential and advanced threats on a network, implementing network-wide security situational awareness in enterprises. It can also be used in the Huawei HiSec solution for handling threats in a closed-loop manner. For details about HiSec Insight, see HiSec Insight Introduction.
HUAWEI CLOUD Stack	The HUAWEI CLOUD Stack solution uses FusionSphere OpenStack as the cloud platform to integrate the resources of each physical DC and uses ManageOne as the DC management software to manage multiple DCs in a unified manner. HUAWEI CLOUD Stack provides a wide range of cloud services, such as computing, storage, network, security, disaster recovery, and platform as a service (PaaS). For details about the solution, see HUAWEI CLOUD Stack Introduction.
Computing virtualization management platform	This is a platform such as vSphere vCenter that virtualizes and manages compute resources.
Container platform	This is a platform, such as open-source Kubernetes or Docker, that implements container-based management of compute resources.
OpenStack	This is an open-source cloud platform maintained by the open-source community.

Application Layer

This layer contains the applications for which the CloudFabric solution provides network services. These applications are managed by service departments. Common business-to-consumer (B2C) services include game apps and video apps, and common business-to-business (B2B) services include Data Center Interconnect (DCI) private lines and virtual private cloud (VPC) services.

CloudFabric Model

To meet users' requirements for service networks, the CloudFabric solution abstracts service models on top of the underlying physical network. As shown in the following figure, the CloudFabric model consists of the physical model, logical model, and application model. The following table describes each model.

Physical, logical, and application models of CloudFabric

Table 1-4 CloudFabric model description

Model	Description	Example
Application model	Tenant: A DC administrator can create one or more tenants based on service requirements and specify network resource quotas for each tenant. Services of different tenants are isolated by default.	A tenant account can be allocated to a company or to each department or service type in a company.
Application model	VPC: A tenant administrator can create one or more VPCs based on service requirements. VPCs are isolated by default and occupy the resource quota of the tenant.	Different VPCs can be created for different departments or for different types of services.
Logical model	In each VPC, network parameters need to be configured based on service requirements. Common logical elements in a logical model are as follows: Logical router: corresponds to a VXLAN EVPN-based Layer 3 virtual private network (L3VPN). Logical switch: corresponds to a VXLAN EVPN-based Layer 2 virtual private network (L2VPN). One or more subnets can be defined on a logical switch. Logical port: is abstracted from a Layer 2 sub-interface that can be associated with a bridge domain (BD). End port: corresponds to a VM connected to a VPC. Logical service function (SF): corresponds to a logical device that provides VAS services, such as a logical firewall and a logical LB. External network: corresponds to a logical object connected to a network outside a DCN. It defines the connection mode and network to be connected to.	Logical router: Create an L3VPN named VRF0001 and generate the corresponding VNI. Logical switch: Create a subnet and a corresponding gateway and generate a corresponding BD. Logical port: Connect a Layer 2 sub-interface to a BD. End port: Associate a server interface with a logical port. Logical SF: A server wants to access an external network through firewalls using methods such as source network address translation (SNAT), elastic IP address (EIP), and IP Security (IPsec). External network: A server wants to access the Internet through a public IP address or access a remote private network.
Physical model	A fabric consists of a group of interconnected spine and leaf nodes as well as VAS devices. It is a physical network that multiple tenants can share.	On a controller, one or more fabric resource pools can be created for specific device resources, for example, Fabric1 and Fabric2.
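
The relationships in the table (a tenant owns VPCs within a quota, and a VPC contains logical routers and logical switches with subnets) can be sketched as nested data classes. This is a simplified illustration, not the controller's data model: the class and field names, the nesting of switches under routers, and the use of a VPC count as the quota are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalSwitch:
    name: str
    subnets: list[str] = field(default_factory=list)  # one or more subnets


@dataclass
class LogicalRouter:
    name: str  # for example, the L3VPN name VRF0001 used in the table
    switches: list[LogicalSwitch] = field(default_factory=list)


@dataclass
class VPC:
    name: str
    routers: list[LogicalRouter] = field(default_factory=list)


@dataclass
class Tenant:
    name: str
    vpc_quota: int  # hypothetical quota: maximum number of VPCs
    vpcs: list[VPC] = field(default_factory=list)

    def create_vpc(self, name: str) -> VPC:
        """Create a VPC, enforcing the tenant's quota."""
        if len(self.vpcs) >= self.vpc_quota:
            raise ValueError(f"tenant {self.name} has exhausted its VPC quota")
        vpc = VPC(name)
        self.vpcs.append(vpc)
        return vpc


tenant = Tenant("finance", vpc_quota=1)
vpc = tenant.create_vpc("vpc-payments")
vpc.routers.append(
    LogicalRouter("VRF0001", [LogicalSwitch("ls-web", ["10.0.1.0/24"])])
)

# A second VPC exceeds the quota of 1 and is rejected.
try:
    tenant.create_vpc("vpc-extra")
    quota_enforced = False
except ValueError:
    quota_enforced = True
print(quota_enforced)
```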

What Are the Operation Scenarios of CloudFabric?

The service management process and mode vary depending on the operation scenario. The main operation scenarios of CloudFabric include cloud-network integration, network virtualization – computing, and network virtualization – hosting.

Cloud-Network Integration Scenario

The following figure shows the logical diagram of the cloud-network integration scenario in the CloudFabric solution. The cloud platform provides a unified storage, computing, and network management GUI and is connected to the network controller.

Cloud-network integration scenario

A service administrator creates storage, computing, and network resources on the cloud platform GUI:

Network resources are allocated to specified services or applications on the cloud platform. The cloud platform delivers service provisioning instructions to the network controller, which then automatically delivers the configuration to devices.
Computing and storage resources are created, deleted, and migrated on the cloud platform. The cloud platform, network controller, network devices, and servers interact with each other without manual intervention.
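
The provisioning chain described above (cloud platform to network controller to devices) can be sketched as three cooperating objects. This is a toy model under stated assumptions: the class names, the instruction format, and the configuration string are hypothetical and do not represent the real cloud platform or controller interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    applied: list[str] = field(default_factory=list)

    def apply(self, config: str) -> None:
        """Record a configuration delivered by the controller."""
        self.applied.append(config)


class NetworkController:
    def __init__(self, devices: list[Device]):
        self.devices = devices

    def provision(self, instruction: dict) -> int:
        """Turn one provisioning instruction into a config line per device."""
        config = f"network {instruction['network']} for {instruction['app']}"
        for device in self.devices:
            device.apply(config)
        return len(self.devices)


class CloudPlatform:
    def __init__(self, controller: NetworkController):
        self.controller = controller

    def allocate_network(self, app: str, network: str) -> int:
        """Forward the administrator's request to the controller."""
        return self.controller.provision({"app": app, "network": network})


leaves = [Device("leaf1"), Device("leaf2")]
platform = CloudPlatform(NetworkController(leaves))
configured = platform.allocate_network("billing", "net-a")
print(configured, leaves[0].applied)
```

The administrator only touches `CloudPlatform`; everything below it happens without manual intervention, which is the point of the scenario.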

Network Virtualization – Computing Scenario

If a unified cloud platform cannot be built because the computing service management system is complex or computing and network management cannot be fully converged, the network virtualization – computing scenario is recommended.

The following figure shows the logical diagram of the computing scenario in the CloudFabric solution. The controller connects to the computing virtualization platform but not to the cloud platform. Together, the controller and the computing virtualization platform are responsible for service provisioning and provision computing and network resources collaboratively.

Computing scenario

Service provisioning consists of two parts:

Network service provisioning: A network administrator uses the controller to orchestrate network services. The controller then delivers network configurations to the Virtual Machine Manager (VMM) through interfaces.
Computing service provisioning: When a computing administrator creates, deletes, or migrates a VM on the VMM, the VMM notifies the controller that the VM has gone online or offline. The controller then delivers the configuration of the corresponding access port to complete end-to-end service configuration and rollout.
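
The interaction between the VMM and the controller can be sketched as an event handler that keeps access port configuration in step with VM lifecycle events. The event names, the port naming, and the `FabricController` class are hypothetical and serve only to illustrate the flow.

```python
class FabricController:
    def __init__(self, vm_to_network: dict[str, str]):
        # Network services orchestrated in advance by the network administrator.
        self.vm_to_network = vm_to_network
        # Access port -> network currently configured on that port.
        self.port_config: dict[str, str] = {}

    def on_vm_event(self, event: str, vm: str, port: str) -> None:
        """Handle a notification from the VMM (event names are hypothetical)."""
        if event == "online":
            self.port_config[port] = self.vm_to_network[vm]
        elif event == "offline":
            self.port_config.pop(port, None)
        else:
            raise ValueError(f"unknown event: {event}")


controller = FabricController({"vm-1": "net-a"})
controller.on_vm_event("online", "vm-1", "leaf1:eth1")    # VM created
controller.on_vm_event("offline", "vm-1", "leaf1:eth1")   # VM leaves its host
controller.on_vm_event("online", "vm-1", "leaf2:eth7")    # VM arrives on a new host
print(controller.port_config)
```

Because the network mapping must exist before the first `online` event arrives, the sketch also reflects why network resources are provisioned before VM resources in this scenario.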

Network Virtualization – Hosting Scenario

In the hosting scenario, physical resources, including racks, equipment rooms, and hosts, are leased to users. Value-added services such as security, load balancing, public IP addresses, and access bandwidth can also be leased. In most cases, carriers and some Internet service providers (ISPs) lease out these resources and services.

The following figure shows the logical diagram of the hosting scenario in the CloudFabric solution.

Hosting scenario

In this scenario, the controller does not connect to the cloud platform or VMM. A network administrator directly manages network services in multiple central equipment rooms on the service orchestration page of the controller. A central equipment room can be connected to one or more edge equipment rooms to integrate and maximize the utilization of equipment room resources.

Scenario Comparison

The following table compares the preceding scenarios.

Table 1-5 Scenario comparison

Scenario	Applicable Scope	Characteristics	Limitations
Cloud-network integration	The network and IT departments of an enterprise are integrated and have strong technical capabilities.	Storage, computing, and network resources are managed by a unified platform, providing an end-to-end cloud-network synergy solution.	Users must have strong technical capabilities.
Network virtualization – computing	The network and IT departments of an enterprise are not integrated, but service deployment and management need to be automated to some extent.	Network and computing deployment is highly automated, and network configuration is less prone to errors.	There are requirements for the third-party VMM version. Network resources need to be provisioned before VM resources because network and computing are coupled.
Network virtualization – hosting	Integrated leasing of scattered Internet Data Center (IDC) rack resources: After years of operations, the rack resources in IDC equipment rooms are fragmented and cannot meet the requirements of large customers. Integrated leasing of rack resources in public switched telephone network (PSTN) equipment rooms (metropolitan area network): After optical reconstruction, a large number of PSTN equipment rooms are idle. Through Layer 2 interconnection, multiple physical equipment rooms are integrated into one logical equipment room.	Network deployment is simple and efficient, there is no dependency on third parties, and computing and network are decoupled.	Automated deployment is available only for the network. Collaboration with computing is performed offline.
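
The applicable scope column can be read as a small decision rule. The sketch below is one hedged interpretation of the table, not an official selection procedure: the function name and the two boolean inputs are hypothetical simplifications of the conditions listed.

```python
def choose_scenario(leases_resources: bool, unified_cloud_platform: bool) -> str:
    """Pick an operation scenario from two simplified yes/no conditions."""
    if leases_resources:
        # Racks, equipment rooms, and hosts are leased to users.
        return "hosting"
    if unified_cloud_platform:
        # Storage, computing, and network are managed by one platform.
        return "cloud-network integration"
    # No unified cloud platform, but deployment still needs automation.
    return "computing"


print(choose_scenario(leases_resources=False, unified_cloud_platform=True))
print(choose_scenario(leases_resources=False, unified_cloud_platform=False))
print(choose_scenario(leases_resources=True, unified_cloud_platform=False))
```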

What Are the Application Scenarios of the All-Ethernet Intelligent and Lossless Network in the CloudFabric Solution?

With the evolution of the IT architecture, compute resources, and storage resources in DCs, DCNs evolve from the multi-protocol mode to the all-Ethernet mode.

Huawei's CloudFabric 3.0 hyper-converged DCN solution provides a lossless Ethernet network to meet DCN evolution requirements. It can be used in typical scenarios such as centralized storage, distributed storage, HPC, and artificial intelligence (AI) training. For details, see Hyper-Converged Data Center Network.

For details about the fundamentals and configuration commands of the intelligent and lossless features, see "Configuration > Configuration Guide > Intelligent Lossless Network Configuration" in Huawei CloudEngine Series Switches Product Documentation.

References

1. (eBook) Hyper-Converged Data Center Network

2. Huawei CloudFabric Data Center Network Solution

3. CloudFabric Data Center Network Solution Documentation Bookshelf

4. CloudFabric Data Center Network Solution Product Documentation

About This Topic

Author: Zhang Fan
Updated on: 2023-12-01