Tuesday, October 7, 2008

Networking Systems Require Tight Control/Data Plane Integration

A critical element in the design of next-generation networking equipment is integrating the service intelligence provided by silicon with the transport capabilities of optics. An efficient combination of the two yields the mechanisms needed to leverage the massive transport capacity available in the network for deploying and delivering new classes of service. The carrier, service provider, enterprise, and metro markets demand a new breed of equipment that combines speed, capacity, and intelligence to use bandwidth in flexible and profitable ways, while incorporating emerging technologies such as MPLS, traffic engineering, VPN, and differentiated services over Ethernet and SONET networks. Designing efficient, high-performance networking systems requires careful integration of data path network processors (NPs) with control plane processors, co-processors, fabrics, and application software. Handling a high-speed packet/cell stream requires a multi-chip process flow, and unless all of the components work together efficiently, system performance will be less than optimal. In essence, choosing a 10-Gbps NP does not guarantee 10-Gbps line-card performance unless the designer makes careful system-level integration choices rather than component-level ones.

Data Plane Requirements
The data plane can be viewed as two logic blocks, one for ingress processing and one for egress processing. In most architectures, egress processing is the simpler of the two, since the fabric subsystem supplies relatively homogeneous packets and egress processing consists mainly of traffic management functions. Ingress processing is the harder task, since the incoming traffic can be a heterogeneous mix of packet types, lengths, and protocols from a variety of sources. All of those packets/cells must go through parsing, classification, policing, admission control, and editing/modification at wire speed with bounded latency to meet application requirements, especially for performance.
When considering data plane processors, it is important to understand the demands that system applications place on the ingress and egress logic of the data plane, as well as the interaction and integration with the control plane. The key attributes to evaluate are:

1. Classification performance: Search key size, search frequency and latency, associated data size, and the ability to do recursive search operations directly determine the data plane processor's ability to implement advanced admission control at line rate in edge router applications. Parsing flexibility also matters: it affects the breadth of protocols a data plane processor can support, as well as classification table size and maintenance complexity.

2. Provisioning performance: The key provisioning attributes of a data plane processor are standards compliance, range of policing rates, policing rate granularity, accuracy, and number of policing operations per packet while sustaining line rate. Policing algorithm support includes Diffserv srTCM, trTCM, and ATM GCRA and F-GCRA. Policing rates range from 8 kbps for VoIP to 10 Gbps for 10 Gigabit Ethernet links. Multiple policing operations per packet allow packets to be marked based on individual and aggregate flow provisioning.

3. Forwarding performance: A data plane processor's forwarding engine must be capable of making all necessary packet modifications at line rate. A key consideration is whether there is sufficient data path speedup to accommodate packet expansion caused by label stacking, tunneling, and appending the route headers needed by traffic managers and switch fabrics. A forwarding engine should be capable of prepending route headers, pushing, popping, and swapping label stacks for MPLS and VLAN, and adding or removing tunnels at line rate. Included in this effort are the QoS mappings necessary between the protocols contained within the packet (IP, MPLS, Diffserv, 802.1Q, etc.) and the route header formation.

4. Control plane integration: A tight coupling of the data plane and control plane processors provides several advantages. The results of the data plane processor's classification engine may be used to prioritize and expedite processing in the control processor. The data processor's policing engines may be used to protect the control plane from denial of service attacks. Most importantly, it allows the control processor to serve as an application processor for those packets requiring processing beyond layer 4.
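
To make the policing attribute in item 2 concrete, here is a minimal software sketch of a color-blind single-rate three-color marker in the style of Diffserv srTCM (RFC 2697). It is an illustration only: a real data plane processor implements this in hardware, and the class name, field names, and the explicit `now` timestamp argument are assumptions made for this example.

```python
from dataclasses import dataclass

GREEN, YELLOW, RED = "green", "yellow", "red"


@dataclass
class SrTcmPolicer:
    """Color-blind single-rate three-color marker, after RFC 2697.

    All names here are hypothetical; units are bytes and seconds.
    """
    cir_bytes_per_s: float  # committed information rate
    cbs: int                # committed burst size
    ebs: int                # excess burst size
    tc: float = 0.0         # tokens in the committed bucket
    te: float = 0.0         # tokens in the excess bucket
    last_t: float = 0.0     # time of the last refill

    def __post_init__(self):
        # Both buckets start full.
        self.tc = float(self.cbs)
        self.te = float(self.ebs)

    def _refill(self, now: float) -> None:
        # Tokens arrive at CIR; they fill the committed bucket first,
        # and only the overflow goes to the excess bucket.
        tokens = (now - self.last_t) * self.cir_bytes_per_s
        self.last_t = now
        add_c = min(tokens, self.cbs - self.tc)
        self.tc += add_c
        self.te = min(float(self.ebs), self.te + (tokens - add_c))

    def mark(self, now: float, size: int) -> str:
        """Return the color for a packet of `size` bytes arriving at `now`."""
        self._refill(now)
        if self.tc >= size:
            self.tc -= size
            return GREEN
        if self.te >= size:
            self.te -= size
            return YELLOW
        return RED
```

With `SrTcmPolicer(cir_bytes_per_s=1000, cbs=1500, ebs=1500)`, which corresponds to the 8 kbps end of the policing range, three back-to-back 1500-byte packets at time 0 are marked green, yellow, and red in turn. Running two such policers per packet, one per flow and one per aggregate, is one way to picture the "multiple policing operations per packet" mentioned above.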

Control Plane Requirements
Control plane processors are available or under development from Broadcom (Si-Byte), Sandcraft, PMC (QED), IBM and others. These processors are enhanced versions of popular embedded processors such as MIPS and PowerPC, with additional I/O and packet processing functionality. The key advantage they provide is in the area of development tools and ease of leveraging an existing code base. When considering a control plane processor, the key attributes to evaluate include:

1. CPU performance and architecture: Data plane processors use customized or application-driven instruction sets, whereas control plane processors use standard RISC architectures, so raw instruction throughput (millions of instructions per second) is a key factor in control plane performance. The MIPS, PowerPC, and ARM RISC architectures are all suitable for the control plane, so the decision should be based on development environment, familiarity, or other control plane processor features. Control plane processors typically use multiple RISC cores, either loosely or tightly coupled, and tend to have one- or two-level caches.

2. Memory subsystem performance: The memory bandwidth of a control plane processor must meet the demands of both the RISC processor and the communications I/O ports. An efficient implementation will allow the data plane processor to stream data directly into the control plane processor's shared memory without stalling protocol processing on the RISC cores. Since control traffic is stored and then forwarded, the shared memory bandwidth required will be four to eight times the average control traffic rate. The peak control traffic bandwidth equals the line rate, or 10 Gbps plus route header overhead, for 10 Gigabit Ethernet.

3. Communications I/O: Ideally, the data plane and control plane processors will have glueless interfaces that sustain both the peak line rate and the required control traffic rate with bounded latency. Several co-processor and data plane interfaces have been standardized or are under consideration by the Optical Internetworking Forum (OIF) and Network Processing Forum (NPF). For 10-Gbps data plane rates, some options include POS/Utopia-3, SPI 4, HyperTransport, or 3.125 GHz serializers/deserializers (serdes).
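
The memory sizing rule in item 2 can be worked through as simple arithmetic. The sketch below assumes hypothetical figures (a 0.5 Gbps average control traffic rate, 64-byte packets, and an 8-byte route header); none of these numbers come from a particular device, and the function names are made up for illustration.

```python
def shared_memory_bandwidth_gbps(avg_control_gbps: float, multiplier: float) -> float:
    """Shared memory bandwidth for store-and-forward control traffic.

    The article's rule of thumb is a multiplier of four to eight times
    the average control traffic rate.
    """
    if not 4 <= multiplier <= 8:
        raise ValueError("multiplier is expected to be between 4 and 8")
    return avg_control_gbps * multiplier


def peak_control_gbps(line_rate_gbps: float, packet_bytes: int,
                      route_header_bytes: int) -> float:
    """Peak rate into the control plane when every packet gains a route header."""
    return line_rate_gbps * (packet_bytes + route_header_bytes) / packet_bytes


# Hypothetical example values, chosen only to show the arithmetic.
low = shared_memory_bandwidth_gbps(0.5, 4)    # 2.0 Gbps
high = shared_memory_bandwidth_gbps(0.5, 8)   # 4.0 Gbps
peak = peak_control_gbps(10, 64, 8)           # 11.25 Gbps
```

Under these assumptions the memory subsystem would need to sustain 2 to 4 Gbps on average while absorbing bursts above 10 Gbps, which is why the interface and memory choices need to be evaluated together.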

Integrating the Planes
A typical 10-Gbps line card contains network processing and traffic management functions between the optics interface and switch fabric. At 10 Gbps, the network processing function is divided between a data plane processor and a control plane processor. This assures deterministic line-rate performance independent of traffic mix and advanced routing requirements. Since the two processors share the packet processing task, they should mate over a glueless interface that carries data traffic (packets or cells) in both directions (Figure 1). The SPI 4 interface is an attractive option, since it meets 10-Gbps performance requirements and is widely available.
Control plane traffic is typically queued in multiple priority queues since it is not processed "on the fly" like data plane traffic. The subport capability of SPI 4 provides the mechanism for transferring multi-priority traffic between the data plane and control plane processors without head of line blocking.
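
The idea of avoiding head-of-line blocking with per-priority subports can be sketched in software as follows. This is a simplified model, not the SPI 4 protocol itself: the `SubportScheduler` class is hypothetical, and per-subport flow control is reduced to a single ready flag per queue.

```python
from collections import deque


class SubportScheduler:
    """Strict-priority scheduler with one queue and one ready flag per subport.

    Priority 0 is the highest. A subport that is not ready (for example,
    because the receiver's queue for it is full) is skipped, so it never
    blocks traffic waiting on other subports.
    """

    def __init__(self, num_priorities: int):
        self.queues = [deque() for _ in range(num_priorities)]
        self.ready = [True] * num_priorities

    def enqueue(self, priority: int, packet) -> None:
        self.queues[priority].append(packet)

    def set_ready(self, priority: int, ready: bool) -> None:
        # Stand-in for per-subport flow-control status from the receiver.
        self.ready[priority] = ready

    def next_packet(self):
        """Return (priority, packet) for the best sendable packet, or None."""
        for priority, queue in enumerate(self.queues):
            if queue and self.ready[priority]:
                return priority, queue.popleft()
        return None
```

With a single shared FIFO, a packet stuck at the head would hold up everything behind it. Here, if the low-priority subport is back-pressured, a routing update queued on the high-priority subport can still be sent, and the reverse also holds.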
Tight integration of the data plane and control plane processor architectures will enable several performance efficiencies. The control processor may leverage the classification and provisioning completed by the data plane processor. High priority control packets may be directly transferred to a control processor's RISC cache and the RISC CPU may be dispatched based on classification results.
Per flow statistics gathering is increasing in importance as quality of service is added to IP and MPLS networks through differentiated services. These statistics are not only used for billing, but are a key enabler for network engineering and security against denial of service attacks.
Per flow statistics are now a significant control plane bandwidth issue that is solved by tightly integrating the data plane and control plane processors. Statistics information is collected in the data plane and processed in the control plane.
Tight integration of the data and control planes also simplifies end product field support by enabling sampling and full statistics collection on selected flows. The data plane processor samples traffic by copying packets or portions of packets to the control plane processor for analysis. Random sampling may be applied globally across all traffic to support network engineering. Full sampling of selected flows enables problem determination on troublesome network links.
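
The two sampling modes described above can be sketched as a small decision function. The `FlowSampler` class, the flow identifiers, and the snapshot length are all hypothetical names chosen for this illustration; an actual data plane processor would make this decision in its classification and forwarding hardware.

```python
import random


class FlowSampler:
    """Decide which packets the data plane copies to the control plane."""

    def __init__(self, global_rate: float, seed=None):
        self.global_rate = global_rate   # probability of sampling any packet
        self.full_flows = set()          # flows selected for full sampling
        self.rng = random.Random(seed)

    def watch(self, flow_id) -> None:
        """Select a flow for full sampling (every packet is copied)."""
        self.full_flows.add(flow_id)

    def should_copy(self, flow_id) -> bool:
        if flow_id in self.full_flows:
            return True
        # Random sampling applied globally across all other traffic.
        return self.rng.random() < self.global_rate


def snapshot(packet: bytes, snap_len: int) -> bytes:
    """Copy only the first snap_len bytes (a portion of the packet)."""
    return packet[:snap_len]
```

A low `global_rate` keeps the control plane load small for network engineering purposes, while `watch()` on a troublesome flow gives the control processor every packet of that flow for problem determination.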
Linking the Software
Up to this point, we've looked solely at the hardware requirements needed to marry the control and data plane. However, it's equally important to closely link the software environments used to develop code for these processors. Here's why.
To solve performance bottlenecks, network processors implement complex architectures and memory subsystems that consist of multi-processing, multi-threading, crossbar interconnects, hardware assist and micro-engines, and more. This hardware complexity, however, makes implementation a difficult task.
The architectural complexity problems were compounded by weak toolsets. In the early NP designs, each vendor had to develop its own set of software development tools consisting of assemblers, compilers, debuggers, and simulators. However, especially on the compiler front, most of these tools were basic, and designers still had to write a great deal of hand-tuned assembly code to make these processors work.
Hand tuning is a tedious and iterative process: it requires intimate knowledge of the hardware micro-architecture, and performance feedback comes only after software/hardware integration in the lab. Thus, software development teams are forced to spend the majority of their time trying to fit their application to the NPU architecture, rather than focusing on the characteristics of the application itself.
What is required is a set of powerful software tools that allow the developer to define the data plane behavior of the system utilizing an abstracted graphical programming interface tightly and seamlessly integrated with a C development environment for the control plane.
By utilizing an abstracted graphical programming interface with integrated low-level code generation, developers can focus on the rules required for their application, while the tools take care of generating the optimized executable code, including the C functions required for control plane integration. These tools should also include real-time performance analysis so the developer knows immediately how a particular function will perform, removing the long traditional performance tuning cycle from the development process.
Wrap Up
Designing and implementing systems that can deliver advanced IP/MPLS services over high bandwidth networks requires a great deal of effort to be spent on integration issues. These issues are related to the control plane, data plane, switch fabric, and the efficient mapping of applications between the control plane and the NP-based data plane, such that high performance levels can be sustained under varying traffic loads and patterns.
In choosing an NP, designers should weigh the device's impact on the overall system as heavily as its feature set. This will ensure that they derive the full performance benefits from the new generation of easy-to-use, application-driven network processors.
