Vision

Friday, November 13, 2009

Adding muscle to chips

It's been a long time since I last wrote a post. Cavium Networks has acquired MontaVista Software. It looks like they are planning to become a more "complete" company by adding muscle to the software side. Traditionally they have been a superb chip company with probably the best ASIC team in the industry; now they are strengthening the software arm too. Good for them and for the investors!

Tuesday, October 7, 2008

Networking Systems Require Tight Control/Data Plane Integration

A critical element in the design of next-generation networking equipment is integrating the service intelligence provided by silicon with the transport capabilities of optics. An efficient combination of these two functional entities yields the mechanisms needed to leverage the massive transport capacity available in the network for deploying and delivering new classes of service. The carrier, service provider, enterprise, and metro markets demand a new breed of equipment that combines speed, capacity, and intelligence to use bandwidth in flexible and profitable ways, while incorporating emerging technologies such as MPLS, traffic engineering, VPNs, and differentiated services over Ethernet and SONET networks. Designing efficient, high-performance networking systems requires careful integration of data path network processors (NPs) with control plane processors, co-processors, fabrics, and application software. Handling a high-speed packet/cell stream requires a multi-chip process flow, and unless all of the piece parts work together efficiently, system performance will be less than optimal. In essence, choosing a 10-Gbps NP does not guarantee 10-Gbps line-card performance unless the designer makes careful system-level integration choices rather than component-level ones.

Data Plane Requirements: The data plane can be viewed as two logic blocks, one for ingress and one for egress processing. In most architectures, egress processing is the simpler of the two, since the fabric subsystem supplies relatively homogeneous packets and egress processing consists mainly of traffic management functions. Ingress processing is the harder task, since incoming traffic can be a heterogeneous mix of packet types, lengths, and protocols from a variety of sources. All of those packets/cells must then go through parsing, classification, policing, admission control, and editing/modification at wire speed, with bounded latency, to sustain application requirements, especially performance.
When considering data plane processors, it is important to understand the demands that system applications place on the ingress and egress logic of the data plane, as well as the data plane's interaction and integration with the control plane. The key attributes to evaluate are:

1. Classification performance: Search key size, search frequency and latency, associated data size, and the ability to do recursive search operations directly determine the data plane processor's ability to implement advanced admission control at line rate in edge router applications. Parsing flexibility is another important feature: it determines the breadth of protocols a data plane processor supports and also affects classification table size and maintenance complexity.

2. Provisioning performance: The key provisioning attributes of a data plane processor are standards compliance, range of policing rates, policing rate granularity, accuracy, and the number of policing operations per packet while sustaining line rate. Policing algorithm support includes the DiffServ srTCM and trTCM markers and the ATM GCRA and F-GCRA. Policing rates range from 8 kbps for VoIP to 10 Gbps for 10 Gigabit Ethernet links. Multiple policing operations per packet allow packets to be marked based on both individual and aggregate flow provisioning.

3. Forwarding performance: A data plane processor's forwarding engine must be capable of making all necessary packet modifications at line rate. A key consideration is whether there is enough data path speedup to accommodate the packet expansion caused by label stacking, tunneling, and appending the route headers needed by traffic managers and switch fabrics. A forwarding engine should be able to prepend route headers, push, pop, and swap MPLS and VLAN labels, and add or remove tunnels, all at line rate. This work includes the QoS mappings between the protocols contained within the packet (IP, MPLS, DiffServ, 802.1Q, etc.) and the route header being formed.

4. Control plane integration: A tight coupling of the data plane and control plane processors provides several advantages. The results of the data plane processor's classification engine may be used to prioritize and expedite processing in the control processor. The data processor's policing engines may be used to protect the control plane from denial of service attacks. Most importantly, it allows the control processor to serve as an application processor for those packets requiring processing beyond layer 4.
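The policing attribute in item 2 can be made concrete with a minimal software sketch of the DiffServ single-rate three-color marker (srTCM, RFC 2697) in color-blind mode. It assumes rates in bytes per second and timestamps in seconds, and it refills tokens continuously, whereas RFC 2697 describes discrete increments CIR times per second; the class and method names are hypothetical. A data plane processor would do this in hardware or microcode, not in Python.

```python
class SrTCM:
    """Color-blind single-rate three-color marker (after RFC 2697); hypothetical names."""

    def __init__(self, cir, cbs, ebs, start=0.0):
        self.cir = cir          # committed information rate, bytes/second
        self.cbs = cbs          # committed burst size, bytes
        self.ebs = ebs          # excess burst size, bytes
        self.tc = float(cbs)    # committed bucket starts full
        self.te = float(ebs)    # excess bucket starts full
        self.last = start       # time of the previous update, seconds

    def _refill(self, now):
        # Add CIR * elapsed tokens: fill the committed bucket first,
        # then spill any overflow into the excess bucket (both capped).
        tokens = (now - self.last) * self.cir
        self.last = now
        to_c = min(tokens, self.cbs - self.tc)
        self.tc += to_c
        self.te = min(float(self.ebs), self.te + tokens - to_c)

    def mark(self, size, now):
        """Return 'green', 'yellow' or 'red' for a packet of `size` bytes arriving at `now`."""
        self._refill(now)
        if self.tc >= size:
            self.tc -= size
            return "green"
        if self.te >= size:
            self.te -= size
            return "yellow"
        return "red"  # red packets consume no tokens
```

For example, with `SrTCM(cir=1000, cbs=1500, ebs=1500)`, three back-to-back 1000-byte packets at time 0 are marked green, yellow, and red: the first drains the committed bucket, the second borrows from the excess bucket, and the third finds neither bucket deep enough.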

Control Plane Requirements: Control plane processors are available or under development from Broadcom (SiByte), SandCraft, PMC-Sierra (QED), IBM, and others. These processors are enhanced versions of popular embedded processors such as MIPS and PowerPC, with additional I/O and packet processing functionality. Their key advantage lies in development tools and the ease of leveraging an existing code base. When considering a control plane processor, the key attributes to evaluate include:

1. CPU performance and architecture: Data plane processors use customized or application-driven instruction sets, whereas control plane processors use standard RISC architectures, so raw CPU throughput (MIPS, millions of instructions per second) is a key factor in control plane performance. MIPS, PowerPC, and ARM RISC architectures are all suitable for the control plane, so the choice among them can rest on the development environment, familiarity, or other control plane processor features. Control plane processors typically use multiple RISC cores, either loosely or tightly coupled, and tend to have one or two levels of cache.

2. Memory subsystem performance: The memory bandwidth of a control plane processor must meet the demands of both the RISC cores and the communications I/O ports. An efficient implementation lets the data plane processor stream data directly into the control plane processor's shared memory without stalling protocol processing on the RISC cores. Since control traffic is stored and then forwarded, the required shared memory bandwidth will be four to eight times the average control traffic rate. The peak control traffic bandwidth equals the line rate, which for 10 Gigabit Ethernet is 10 Gbps plus route header overhead.

3. Communications I/O: Ideally, the data plane and control plane processors will have glueless interfaces that sustain the peak line rate and the required control traffic rate with bounded latency. Several co-processor and data plane interfaces have been standardized or are under consideration by the Optical Internetworking Forum (OIF) and the Network Processing Forum (NPF). For 10-Gbps data plane rates, options include POS-PHY/UTOPIA Level 3, SPI-4, HyperTransport, and 3.125-Gbps serializers/deserializers (serdes).
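The sizing rules in item 2 reduce to simple arithmetic. The sketch below applies the four-to-eight-times rule of thumb and estimates peak control traffic by inflating the line rate with a per-packet route header. The function names, the route header size, and the average packet size used in the example are hypothetical, and framing overhead such as the Ethernet preamble and inter-frame gap is ignored.

```python
def shared_memory_bw_gbps(avg_control_gbps):
    """Range of shared-memory bandwidth needed for store-and-forward control
    traffic, using the 4x to 8x rule of thumb from the text."""
    return 4 * avg_control_gbps, 8 * avg_control_gbps


def peak_control_gbps(line_rate_gbps, route_hdr_bytes, avg_pkt_bytes):
    """Peak control traffic when packets arrive at line rate and each one gets
    a route header prepended (framing overhead ignored)."""
    return line_rate_gbps * (avg_pkt_bytes + route_hdr_bytes) / avg_pkt_bytes
```

With a hypothetical 1-Gbps average control load, `shared_memory_bw_gbps(1.0)` returns `(4.0, 8.0)`. With a hypothetical 16-byte route header on 64-byte packets, `peak_control_gbps(10, 16, 64)` returns 12.5, which shows why header overhead matters most for small packets.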

Integrating the Planes

A typical 10-Gbps line card contains network processing and traffic management functions between the optics interface and the switch fabric. At 10 Gbps, the network processing function is divided between a data plane processor and a control plane processor, which assures deterministic line-rate performance independent of traffic mix and advanced routing requirements. Because the two processors share the packet processing task, they should mate so that data traffic (packets or cells) flows bidirectionally over a glueless interface (Figure 1). SPI-4 is a suitable choice, since it meets 10-Gbps performance requirements and is widely available.
Control plane traffic is typically held in multiple priority queues, since it is not processed "on the fly" like data plane traffic. The subport (channel) capability of SPI-4 provides a mechanism for transferring multi-priority traffic between the data plane and control plane processors without head-of-line blocking.
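A toy model shows why per-subport queues avoid head-of-line blocking. In the sketch below, each priority maps to its own channel with its own FIFO and flow-control status, and the sender serves the highest-priority channel that has data and is not backpressured. The class is hypothetical: strict priority is just one possible scheduling policy, and SPI-4.2's FIFO-status flow control is simplified here to a single ready flag per channel.

```python
from collections import deque


class SubportScheduler:
    """Toy sender side of an SPI-4-style link: one channel (subport) per priority."""

    def __init__(self, num_priorities):
        self.queues = [deque() for _ in range(num_priorities)]  # index 0 = highest priority
        self.ready = [True] * num_priorities  # per-channel flow-control status from the receiver

    def enqueue(self, priority, pkt):
        self.queues[priority].append(pkt)

    def set_ready(self, priority, ready):
        # The receiver (control plane processor) asserts or releases backpressure per channel.
        self.ready[priority] = ready

    def dequeue(self):
        """Return (priority, packet) from the highest-priority channel that has
        data and is not backpressured, or None if nothing can be sent."""
        for prio, queue in enumerate(self.queues):
            if queue and self.ready[prio]:
                return prio, queue.popleft()
        return None
```

If the control processor backpressures its high-priority channel, low-priority packets keep moving. With one shared FIFO, the stalled high-priority packet at the head would block everything queued behind it.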
Tight integration of the data plane and control plane processor architectures enables several performance efficiencies. The control processor may leverage the classification and provisioning already completed by the data plane processor. High-priority control packets may be transferred directly to a control processor's RISC cache, and the RISC CPU may be dispatched based on classification results.
Per-flow statistics gathering is growing in importance as quality of service is added to IP and MPLS networks through differentiated services. These statistics are used not only for billing, but also as a key enabler for network engineering and for security against denial of service attacks.
Per-flow statistics have become a significant control plane bandwidth issue, one that tight integration of the data plane and control plane processors solves: statistics are collected in the data plane and processed in the control plane.
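A minimal sketch of that division of labor, assuming flows are identified by the usual 5-tuple: the data plane does only a constant-time counter update per packet, and the control plane periodically reads and clears the counters. The class and method names are hypothetical, and a real NP would keep these counters in dedicated statistics memory rather than a Python dictionary.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """5-tuple flow identifier (an assumption; real classifiers may key on more fields)."""
    src: str
    dst: str
    proto: int
    sport: int
    dport: int


class FlowStats:
    """Data plane increments counters; control plane reads and clears them."""

    def __init__(self):
        self.counters = defaultdict(lambda: [0, 0])  # FlowKey -> [packets, bytes]

    def update(self, key, length):
        # Data plane: constant-time work per packet.
        entry = self.counters[key]
        entry[0] += 1
        entry[1] += length

    def export_and_clear(self):
        # Control plane: snapshot for billing, engineering or DoS detection, then reset.
        snapshot = {key: tuple(entry) for key, entry in self.counters.items()}
        self.counters.clear()
        return snapshot
```

Read-and-clear keeps the control plane's view interval-based, so each export describes only the traffic seen since the previous one.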
Tight integration of the data and control planes also simplifies end-product field support by enabling sampling and full statistics collection on selected flows. The data plane processor samples traffic by copying packets, or portions of packets, to the control plane processor for analysis. Random sampling may be applied globally across all traffic to support network engineering. Full sampling of selected flows enables problem determination on troublesome network links.
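The two sampling modes can be sketched as a single per-packet decision, assuming a 1-in-N random rate for global sampling and a snapshot length that limits how much of each randomly sampled packet is copied. The `Sampler` class, its parameters, and the flow keys are hypothetical.

```python
import random


class Sampler:
    """Decide which packets (or packet prefixes) to copy to the control plane."""

    def __init__(self, rate_n, snap_len, watched=(), rng=None):
        self.rate_n = rate_n            # random sampling: about 1 in rate_n packets
        self.snap_len = snap_len        # bytes copied per randomly sampled packet
        self.watched = set(watched)     # flows selected for full sampling
        self.rng = rng or random.Random()

    def sample(self, flow_key, packet):
        """Return the bytes to send to the control plane, or None to skip."""
        if flow_key in self.watched:
            return packet                     # full copy for troubleshooting
        if self.rng.randrange(self.rate_n) == 0:
            return packet[:self.snap_len]     # truncated copy for network engineering
        return None
```

Checking the watched set first means a flow under investigation is never subject to the random drop, while all other traffic contributes only a small, truncated share of control plane bandwidth.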
Linking the Software

Up to this point, we've looked solely at the hardware requirements for marrying the control and data planes. However, it's equally important to closely link the software environments used to develop code for these processors. Here's why.
To solve performance bottlenecks, network processors implement complex architectures and memory subsystems that consist of multi-processing, multi-threading, crossbar interconnects, hardware assist and micro-engines, and more. This hardware complexity, however, makes implementation a difficult task.
The architecture complexity problems were compounded by weak toolsets. In the early NP designs, each vendor had to develop its own set of software development tools, consisting of assemblers, compilers, debuggers, and simulators. However, especially on the compiler front, most of these tools were basic, so designers still had to write a ton of hand-tuned assembly code to make these processors work.
Hand tuning is a tedious and iterative process, since performance feedback comes only after software/hardware integration in the lab and requires intimate knowledge of the hardware micro-architecture. Thus, software development teams are forced to spend the majority of their time trying to fit their application to the NPU architecture, rather than focusing on the characteristics of the application itself.
What is required is a set of powerful software tools that allow the developer to define the data plane behavior of the system utilizing an abstracted graphical programming interface tightly and seamlessly integrated with a C development environment for the control plane.
By utilizing an abstracted graphical programming interface with integrated low-level code generation, the developer can focus on developing the rules required for the application, while the tools take care of generating optimized executable code, including the C functions required for control plane integration. These tools should also include real-time performance analysis so the developer knows immediately how a particular function will perform, removing the traditional long performance tuning cycle from the development process.
Wrap Up

Designing and implementing systems that can deliver advanced IP/MPLS services over high-bandwidth networks requires a great deal of effort to be spent on integration issues. These issues are related to the control plane, data plane, switch fabric, and the efficient mapping of applications between the control plane and the NP-based data plane, such that high performance levels can be sustained under varying traffic loads and patterns.
In choosing an NP, designers should weigh the device's feature set and its impact on the overall system equally; this will ensure that they derive the full performance benefits from the new generation of easy-to-use, application-driven network processors.

Friday, August 1, 2008

Possible research projects on Octeon.

As the Internet becomes more popular, the security of Internet connections becomes a growing concern. Nowadays, VPNs are a very popular way to secure connections over the Internet. Over a VPN connection, both endpoints exchange data through a secure tunnel that preserves data integrity and confidentiality. However, a VPN connection consumes a lot of CPU time and other hardware resources, because part of the CPU and memory capacity is taken up by encryption and decryption. A dedicated processor for encryption and decryption therefore saves considerable CPU time and other hardware resources. Cavium Networks offers Octeon, a family of MIPS-based processors that include coprocessors for faster encryption and decryption. The open-source VPN software Openswan, running on Octeon, can be modified to replace its software encryption/decryption functions with the hardware accelerator. The accuracy and performance of encryption and decryption are then validated by comparing the hardware and software solutions.

Today, most data is stored on computers, and as the volume of data grows, computers are used to process it far more often than before. Reducing the storage space that data occupies has therefore become one of the most important ways to cut cost. In recent years, SATA disks have been the most cost-effective way to provide storage capacity. However, Cavium Octeon is an embedded system with no SATA disk installed, so an analysis of the Linux kernel support needed to install a SATA disk is presented first, to provide storage capacity before data compression is addressed. There are two kinds of data compression: lossy and lossless. Lossy compression is usually used for images, video, and audio, whereas lossless compression is used for text and other data that cannot tolerate errors. Cavium Octeon provides a hardware lossless compression engine, the zip coprocessor, which this research puts to use. With this implementation, data compressed by the zip coprocessor can be decompressed by gzip, and data compressed by gzip can be decompressed by the coprocessor, with identical results.
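The two-way gzip compatibility check described above can be sketched in Python. The coprocessor's driver interface is not described here, so `hw_compress` and `hw_decompress` are hypothetical callables; in the example they are stood in by software DEFLATE from Python's `zlib` module with a gzip wrapper (`wbits=31`), and the real hardware calls would be plugged in at the same place.

```python
import gzip
import zlib


def sw_compress_gzip(data):
    """Software stand-in for the zip coprocessor: DEFLATE in a gzip container."""
    comp = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=31)  # 31 = gzip wrapper
    return comp.compress(data) + comp.flush()


def sw_decompress_gzip(blob):
    """Software stand-in for the coprocessor's decompression path."""
    return zlib.decompress(blob, wbits=31)


def check_interop(data, hw_compress, hw_decompress):
    """True if gzip can read the engine's output and the engine can read gzip's output."""
    readable_by_gzip = gzip.decompress(hw_compress(data)) == data
    reads_gzip = hw_decompress(gzip.compress(data)) == data
    return readable_by_gzip and reads_gzip
```

Comparing the round-tripped bytes against the original input checks lossless behavior in both directions without requiring the hardware and gzip to produce byte-identical compressed streams, which DEFLATE does not guarantee.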

Sunday, July 27, 2008

Network Based Application Recognition

– Help Ensure Performance for Mission-Critical Applications: NBAR allows the network to provide differentiated services to each application. You can provide absolute priority and guaranteed bandwidth to your mission-critical applications, such as Oracle or an application that runs on a particular Web page. At the same time, you can limit the bandwidth consumed by less essential applications. The end result is that users can access their mission-critical applications with minimal delay, without needing to upgrade costly WAN links or cut off access to commonly used, but not mission-critical, applications.

– Reduce WAN Expenses: In many parts of the world, and especially between countries, telecommunications links can still be prohibitively expensive. This leads to a dilemma for the network manager: on the one hand you need to provide access to new client-server and Internet-enabled applications, while on the other hand you need to control WAN service costs. NBAR provides a solution to this problem by enabling you to intelligently utilize WAN bandwidth so that you can provide acceptable service levels with the minimum possible bandwidth.

– Manage Web Response: The Web is now a critical business resource in many enterprises, for both internal and external communications. Employees, partners, and customers must have access to the Web pages they need without such problems as slow downloads or Web-based application failure. NBAR allows you to identify the Web pages and type of Web content that you deem critical.

– Improve VPN Performance: VPNs often reduce networking costs while providing increased flexibility. Unfortunately, the service quality in a VPN is often difficult to guarantee. Running NBAR and VPN concurrently in the same router solves this problem by identifying mission-critical traffic before it is encrypted, allowing the network to apply the appropriate QoS controls. By running both VPN and NBAR concurrently, we help ensure that the packets are processed in the correct order to achieve both maximum security and the appropriate QoS. NBAR can also mark the tunnel packet so that the service provider can provide differentiated service to different applications on the service provider's WAN.

– Improve Multiservice Performance: Multiservice networks allow you to combine your data, voice, and video requirements into one unified network. Unfortunately, each of these services requires different network characteristics. NBAR is able to intelligently identify the type of each packet and provide the proper network characteristics.
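The VPN point above depends on ordering: classify and mark while the packet is still in the clear, then copy the marking to the tunnel header. The sketch below assumes a simple port lookup in place of NBAR's deeper inspection, and the signature table, DSCP policy, and `encrypt` callable are hypothetical; the byte reversal in the example is a placeholder, not encryption.

```python
from dataclasses import dataclass

# Hypothetical signatures: (protocol, destination port) -> application.
SIGNATURES = {("tcp", 1521): "oracle", ("udp", 5060): "sip", ("tcp", 80): "http"}
# Hypothetical QoS policy: application -> DSCP (26 = AF31, 46 = EF, 0 = best effort).
DSCP_POLICY = {"oracle": 26, "sip": 46, "http": 0}


@dataclass
class Packet:
    proto: str
    dport: int
    payload: bytes
    dscp: int = 0


@dataclass
class TunnelPacket:
    outer_dscp: int
    ciphertext: bytes


def classify(pkt):
    return SIGNATURES.get((pkt.proto, pkt.dport), "unknown")


def mark_and_encrypt(pkt, encrypt):
    # Classification must happen first: after encryption the ports and payload are hidden.
    pkt.dscp = DSCP_POLICY.get(classify(pkt), 0)
    # Copy the inner marking to the outer header so the provider's WAN can honor it.
    return TunnelPacket(outer_dscp=pkt.dscp, ciphertext=encrypt(pkt.payload))
```

For example, `mark_and_encrypt(Packet("udp", 5060, b"INVITE"), lambda b: b[::-1])` yields a tunnel packet with `outer_dscp` 46, so the provider can prioritize it without seeing inside the tunnel.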

Thursday, July 24, 2008

Network-Based Entitlement Control or NBEC

From Network World

For all the lovely talk about access control emanating from so-called NAC vendors who must have invoked Merlin to magically transform the unworkable Network Admission Control into Network Access Control, there is still one huge problem with access controls. Most enterprises really have no idea who should have access to what resources. The granularity of access control needed to secure the enterprise is beyond the ken of most IT guys. Let’s face it, knowing what applications, networks, and data sets any one of say 10,000 people should have access to is not a simple problem.
Camelot attempted to address the failings of most identity and access management (IAM) systems by building in a learning component. What happened to Camelot? I wish I knew. For some reason the IT press is great at recording the history of startups as long as they have an active PR program. As soon as vendors start to die the historical record seems to get wiped clean. I would guess that part of the problem was that they were too far ahead of their time. Another issue was they relied on host agents to do the learning and enforcement, a company killer if there ever was one.
Now, in what appears to me to be the second coming, a new vendor is born from the knights of Cisco. Five top networking guys have apparently recognized that the marketing department at Cisco is not really that good at inventing security solutions (admission control) but that there truly is a need for automated tools to discover and enforce access control policies in the enterprise. The company, Rohati, came out of stealth mode in time for the Gartner IT Security Summit last week in DC. They are calling their technology Network-Based Entitlement Control or NBEC. No agents, automated discovery, policy management. I love it. This could work.
I hope the ever-flexible NAC vendors get out of the endpoint health-check business. Then we could have an industry that is all pulling in the same direction: towards better policy management, more granular authorization, and ultimately, better security.

Now Rohati Chooses Octeon.

Rohati Architecture uses OCTEON CN58XX for Multiple Functions of Control, Data, Security and Services to deliver up to 40 Gbps L4-L7 Secure Application Performance

MOUNTAIN VIEW, Calif., July 21, 2008 – Cavium Networks (NASDAQ: CAVM), a leading provider of semiconductor products that enable intelligent processing for networking, communications, storage and wireless applications, today announced that Rohati Systems, a leader in high-performance Network-Based Entitlement Control (NBEC), has utilized multiple Cavium OCTEON Plus MIPS64® CN58XX 4-core to 16-core processors in a highly innovative system architecture as part of its TNS™ Platform to deliver industry-leading performance and features in a cost-effective manner. Cavium Networks' processors are being designed into market-leading networking equipment such as routers, switches, Unified Threat Management appliances, Layer 4+ content-aware switches, modular chassis switches, wireless infrastructure equipment, broadband routers and wireless LAN access/aggregation points.
Enterprise security requirements are rapidly evolving in response to an increasingly dynamic and regulation-governed business climate. Definition and enforcement of these security policies have traditionally been done on a per-application level and through software-only solutions, which carry significant administrative costs and are subject to performance or granularity limitations. The Rohati TNS™ product line delivers for the first time a standards-based, high-performance network-based platform that transparently secures access to data-center resources across all users and applications without requiring client- or server-side agents, thereby dramatically accelerating and simplifying deployment and lowering cost of ownership.
Rohati’s innovative system architecture uses multiple Cavium OCTEON CN58XX 4-core and 16-core processors for different purposes, including control-plane, data-plane, security and services acceleration. The system uses OCTEON processors as its only programmable components, connected by a low-latency fabric in appliance and modular-chassis form factors. These systems deliver a scalable family of networking systems with leading performance, granularity and security for network-based entitlement control at layer 4 to layer 7, with performance of up to 40 Gbps and 6 million traffic flows. Rohati’s network-based entitlement control can be transparently deployed in the data center and applied across a broad range of applications and resources, including collaborative applications such as wikis and Microsoft SharePoint, unstructured data stores such as CIFS file shares, and packaged and legacy applications, in companies of all sizes.

Thursday, July 17, 2008

Cavium Networks to Acquire Taiwan-Based Star Semiconductor

MOUNTAIN VIEW, Calif., July 16, 2008 – Cavium Networks (NASDAQ: CAVM), a leading provider of semiconductor products that enable intelligent processing for networking, communications, security and wireless applications, today announced that it has signed a definitive agreement to acquire certain assets and business of Star Semiconductor Corporation. Star Semiconductor is a Taiwan-based design house in Hsinchu with expertise in building highly integrated ARM-based SOC processors for the broadband, connected home and SOHO market segments. This acquisition will provide Cavium Networks with a highly experienced stand-alone SOC processor team based in Taiwan. The net purchase price of the acquisition will be approximately $9 million in cash. The acquisition is expected to close in the third calendar quarter of 2008.
Cavium's existing OCTEON single- and dual-core processor lines address gateway applications in the broadband market, including SOHO/SMB, FTTH and enterprise 802.11n access point applications. This acquisition will enable Cavium to deliver highly optimized, cost-effective and low-power SOC processors to address a significantly broader range of network-connected, triple-play-enabled devices for the digitally connected home and office. Cavium Networks plans to continue to ship and sell Star's existing product lines.
“Cavium Networks' technology is enabling intelligent networks around the globe,” said Syed Ali, CEO and President of Cavium Networks. “Adding Star Semiconductor's highly experienced SOC team focused on broadband and network-connected devices will enable us to significantly expand our served end markets. We are very excited about the addition of Star Semiconductor to the Cavium family.”
“Star Semiconductor has assembled a proven, highly experienced team of hardware, software and board designers in Taiwan,” said Steven Huang, Chairman and CEO of Star Semiconductor. “Being based in Taiwan, we have intimate knowledge of application requirements in the broadband and network-connected device markets. Working with local customers, we have developed significant core IP for these markets. Future products will combine Cavium and Star's IP to build highly differentiated, low-power solutions for Cavium’s target markets. We look forward to leveraging Cavium's customer relationships and global sales to proliferate the use of the Star technology worldwide.”