NoC Interconnect IP Improves SoC Power, Performance and Area
- Charlie Janac, Monica Tang
Imagine a design team defining the architecture of a new system-on-chip (SoC) device in terms of the Intellectual Property (IP) functional blocks needed for implementation. Many of these IPs will come from trusted third-party vendors, while one or more in-house-developed IPs will provide the secret sauce that differentiates this SoC from its competitors.
Next, the IP blocks need to be assembled to allow them to communicate on the die. Months of effort may go into rigorously selecting the IPs required to reflect the SoC’s desired functionality. However, without an appropriate on-chip interconnect fabric, these IPs remain a collection of isolated blocks. These blocks become a final SoC architecture only by the addition of communication pathways. It is the on-chip interconnect and its speed, latency, power consumption, communications features, topology and quality-of-service (QoS) that ultimately implement the architecture, structure and capabilities of the SoC.
Virtually all multiprocessor SoCs in production today employ network-on-chip (NoC) technology, as it provides the most efficient and scalable method for interconnecting IP blocks. NoCs streamline communication between functional blocks, enhance performance and simplify the process of scaling existing designs and creating derivative products.
Increasing SoC Capacity, Complexity and Performance
It is primarily the increasing capacity, complexity and performance of SoCs that has driven the adoption of NoC interconnects. These demands have advanced in multiple dimensions as follows:
- Advanced semiconductor manufacturing process nodes:
- SoC design starts have moved beyond 22nm, with 14nm and 12nm nodes remaining popular for many applications. Current designs are being developed at 7nm and 5nm, with the 3nm node gaining traction for advanced SoC applications.
- Density has increased significantly, averaging 25x between the 22nm node and the 5nm node, reaching up to 300 million transistors per mm² for 3nm.
- Die sizes have generally remained stable or increased slightly in complex SoCs due to the integration of more functionality per node.
- Growing IP block count:
- Today, even small SoCs may have 100 IPs, while leading-edge designs can exceed 500 IPs, driving total transistor counts into the tens of billions.
- Power management complexity:
- Advanced techniques like dynamic voltage and frequency scaling (DVFS) and multiple power domains are now essential to balance performance, power and thermal requirements.
- Small SoCs have also grown in power management complexity to support various applications with multi-week or multi-month battery life.
- Higher interconnect speed:
- Interconnect IPs now operate in the multi-GHz range to support demanding applications.
- Multiple interconnected NoC pathways are often required to handle the bandwidth and performance needs of today’s SoCs.
Increases in SoC complexity have driven industry-wide adoption of NoC interconnect IP, thanks to the NoC's inherent architectural advantages in power consumption, performance and area over traditional crossbar and bus-based interconnect technologies. These advantages enable SoC architects and designers to manage the growing sophistication of heterogeneous architectures while reducing both research and development (R&D) and per-unit costs.
Arteris NoC Technology
Arteris pioneered NoC interconnect IP in the mid-2000s and has considerable experience helping users improve the performance, power, and area (PPA) profiles of their SoCs. Over time, users have requested and received hundreds of features needed to effectively design a wide range of differentiated SoCs, from cost-sensitive to highly complex. To date, Arteris’ NoC interconnect IP has been deployed in more than 3.6 billion SoCs.
The goal of many of these features is to improve the PPA of the interconnect IP to achieve both R&D and unit cost savings. Additionally, these features boost SoC IP assembly productivity. A closer look at the PPA-related aspects of NoC interconnect IP reveals how these enhancements are achieved.
Power Consumption
NoC technology significantly reduces idle power consumption compared to traditional hybrid bus or crossbar interconnects, offering substantial improvements in power efficiency for modern SoC designs.
There are three types of power that determine battery life and heat dissipation: active power, clock tree power and leakage power.
Active power is consumed by CPUs, GPUs, multimedia subsystems, communications subsystems and other IP blocks while processing, transmitting and generating data traffic. The operating modes and data sets of specific use cases determine whether these units are operating at peak levels or reduced frequency and voltage levels.
Clock tree power is consumed by IP blocks whenever they are enabled and receive a clock signal, regardless of their activity level or whether they are actively processing data.
Interconnect IP makes a significant difference in reducing clock tree power consumption, which can dominate active power over extended periods. SoC power depends on more than just whether IPs are on or off. Different use cases activate various parts of the SoC, causing combinations of IP blocks to consume power in a range between full active power and clock tree power alone.
Leakage power is consumed by each transistor in a chip when it is powered on. Low-voltage-threshold (LVt) logic cells operate at high speed but exhibit higher leakage compared to other cell types. To balance performance and power efficiency, many chip designs use a mix of cell types, reserving LVt cells only for those paths that have trouble meeting timing requirements.
One way NoC IP minimizes leakage power is by reducing the need for LVt cells in chip design. The NoC achieves this by optimizing the physical placement of pipeline stages within the die floorplan, avoiding excessive wire capacitance and long propagation delays that create troubled timing paths.
During periods of high data processing throughput in CPUs, GPUs, NPUs, and other processor and accelerator IPs, leakage power consumption can be minimized by running the IP at maximum speed to complete tasks quickly before powering down. To achieve this, the NoC needs to supply the high bandwidth data needed to sustain peak performance. Once processing is complete, the NoC facilitates the safe shutdown of unneeded power domain sections. In conjunction with the SoC power controller, the NoC interconnect is an active participant in lowering active power.
Arteris’ FlexNoC interconnect IP, for example, does not reside in its own dedicated power domain. Instead, it resides within the power domains of other IPs forming the SoC. The NoC enables the automated creation and configuration of power domain adapters and asynchronous clock domain crossings within the interconnect logic itself.
Proper handling of IP shutdown and disconnect is critical from an interconnect point of view. While messages can continue between active units, they must be prevented from reaching powered-down IP blocks. Ensuring safe power-down processes allows programmers to maximize power savings, which is crucial for ultra-low power applications, such as those in complex chips and edge devices designed to extend battery life.
With efficient power domain management, clock tree power is the main factor in overall power consumption and heat generation. While logic gates contribute to leakage power, clock tree power is largely determined by the activity of register flip-flops (flops) and wire capacitance. Effective management of these elements is essential for optimizing performance and thermal efficiency in high-performance computing, data centers and advanced consumer devices.
Wires require buffer cells to drive signals effectively. Each wire needs a large buffer approximately every 100 microns to handle increasing RC effects. Therefore, it is important to architect the design with minimal register state redundancy and to reduce long-wire routing in the floorplan for any given SoC function.
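As a rough back-of-the-envelope illustration of the buffer cost of long routes, using the ~100 micron spacing cited above (the route lengths are assumptions chosen only for illustration):

```python
def buffers_needed(wire_length_um: float, spacing_um: float = 100.0) -> int:
    """Estimate the repeater buffers along a route at the given spacing."""
    return int(wire_length_um // spacing_um)

# A 5 mm cross-die route vs. a 0.5 mm local route:
print(buffers_needed(5_000))  # 50 buffers
print(buffers_needed(500))    # 5 buffers
```

Shortening routes in the floorplan thus cuts buffer count, and with it both area and the switching power those buffers consume.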
Clock Gating
Another key design option that greatly reduces both clock tree and register power is clock gating. This method prevents clock wires from toggling twice per clock cycle by detecting when the registers driven by a specific portion of the clock tree will remain inactive.
An advanced NoC interconnect IP supports three levels of clock gating, which combine to minimize power consumption significantly. At the cell level, synthesis can apply clock gating to prevent the toggling of clock tree leaves by gating flops. This form of clock gating logic is localized within small groups of flops, and its application is limited by the area overhead of additional clock gating logic.
The NoC can also gate entire branches of the clock tree at a per-unit level. Each unit, when not processing a packet, is completely clock-gated. In this case, only those parts of the NoC that are processing data receive clock edges.
These techniques cover the majority of the clock tree, but they are automatic and outside the control of software. At the highest level, software can take advantage of units that support software-controlled clock gating to stop the entire clock tree feeding large portions of the NoC. Unlike unit power-down methods, this does not eliminate leakage power, but it enables much faster wake-up.
“On-the-fly” clock gating activates within a single clock cycle, so there is no impact on throughput or latency. This allows the majority of the interconnect logic to be gated off, even during high-demand operations, such as when a processor is heavily accessing DRAM. These features greatly reduce both clock tree and register power consumption.
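The cumulative effect of the three gating levels can be sketched with a toy model. All numbers here are hypothetical, chosen only to show how the levels compound; they are not measured Arteris figures:

```python
def effective_clock_power(base_mw: float,
                          cell_gated_frac: float,
                          unit_gated_frac: float,
                          sw_gated_frac: float) -> float:
    """Remaining clock tree power after each gating level removes its share
    of what is left: cell-level (synthesis-inserted), unit-level
    (per-packet), and software-controlled gating of large NoC regions."""
    power = base_mw
    for frac in (cell_gated_frac, unit_gated_frac, sw_gated_frac):
        power *= (1.0 - frac)
    return power

# e.g. 100 mW ungated; 40% cell-level, 50% unit-level, 30% software gating:
print(effective_clock_power(100.0, 0.4, 0.5, 0.3))  # 21.0 mW remaining
```

The multiplicative form reflects that each level acts on the portion of the clock tree the previous level could not reach.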
Leakage power consumption increases with place-and-route (P&R) wire congestion, which leads to higher wire capacitance per cell and requires the use of high-speed, high-leakage cells. NoC interconnect logic, in contrast, is distributed, eliminating congestion points and reducing leakage power consumption. These features in NoCs greatly reduce overall SoC power consumption.
Performance, Bandwidth and Latency
NoC technology allows designers to optimize SoCs for three often-competing metrics: multi-gigahertz frequencies, controllable latency across all NoC IP connections and bandwidth that scales on a per-link basis. Unique aspects of NoC technology enable this optimization as follows:
Packetized communications: Transaction requests and responses are packetized at IP interfaces, enabling a NoC to run at higher frequencies. This also allows greater data capacity per wire compared to traditional interconnect architectures with centralized or less flexible designs. The maximum frequency is limited only by the physical constraints of fab process technologies.
Scalable bandwidth: Having exactly the bandwidth required for a particular application and mode is a highly desirable feature. This involves designing the chip to use minimum wires, gates and register hardware resources to achieve a throughput goal.
Some systems have low bandwidth interconnect requirements, which can be satisfied by 8- or 16-bit connections. Most systems, however, require higher bandwidth, which can be addressed by 32-, 64- and 128-bit connections. High-performance systems demand very high bandwidths and may utilize links as wide as 1024 bits.
As an example, connections running at 2GHz over 1024-bit links provide more than 2 terabits of bandwidth per second (256GB/s). This level of throughput is critical for high-end applications such as gaming, automated driving assistance systems (ADAS), AI accelerators and data center SoCs. Many high-end SoCs feature a mix of bandwidth requirements, all of which can be satisfied when using appropriate NoC technology.
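The arithmetic behind that figure is simply link width times clock frequency; a minimal sketch:

```python
def link_bandwidth_gbps(width_bits: int, freq_ghz: float) -> float:
    """Peak link bandwidth in gigabits per second: width x frequency."""
    return width_bits * freq_ghz

bw = link_bandwidth_gbps(1024, 2.0)
print(bw)      # 2048.0 Gbit/s, i.e. just over 2 Tbit/s
print(bw / 8)  # 256.0 GB/s
```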
Controllable latency: The Arteris NoC interconnect needs one cycle or less for packetization to initiate transactions on its efficient transport network. The latency introduced by packetization logic is minimal and offers significant benefits by enabling a high-speed transport network.
For latency-sensitive connections, zero-latency NoC capability places the packet header in parallel with the payload, eliminating any header latency penalty. This capability is used selectively for individual latency-sensitive connections.
There are also latency-insensitive connections, such as input/output (I/O) paths used primarily during SoC bring-up. For these types of connections, latency is less critical, allowing for wire conservation. For example, an 8-bit path can process packet headers and payloads over several cycles, trading latency for wire conservation. An effective NoC interconnect IP supports a wide range of trade-offs between frequency, bandwidth and wire utilization.
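The width-versus-latency trade-off can be illustrated with a simple cycle count. The 320-bit packet (64-bit header plus 256-bit payload) is a hypothetical example, not an Arteris packet format:

```python
import math

def transfer_cycles(packet_bits: int, link_width_bits: int) -> int:
    """Cycles to serialize one packet over a link of the given width."""
    return math.ceil(packet_bits / link_width_bits)

# Hypothetical 64-bit header + 256-bit payload = 320 bits:
print(transfer_cycles(320, 8))    # 40 cycles on a narrow 8-bit I/O path
print(transfer_cycles(320, 128))  # 3 cycles on a 128-bit path
```

The narrow path spends many more cycles per packet but uses a fraction of the wires, which is exactly the trade-off that suits bring-up and other latency-insensitive I/O connections.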
Die Area
Because NoC interconnects serialize communication into configurable bus widths, they can route large packet throughput on a small number of wires. Bandwidth is flexible and managed by configuring the appropriate bit width from 8 bits for I/O IPs to 1024 bits on the high end. In the Arteris NoC, packetization takes place at IP interfaces, which are the edges of the interconnect network, so that the core logic is simple and fast. This also conserves gates.
Compared to hybrid buses, NoC interconnects reduce total wire length by about 50% and use 30-50% fewer gates. Interconnect IP typically represents 8-12% of SoC area. However, if poorly designed, it can consume over 30% of SoC power.
NoC technology offers significant die size reductions compared to hybrid buses or other inefficient interconnect implementations. Saving 50% of interconnect IP area can save several square millimeters of silicon area in typical SoCs.
For a hypothetical example, at a 28nm process node, if each 1mm² of silicon costs around 10 cents, saving just 3mm² would result in a manufacturing cost saving of 30 cents per chip. At a volume of twenty million units, $6M would be saved over the production life of an SoC. Across a family of five such SoCs, the savings could amount to $30M or more, depending on the volumes. This example illustrates why NoC interconnect IP has been rapidly adopted by the majority of leading SoC design teams.
State of the Art in Interconnect IP Technology
Not all NoC interconnects are created equal. This is a complex technology that requires partner participation, multi-disciplinary product development, and substantial time and capital to meet the needs of ever-evolving SoC designs. The process involves designing and refining the NoC product, proving its success in initial projects, integrating it into end-user systems and ramping up to high-volume production.
Moreover, maintaining flexibility, comprehensive understanding and routine releases are essential to meet the evolving requirements of leading-edge SoC design teams and their architectures.
For example, many of today’s SoCs include arrays of processor clusters (PCs), each containing multiple processor cores, or specialized neural processing unit (NPU) IPs, which contain arrays of processing elements (PEs). Arteris NoC environments support soft tiling, enabling users to create a single processing unit (PU), such as a PC or a PE, and specify the desired array size by defining its number of rows and columns.
The configuration tool then automatically replicates the PU to create the specified array and generates the appropriate coherent or non-coherent mesh topology NoC. It also configures the network interface units (NIUs) for each array element. This automated process dramatically increases productivity while virtually removing the potential for errors.
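A minimal sketch of the soft-tiling idea: replicate one processing unit into a rows-by-columns mesh and attach a network interface unit to each element. All names and structures here are illustrative, not the actual API of the Arteris configuration tool:

```python
from dataclasses import dataclass

@dataclass
class Tile:
    """One mesh element: a replicated PU and its network interface unit."""
    row: int
    col: int
    pu: str
    niu: str

def tile_array(pu_name: str, rows: int, cols: int) -> list[Tile]:
    """Replicate the PU over the mesh, naming each element's PU and NIU."""
    return [Tile(r, c, f"{pu_name}_{r}_{c}", f"niu_{r}_{c}")
            for r in range(rows) for c in range(cols)]

mesh = tile_array("pe", 4, 4)
print(len(mesh))     # 16 tiles
print(mesh[0].niu)   # niu_0_0
```

Generating the array from a single PU definition is what removes the copy-paste errors that manual replication invites.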
Until relatively recently, NoC layout and implementation were largely manual. This process required significant effort to create constraints for the physical placement of NoC elements. This typically resulted in numerous iterations of pipeline insertions along with lengthy NoC P&R iterations to achieve the SoC’s PPA goals.
The latest generation of Arteris NoCs includes physical awareness, enabling automatic NoC generation and pipeline stage insertion. This minimizes iterations and delivers a correct-by-construction NoC, accelerating the backend physical design process. As a result, the physically optimized NoC IP is ready to be seamlessly handed over to the backend team for physical implementation.
Conclusion
NoC interconnect IP offers myriad advantages for SoC power, performance, and area. Arteris NoC technology, with its extensive feature set, automation, and proven efficiency, continues to play a vital role in enabling cutting-edge designs, meeting the challenges of increasing complexity, and driving the future of SoC innovation.