Cache Coherent and Non-Coherent NoCs Connect AI and HPC SoCs and Chiplets
Customer Overview
Tenstorrent’s high-performance RISC-V CPUs, modular chiplets, and scalable compute systems give developers full control at every layer of the stack, at any scale from single-node experimentation to data center-scale deployment. All of their hardware is supported by open-source, full-stack software, with 20+ customer models running at top speeds and 700+ models running out of the box, enabled by the TT-Forge compiler.
Tenstorrent believes in an open future. Their architecture and software are designed to be edited, forked, and owned. Tenstorrent’s products are gaining traction and momentum in the US, European, Asian, and Middle East markets with those building next-gen and sovereign AI solutions.
Tenstorrent designs chiplets: smaller, specialized, and reusable silicon building blocks. This modularity offers advantages in cost, time-to-market, and the ability to mix and match different technologies. Chiplets enable plug-and-play composition of components across technologies and vendors, unlocking real innovation. This vision of a “composable” hardware future is being driven by the Open Chiplet Architecture (OCA), a standard that ensures interoperability between chiplets from different vendors.
The fundamental building blocks of Tenstorrent’s architecture are Tensix AI Cores and high-performance Ascalon RISC-V CPU cores. Tensix Cores are highly programmable and are designed to efficiently execute the complex mathematical operations at the heart of AI models. Ascalon CPUs provide the high-performance, general-purpose compute capabilities necessary to run operating systems and manage workloads. Tenstorrent implements these cores as its foundational IP. It then uses multiple instances of this foundational IP, plus other IP blocks, to construct its own chiplets for compute, memory, and I/O.
The Challenge
One of the challenges Tenstorrent faces lies in managing the data traffic generated by AI workloads within and across chiplets. As the company looks to incorporate next-generation memory standards like GDDR7, which promise improved bandwidth, the demands on the internal fabric of the chiplets will be high. The GDDR7 specification, with its high-speed PAM-3 signaling, requires pristine signal integrity and a meticulously designed physical interface (PHY).
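To see why PAM-3 matters for bandwidth, note that each symbol carries one of three levels, so two symbols can encode 3 bits (3² = 9 ≥ 2³ = 8), i.e., 1.5 bits per symbol versus 1 bit for NRZ. The sketch below illustrates this packing with a hypothetical 3-bit-to-2-symbol codebook; the actual JEDEC GDDR7 mapping differs.

```python
from itertools import product

# PAM-3 uses three voltage levels; each symbol carries log2(3) ≈ 1.585 bits.
# GDDR7 groups 3 bits into 2 ternary symbols (3^2 = 9 codewords >= 2^3 = 8).
# The mapping below is illustrative only -- the actual JEDEC codebook differs.
LEVELS = (-1, 0, +1)
codewords = list(product(LEVELS, repeat=2))  # 9 possible symbol pairs

# Assign the first 8 pairs to the 8 three-bit values (hypothetical mapping).
encode = {bits: codewords[bits] for bits in range(8)}

def pam3_encode(data: bytes) -> list:
    """Encode bytes as PAM-3 symbols, 3 bits at a time (illustrative)."""
    symbols = []
    bitbuf, nbits = 0, 0
    for byte in data:
        bitbuf = (bitbuf << 8) | byte
        nbits += 8
        while nbits >= 3:
            nbits -= 3
            symbols.extend(encode[(bitbuf >> nbits) & 0b111])
    return symbols

# 3 bytes = 24 bits -> 8 groups of 3 bits -> 16 symbols: 1.5 bits/symbol.
print(len(pam3_encode(b"\x00\x01\x02")))  # 16
```

The 1.5× density per symbol is what lets GDDR7 raise effective data rates without a proportional increase in signaling frequency, at the cost of tighter voltage margins between levels.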
When the Tenstorrent team originally set out to build the highest-performing AI solutions available, they decided to base their designs on open-source solutions, develop differentiating IP in-house, and employ trusted third-party IP. Key to this approach are highly configurable cache-coherent and non-coherent Networks-on-Chip (NoCs) that scale up and down as required.
Business Challenges
- To build "computers for AI" aimed at delivering unprecedented performance, scalability, and flexibility across applications including HPC/AI, automotive, and robotics.
Design Challenges
- To develop high-performance intellectual property (IP) cores, including RISC-V CPU cores and Tensix AI Cores, and chiplets (the building blocks of SoCs)
- To connect multiple different types of IP within its chiplets to handle the immense data traffic generated by modern AI workloads
Arteris Solution
Results
- FlexNoC fully addressed Tenstorrent’s non-coherent NoC requirements in its current generation of chiplets.
- Ncore addresses Tenstorrent’s cache-coherent NoC requirements for the next generation of chiplets now in planning.
The Solution
Tenstorrent required mature, high-bandwidth, low-latency fabrics that were stable and silicon-proven. Arteris offers the Ncore cache-coherent and FlexNoC non-coherent NoC IPs, both of which can be easily configured to address the company’s requirements. These NoCs support the customizability needed for various use cases, from automotive AI to data center deployments, ensuring large volumes of data are moved quickly with minimal latency.
Results and Future Plans
The Arteris FlexNoC non-coherent NoC IP fully addressed the non-coherent requirements for Tenstorrent’s compute, memory, and I/O chiplets. In addition, Tenstorrent is planning to deploy the Arteris Ncore coherent NoC in its next-generation chiplets.
Using Arteris FlexNoC, Tenstorrent’s memory chiplet meets the streaming read/write bandwidth expectation of 144 GB/s, as specified in the JEDEC GDDR7 memory specification.
Figure 1: FlexNoC for Tenstorrent’s chiplet
Based on the Tenstorrent team’s experience with Arteris technologies and support from Arteris, the company intends to use Arteris technology in future generations of products on its roadmap.
| Scenario | Source | Target | BW Expectation | Utilization |
|---|---|---|---|---|
| Streaming Read | Compute Chiplet Facing AXI Port 0 | DRAM | 144 GB/s read data | 98%~100% |
| Streaming Write | Compute Chiplet Facing AXI Port 0 | DRAM | 144 GB/s write data | |
| Streaming Read | Compute Chiplet Facing AXI Port 1 | DRAM | 144 GB/s read data | |
| Streaming Write | Compute Chiplet Facing AXI Port 1 | DRAM | 144 GB/s write data | |
Figure 2: Streaming read/write bandwidth expectations