Cache Coherent Interconnect

What is Cache Coherent Interconnect?

Cache coherence ensures that all processing elements (PE) within a System-on-Chip (SoC) maintain a consistent view of memory. If one PE changes a shared memory location, all other PEs that share the same memory are notified, and their local caches are updated.
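The notification step described above is exactly what a coherent interconnect automates. A minimal Python sketch of the stale-data problem that arises without it — the classes, the write-back behavior, and the names are illustrative only, not any real protocol:

```python
# Toy model of the stale-data problem that cache coherence solves.
# Two processing elements (PEs) each hold a private "cache" over shared
# memory; nothing notifies a PE when another PE writes.

shared_memory = {"flag": 0}

class PE:
    def __init__(self, name):
        self.name = name
        self.cache = {}

    def read(self, addr):
        # Without coherence, a cached value is returned even if stale.
        if addr not in self.cache:
            self.cache[addr] = shared_memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # The local cache and memory are updated, but no other PE is told.
        self.cache[addr] = value
        shared_memory[addr] = value

pe0, pe1 = PE("PE0"), PE("PE1")
pe1.read("flag")         # PE1 caches flag == 0
pe0.write("flag", 1)     # PE0 updates flag to 1
print(pe1.read("flag"))  # → 0: PE1 still sees stale data
```

A coherent interconnect would have invalidated or updated PE1's copy at the moment of PE0's write.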

The Arteris Ncore product is a cache-coherent network-on-chip (NoC) interconnect. It efficiently connects multiple SoC IP components using industry-standard protocols and enables data coherence between processing elements such as CPUs, GPUs, and accelerators. Ncore uses the AMBA coherent protocols from Arm, namely CHI (Coherent Hub Interface) or ACE (AXI Coherency Extensions).

Arteris Ncore supports heterogeneous multi-processor SoCs, interfacing seamlessly with coherent and non-coherent domains to improve performance, reduce latency, and maintain data integrity in complex computing systems.

Contrasting Cache-Coherent and Non-Coherent Interconnects:

Cache-Coherent Interconnect:

  1. Data Coherence: Cache-coherence ensures shared memory remains consistent and up to date across all PEs.
  2. Synchronization: Ncore uses industry-standard coherence protocols such as AMBA CHI, ACE, or ACE-Lite instead of software coherency.
  3. Scalability: Cache-coherent NoCs such as Ncore implement a directory-based snoop filter mechanism where multiple snoop filters can be dedicated to PE caches.
  4. Latency: Ncore uses hardware coherency to minimize latency associated with cache maintenance operations.
  5. Complexity: Although Ncore is more complex than a non-coherent NoC, it reduces the complexity and verification overhead associated with software-managed coherency.

Non-Coherent Interconnect:

  1. Data Incoherence: Non-coherent interconnects do not guarantee data consistency between caches. If one processing element writes to a memory location, there is no automatic mechanism to inform other processing elements of the update.
  2. Synchronization: Without cache coherence, software must manage data sharing explicitly, which gives developers fine-grained control over PE synchronization and can yield better performance in specific cases.
  3. Scalability: When dealing with many PEs, each PE may operate independently, reducing contention for system resources.
  4. Latency: Because they don’t incur the overhead of cache coherence, non-coherent NoCs can benefit applications that require very low-latency responses.
  5. Simplicity: Non-coherent interconnects are potentially more straightforward to design and implement than cache-coherent interconnects.
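On a non-coherent interconnect, the synchronization burden in point 2 above falls on software: the programmer must issue explicit cache-maintenance operations before reading shared data. A hedged sketch of that discipline, extending the same toy cache model — the `invalidate` operation is illustrative, not a real cache-maintenance API:

```python
# Sketch of software-managed coherency on a non-coherent interconnect:
# the consumer must explicitly invalidate its cached line, or it reads
# stale data. All names and behaviors are illustrative.

shared_memory = {"flag": 0}

class NonCoherentPE:
    def __init__(self):
        self.cache = {}

    def read(self, addr):
        if addr not in self.cache:
            self.cache[addr] = shared_memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        self.cache[addr] = value
        shared_memory[addr] = value   # write-through, for simplicity

    def invalidate(self, addr):
        # Software cache-maintenance operation: drop the local copy so
        # the next read fetches fresh data from memory.
        self.cache.pop(addr, None)

producer, consumer = NonCoherentPE(), NonCoherentPE()
consumer.read("flag")          # consumer caches 0
producer.write("flag", 1)
consumer.invalidate("flag")    # without this, the next read is stale
print(consumer.read("flag"))   # → 1
```

Forgetting a single `invalidate` (or its counterpart, a cache clean before another agent reads) is precisely the class of bug hardware coherence eliminates.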

Cache-Coherent NoC Benefits:

  • Automatically and efficiently maintains cache consistency between PEs.
  • Provides an interface that allows non-coherent PEs to participate as I/O-coherent agents.
  • Simplifies software development by using hardware cache management.

Non-Coherent NoC Benefits:

  • Non-coherent NoC provides more fine-grained control over memory synchronization, potentially leading to optimized performance in specific use cases.
  • Can be more suitable for real-time or low-latency applications due to lower-latency memory access times.
  • Use simpler hardware library elements, reducing area and power for better PPA.

Designers must consider design goals and performance requirements when selecting the appropriate interconnect strategy. Some parts of an SoC may benefit from cache coherence, while others prioritize low latency, scalability, or fine-grained synchronization control.

Why is Cache Coherent Interconnect Important?

Modern heterogeneous multi-processor SoC designs use cache-coherent NoCs for the following reasons:

  1. Data Consistency: Each PE (CPU core or accelerator) uses cache memory to improve data access speed; hardware coherence keeps those caches consistent so every PE sees up-to-date data.
  2. Performance Optimization: Shared data can often be served from another PE’s cache, reducing the need to access main memory.
  3. Simplified Software Development: Software developers can write multi-threaded or multi-process applications without explicitly managing data in shared caches.
  4. Lower Latency: Cache coherence can reduce memory access latency; when a PE reads a memory location that is already in another core’s cache, the data is transferred cache-to-cache, much faster than an access to main memory.
  5. Scalability: As SoCs become more complex and include more PEs, hardware managed cache coherence becomes increasingly important.
  6. Compatibility: Software applications and operating systems assume cache coherence, and cache-coherent interconnects ensure compatibility with existing software ecosystems.
  7. Reliability: Hardware coherency avoids data corruption, stale data, and race conditions that can lead to system crashes or instability.
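The "Simplified Software Development" point above is visible in everyday code: with hardware coherence, shared-data software needs only ordinary synchronization primitives, with no cache flushes or invalidates anywhere. A plain threading example — nothing here is Ncore-specific:

```python
# With hardware cache coherence, software coordinates access with a lock
# and never touches the caches directly; coherence hardware keeps every
# core's view of `counter` consistent.

import threading

counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(10_000):
        with lock:        # mutual exclusion is all the software manages
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)            # → 40000
```

On a non-coherent system, the same program would additionally need explicit cache-maintenance operations around every access to `counter`, as sketched earlier.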

How Does Cache Coherent Interconnect Work?

Cache-coherent interconnects implement protocols and mechanisms to ensure that multiple processing elements (such as CPU cores) and their associated caches maintain a coherent and consistent view of shared memory. The fundamental principles include:

  1. Cache State Tracking: Each cache line in the system has a status such as “Modified,” “Exclusive,” “Shared,” or “Invalid” (the MESI states).
  2. Cache-to-Cache Communication: When a PE writes to a shared memory location, this is automatically communicated to other caches that have a copy of the data.
  3. Directory-Based Protocols: Directory-based protocols like Arteris Ncore use a centralized directory that tracks the location and status of cached data. When a write occurs, the directory is updated, and only caches with a copy of the data are notified.
  4. Snooping: Caches monitor (snoop) memory accesses and address changes that might affect their data. If a cache detects a potential conflict, it automatically updates or invalidates its copy.
  5. Coherence Enforcement: The cache coherence protocol enforces rules to ensure that caches and processing elements respond appropriately to read and write requests. These rules include ensuring that a read from one cache sees the most recent write from another and prevents multiple caches from modifying the same data simultaneously.
  6. Write Propagation and Invalidation: When a cache updates a memory location, it must notify other caches that might have a copy of that data, either by propagating the write operation or by invalidating the copies in other caches, thus maintaining data consistency.
  7. Atomic Operations: Operations such as atomic read-modify-write (e.g., incrementing a counter value) are executed atomically with respect to all caches, so no PE observes a partial update.
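The directory-based, write-invalidate behavior described in points 1, 3, and 6 can be sketched as a toy model. This is a deliberately simplified, MESI-flavored sketch with a write-through memory; all class and state names are illustrative assumptions, not the Ncore implementation:

```python
# Toy directory-based write-invalidate protocol: a central directory
# tracks which caches share each line, so a write notifies only those
# caches (rather than broadcasting to everyone).

MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"

class Directory:
    def __init__(self):
        self.sharers = {}   # addr -> set of caches holding the line
        self.memory = {}

    def read(self, cache, addr):
        holders = self.sharers.setdefault(addr, set())
        holders.add(cache)
        state = EXCLUSIVE if len(holders) == 1 else SHARED
        cache.lines[addr] = (state, self.memory.get(addr, 0))
        if len(holders) > 1:
            for c in holders:                # every holder downgrades to Shared
                _, val = c.lines[addr]
                c.lines[addr] = (SHARED, val)
        return cache.lines[addr][1]

    def write(self, cache, addr, value):
        # Invalidate every other sharer, then grant Modified to the writer.
        for other in self.sharers.get(addr, set()) - {cache}:
            other.lines[addr] = (INVALID, None)
        self.sharers[addr] = {cache}
        cache.lines[addr] = (MODIFIED, value)
        self.memory[addr] = value            # write-through keeps the model simple

class Cache:
    def __init__(self, directory):
        self.lines = {}
        self.directory = directory

    def read(self, addr):
        state, value = self.lines.get(addr, (INVALID, None))
        if state == INVALID:
            return self.directory.read(self, addr)
        return value

    def write(self, addr, value):
        self.directory.write(self, addr, value)

d = Directory()
c0, c1 = Cache(d), Cache(d)
c0.read("x"); c1.read("x")        # both caches now hold the line Shared
c0.write("x", 42)                 # directory invalidates only c1's copy
print(c1.lines["x"][0])           # → I  (invalidated, not silently stale)
print(c1.read("x"))               # → 42 (refetched; view stays coherent)
```

A real protocol also handles write-back of Modified lines, transient states, and ordering; the point here is only the directory's role of tracking sharers and targeting invalidations.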

Cache Coherent Interconnect With Arteris

The Arteris Ncore cache-coherent interconnect provides several advantages for design teams working on heterogeneous cache-coherent systems:

  1. Heterogeneous Cache Coherency: Ncore supports true heterogeneous cache coherency, allowing system architects to tailor the NoC to the specific requirements of each PE to ensure that the SoC design meets performance, power, and area requirements.
    • Heterogeneous cache coherent agents can vary significantly regarding coherence models, protocols, physical attributes, and workload behavior. Ncore enables heterogeneous cache coherency by accommodating different coherence protocols through a flexible coherence messaging layer. It also allows optimizing resources like transaction tables and snoop filters based on agent behavior and implementation.
    • Multiple snoop filters. Snoop filters can be assigned to one or more similar PEs, sharing properties and behaviors like cache size or workload, saving die area while efficiently tracking state.
  2. Highly Scalable Systems: Ncore simplifies scaling the interconnect according to transaction processing and data bandwidth needs. The number of components and ports per component can be adjusted to meet performance goals without wasting resources.
  3. High-Performance Links to Non-Coherent Parts of the SoC:
    • Proxy caches enable non-coherent PEs to operate I/O coherently within the coherent subsystem, reducing the need for communication through DRAM and decreasing power consumption and latency.
    • A combination of proxy caches and non-coherent bridges provides designers with flexibility in utilizing legacy IP while enhancing performance. Most existing non-coherent IP cores, including DSPs, codecs, I/O subsystems, and storage controllers, can efficiently communicate within the coherent subsystem.
    • Coherent and non-coherent subsystems layer on a high-performance transport interconnect, allowing architects to optimize the transport interconnect after specifying the coherent system functionally.
  4. Lower Power Consumption: Ncore enables architects to define multiple clock domains within the interconnect. This feature allows coherent agent interfaces to operate at the same clock speed and voltage as the attached IP, reducing power consumption.
  5. Easier Chip Layout: Ncore network interfaces are compact and flexible, allowing architects to place them near their associated IP blocks. This optimizes the interconnect area and gives each component an appropriate number of ports for its bandwidth needs.

Resources

We chose the Arteris Ncore cache coherent interconnect because of its unique proxy caches and their ability to underpin high-performance, low power, cache coherent clusters of our unique AI accelerators. And with our prior experience using FlexNoC and the FlexNoC FuSa Option for functional safety, we trust Arteris to be the highest performing and safest choice for ISO 26262-compliant NoC IP.

Elchanan Rushinek, Vice President of Engineering, Mobileye


Making Cache Coherent SoC Design Easier with Ncore White Paper

This white paper discusses the challenges and solutions in designing cache-coherent System-on-Chip (SoC) architectures, particularly in the increasing complexity of modern SoCs with diverse processing elements.
