Cache Coherent Interconnect


What is Cache Coherent Interconnect?

A cache-coherent interconnect is a component within a System-on-Chip (SoC) or a network-on-chip (NoC) that manages communication and data coherence between multiple processing elements, such as CPU cores or accelerators, each with its own cache. Cache coherence ensures that all these processing elements see a consistent view of memory: if one processing element modifies a piece of data, all other elements accessing that data will see the same modification.

In a cache-coherent interconnect, the hardware and protocols are designed to manage and coordinate cache operations across different processing elements efficiently. It helps prevent issues like data inconsistencies, stale data, or race conditions that can occur when multiple caches hold copies of the same data.

Cache-coherent interconnects use various protocols, such as Arm’s CHI (Coherent Hub Interface) or ACE (AXI Coherency Extensions), to maintain cache coherence and ensure that read and write operations to memory are performed correctly and consistently across all connected processing elements. This technology is crucial in modern multi-core and multi-processor SoCs to improve performance, reduce latency, and maintain data integrity in complex computing systems.
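
To make this concrete, here is a minimal sketch in C (POSIX threads plus C11 atomics) of one processing element publishing data to another. On a cache-coherent system, no cache flush or invalidate appears anywhere in the code: the coherence hardware propagates the write, while the release/acquire pair only constrains the ordering of the payload relative to the flag.

```c
/* A minimal sketch of the guarantee a cache-coherent interconnect provides:
 * a value written by one core becomes visible to another with no explicit
 * cache maintenance. The release/acquire pair orders the payload write
 * before the flag; visibility itself is the coherence hardware's job. */
#include <stdatomic.h>
#include <stdio.h>
#include <pthread.h>

static int payload;              /* plain shared data                     */
static atomic_int ready;         /* handoff flag, initially 0             */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                                            /* write data */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* publish    */
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    /* Spin until the flag becomes visible; the interconnect propagates
     * the producer's writes into this core's cache automatically. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    printf("consumer saw payload = %d\n", payload);          /* prints 42  */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```

On a non-coherent interconnect, the same code could fail unpredictably: the consumer might spin forever on a stale cached copy of the flag, or see the flag without the payload.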

Contrasting Cache-Coherent and Non-Coherent Interconnects:

Cache-Coherent Interconnect:

  1. Data Coherence: Cache-coherent interconnects ensure that all caches in the system have a consistent and up-to-date view of shared memory. When one processing element writes to a memory location, other processing elements observe the updated data without any explicit cache-maintenance operations in software.
  2. Synchronization: Cache coherence simplifies synchronization between processing elements. Developers can rely on the coherency protocol to keep memory consistent, and the primitives they still need, such as locks and barriers, can be built directly on ordinary shared variables (a spinlock sketch follows this list).
  3. Scalability: Cache coherence can be challenging to implement in large-scale systems due to increased interconnect complexity and potential bottlenecks. However, directory-based protocols can help alleviate some scalability issues.
  4. Latency: Cache coherence may introduce some latency overhead, as caches need to communicate with each other to maintain coherence. However, this latency is often considered acceptable given the benefits of data consistency.
  5. Complexity: Implementing cache coherence adds complexity to the hardware and software stack. This complexity can increase design and verification efforts, potentially impacting time-to-market and development costs.
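
As a sketch of point 2 above, the spinlock below is built on nothing but an ordinary cacheable variable and an atomic read-modify-write: the coherence protocol arbitrates ownership of the lock word between cores, so no cache maintenance appears in software. This is a minimal illustration (no backoff or fairness), not production locking code.

```c
/* A minimal spinlock relying on hardware cache coherence: the lock word is
 * a plain cacheable variable, and the coherence protocol guarantees only
 * one core at a time can win the atomic test-and-set on its cache line. */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

typedef struct { atomic_flag held; } spinlock_t;

static spinlock_t lock = { ATOMIC_FLAG_INIT };
static long counter;             /* plain variable, protected by the lock */

static void spin_lock(spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;  /* spin: coherence keeps the lock word in-cache while waiting  */
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_lock(&lock);
        counter++;               /* safe: updates migrate between caches  */
        spin_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld\n", counter);      /* prints 200000 */
    return 0;
}
```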

Non-Coherent Interconnect:

  1. Data Incoherence: Non-coherent interconnects do not guarantee data consistency between caches. If one processing element writes to a memory location, there is no automatic mechanism to inform other processing elements of the update. Developers must use explicit mechanisms, such as memory barriers and cache-maintenance operations (clean and invalidate), to ensure data consistency (see the sketch after this list).
  2. Synchronization: Without cache coherence, developers have more control over synchronization, which can be advantageous in certain situations. They can finely tune synchronization mechanisms to match the specific needs of their application, potentially achieving better performance.
  3. Scalability: Non-coherent interconnects can be more scalable in certain scenarios, especially when dealing with a large number of processing elements. Each processing element operates more independently, reducing contention for interconnect resources.
  4. Latency: Non-coherent interconnects can offer lower latency for memory accesses because they don’t incur the overhead of cache coherence protocols. This can be beneficial in applications that require extremely low-latency responses.
  5. Simplicity: Non-coherent interconnects are simpler to design and implement compared to cache-coherent interconnects. This simplicity can lead to reduced design complexity, lower power consumption, and potentially lower costs.
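
To illustrate point 1, the sketch below shows the cache maintenance a driver typically performs when a non-coherent DMA-capable device shares buffers with the CPU. The cache_clean_range(), cache_invalidate_range(), and dma_* routines are hypothetical stand-ins for platform-specific primitives (real systems expose equivalents such as Linux's dma_sync_* helpers); they are stubbed here so the sketch compiles.

```c
/* Sketch of driver-side cache maintenance on a non-coherent interconnect.
 * All platform routines below are hypothetical placeholders, stubbed so
 * the example builds; a real port would supply vendor implementations. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BUF_LEN = 512 };
static uint8_t tx_buf[BUF_LEN], rx_buf[BUF_LEN];

/* --- hypothetical platform primitives (stubs) --------------------------- */
static void cache_clean_range(const void *a, size_t n) { (void)a; (void)n; }
static void cache_invalidate_range(void *a, size_t n)  { (void)a; (void)n; }
static void dma_start_read(const void *src, size_t n)  { (void)src; (void)n; }
static void dma_start_write(void *dst, size_t n)       { (void)dst; (void)n; }
static void dma_wait_done(void) {}

static void send_to_device(void)
{
    memset(tx_buf, 0xAB, sizeof tx_buf);        /* CPU fills the buffer    */
    /* Write back dirty lines so the device reads the data, not stale DRAM. */
    cache_clean_range(tx_buf, sizeof tx_buf);
    dma_start_read(tx_buf, sizeof tx_buf);      /* device consumes tx_buf  */
    dma_wait_done();
}

static void receive_from_device(void)
{
    dma_start_write(rx_buf, sizeof rx_buf);     /* device fills rx_buf     */
    dma_wait_done();
    /* Discard cached copies so the CPU re-reads what the device wrote. */
    cache_invalidate_range(rx_buf, sizeof rx_buf);
    /* rx_buf is now safe to read. On a cache-coherent interconnect, both
     * maintenance calls disappear. */
}

int main(void) { send_to_device(); receive_from_device(); return 0; }
```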

Benefits of Cache-Coherent Interconnect:

  • Simplifies software development by providing a consistent and coherent memory model.
  • Eases multi-core and multi-processor programming, reducing the risk of data races and synchronization bugs.
  • Ensures data consistency without the need for complex application-level synchronization.
  • Suitable for applications where data consistency and ease of programming are top priorities.

Benefits of Non-Coherent Interconnect:

  • Offers more fine-grained control over memory synchronization, potentially leading to optimized performance in specific use cases.
  • Can be more scalable in systems with a large number of processing elements.
  • May provide lower latency for memory accesses, making it suitable for real-time or low-latency applications.
  • Simplifies hardware design and potentially reduces power consumption and costs.

In practice, the choice between cache-coherent and non-coherent interconnects depends on the specific requirements and trade-offs of the target application. Some systems may benefit from cache coherence, while others may prioritize low latency, scalability, or fine-grained synchronization control. It’s essential to carefully consider the design goals and performance requirements when selecting the appropriate interconnect strategy.

Why is Cache Coherent Interconnect Important?

Cache-coherent interconnects are important for several reasons in modern multi-core and multi-processor System-on-Chip (SoC) designs:

  1. Data Consistency: In a multi-core or multi-processor SoC, each processing element (CPU core or accelerator) often has its own cache memory to improve data access speed. Cache coherence ensures that all caches have a consistent and up-to-date view of memory. Without cache coherence, different processing elements might see different versions of the same data, leading to data inconsistencies and bugs.
  2. Performance Optimization: Cache coherence helps optimize system performance by reducing the need to access main memory for shared data. When one processing element updates a memory location, cache coherence allows other elements to be aware of this change, potentially reducing the need for expensive memory accesses.
  3. Simplified Software Development: With cache coherence, software developers can write multi-threaded or multi-process applications more easily. They don’t need to implement complex synchronization mechanisms to ensure data consistency across different processing elements. This simplifies software development and debugging.
  4. Lower Latency: Cache coherence can reduce memory access latency. When a processing element reads a memory location that’s already in another core’s cache, it can obtain the data more quickly than if it had to access main memory.
  5. Scalability: As SoCs become more complex and include a larger number of processing elements, cache coherence becomes crucial for managing data consistency across all these elements. It allows for the efficient scaling of SoC designs.
  6. Compatibility: Many modern software applications and operating systems assume cache coherence. Having cache-coherent interconnects ensures compatibility with existing software ecosystems and minimizes the need for custom software workarounds.
  7. Reliability: Cache coherence helps avoid potential issues like data corruption, stale data, and race conditions that can lead to system crashes or incorrect results.

How Does Cache Coherent Interconnect Work?

Cache-coherent interconnects work by implementing a set of protocols and mechanisms to ensure that multiple processing elements (such as CPU cores) and their associated caches maintain a coherent and consistent view of shared memory. The specific details of how cache coherence is achieved can vary depending on the architecture and interconnect technology used, but the fundamental principles include:

  1. Cache State Tracking: Each cache line in the system is associated with a state that indicates its status with respect to coherence. Common states include “Modified,” “Exclusive,” “Shared,” and “Invalid” (the MESI states; a simplified state machine is sketched at the end of this section).
  2. Cache-to-Cache Communication: When one processing element writes to a memory location, the cache coherence protocol ensures that the updated data is communicated to other caches that might have a copy of that data. This is typically done through signaling and control messages exchanged over the interconnect.
  3. Bus-Based or Directory-Based Protocols: There are two primary approaches to implementing cache coherence: bus-based and directory-based protocols.
    • Bus-Based Coherence: In a bus-based protocol, a shared communication bus connects all processing elements and caches. When a write operation occurs, the updated data is broadcast on the bus, and all other caches monitor the bus to detect updates. This approach is simple but can become a bottleneck in large-scale systems.
    • Directory-Based Coherence: Directory-based designs, such as Arteris Ncore, use a centralized directory that tracks the location and status of cached data. When a write occurs, the directory is updated, and only the caches that hold a copy of the data are notified. This approach is more scalable and can reduce bus contention.
  4. Snooping or Messaging: Caches monitor the interconnect for memory transactions that touch addresses they hold in their caches. If a cache detects a potential conflict, it responds accordingly by either updating its own copy or invalidating it.
  5. Coherence Enforcement: The cache coherence protocol enforces a set of rules to ensure that caches and processing elements respond appropriately to read and write requests. These rules include ensuring that a read from one cache sees the most recent write from another cache and preventing multiple caches from modifying the same data simultaneously.
  6. Write Propagation and Invalidation: When a cache updates a memory location, it must notify other caches that might have a copy of that data. This involves propagating the write operation or invalidating the copies in other caches to maintain data consistency.
  7. Atomic Operations: Cache coherence protocols often support atomic operations like atomic read-modify-write, which ensure that certain operations (e.g., incrementing a counter) are executed atomically across multiple caches.

Cache-coherent interconnects use a combination of these mechanisms to maintain data consistency and coherence in multi-core and multi-processor systems. The specific protocol used, whether it’s MESI, MOESI (as with Arteris Ncore), MESIF, or another variant, defines the precise rules and states that govern cache coherence behavior. These protocols are critical for ensuring that the memory hierarchy operates efficiently and that data is shared consistently among processing elements.
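
The paragraph above names the protocol families; the sketch below, a deliberately simplified snoop-style MESI state machine in C for a single cache line, shows the kind of per-line bookkeeping that state tracking, snooping, and write invalidation involve. It is an illustrative reduction (no transient states, no directory), not any vendor's implementation.

```c
/* A simplified snoop-style MESI state machine for one cache line.
 * Real protocols add transient states, writeback/forwarding datapaths,
 * and directory or snoop-filter state; this is illustrative only. */
#include <stdio.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

/* Events one cache observes: its own core's accesses, plus requests
 * snooped from other caches over the interconnect. */
typedef enum { LOCAL_READ, LOCAL_WRITE, SNOOP_READ, SNOOP_WRITE } event_t;

static mesi_t next_state(mesi_t s, event_t e, int other_sharers)
{
    switch (e) {
    case LOCAL_READ:
        if (s == INVALID)                   /* read miss: fetch the line   */
            return other_sharers ? SHARED : EXCLUSIVE;
        return s;                           /* read hit: no change         */
    case LOCAL_WRITE:
        /* Write miss or upgrade: remote copies are invalidated via the
         * interconnect before this cache gains ownership. */
        return MODIFIED;
    case SNOOP_READ:
        /* A remote reader demotes us; a MODIFIED line is written back
         * (or forwarded) before becoming SHARED. */
        return (s == INVALID) ? INVALID : SHARED;
    case SNOOP_WRITE:
        return INVALID;                     /* remote writer invalidates us */
    }
    return s;
}

int main(void)
{
    mesi_t line = INVALID;
    line = next_state(line, LOCAL_READ,  0);  /* -> EXCLUSIVE */
    line = next_state(line, LOCAL_WRITE, 0);  /* -> MODIFIED  */
    line = next_state(line, SNOOP_READ,  1);  /* -> SHARED    */
    line = next_state(line, SNOOP_WRITE, 1);  /* -> INVALID   */
    printf("final state: %d\n", line);        /* prints 0 (INVALID) */
    return 0;
}
```

The Owned state in MOESI and the Forward state in MESIF exist precisely to refine the SNOOP_READ case above, letting one cache supply the data directly to the requester instead of going through memory.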

Cache Coherent Interconnect With Arteris

The Arteris Ncore cache-coherent interconnect provides several advantages for design teams working on heterogeneous cache-coherent systems. These benefits encompass customization for diverse system elements, scalability, performance, power efficiency, and ease of system layout and timing.

  1. True Heterogeneous Cache Coherency: Ncore is designed to support true heterogeneous cache coherency, allowing system architects to tailor the interconnect to the unique characteristics of the heterogeneous elements. This customization ensures that the system meets performance, power, and area requirements.
    • Heterogeneous cache coherent agents can vary significantly in their coherence models, protocols, physical attributes, and workload behavior. Ncore enables heterogeneous cache coherency by accommodating different coherence protocols through a flexible coherence messaging layer. It also allows resources like transaction tables and snoop filters to be optimized based on agent behavior and implementation.
    • Multiple snoop filters allow for heterogeneous coherency while conserving die area. Similar caching agents, sharing properties and behaviors like cache size or workload, can be associated with the same snoop filter, saving die area and efficiently tracking state.
  2. Highly Scalable Systems: Ncore simplifies scaling the interconnect according to transaction processing and data bandwidth needs. The number of components and ports per component can be adjusted to meet performance goals without wasting resources. This flexibility allows for adapting to evolving requirements and creating derivative chips based on the same design platform.
  3. Higher Performance with Non-Coherent IP: Ncore’s non-coherent bridges and proxy caches facilitate efficient data sharing between non-coherent and coherent agents. The proxy caches enable non-coherent processing IPs to operate as equals within the coherent subsystem, reducing the need for communication through DRAM and decreasing power consumption and latency. The proxy caches also provide benefits like fetching cache lines, write-gathering, and data optimization for coherent memory accesses.
    • This combination of proxy caches and non-coherent bridges provides designers with flexibility in utilizing legacy IP while enhancing performance. Most existing non-coherent IP cores, including DSPs, codecs, I/O subsystems, and storage controllers, can efficiently communicate within the coherent subsystem.
    • Coherent and non-coherent subsystems layer on a high-performance transport interconnect, allowing architects to optimize the transport interconnect after specifying the coherent system functionally.
  4. Lower Power Consumption: Ncore departs from typical single-clock-domain interconnect designs by enabling architects to define multiple clock domains within the interconnect. This feature allows coherent agent interfaces to operate at the same clock speed and voltage as the attached IP, reducing power consumption.
  5. Easier Chip Layout: Ncore components are compact and flexible, allowing architects to place agents near their associated IP blocks. This distribution optimizes the interconnect, assigning each agent an appropriate number of ports for bandwidth needs.

Resources

We chose the Arteris Ncore cache coherent interconnect because of its unique proxy caches and their ability to underpin high-performance, low power, cache coherent clusters of our unique AI accelerators. And with our prior experience using FlexNoC and the FlexNoC FuSa Option for functional safety, we trust Arteris to be the highest performing and safest choice for ISO 26262-compliant NoC IP.

Elchanan Rushinek, Vice President of Engineering, Mobileye


Making Cache Coherent SoC Design Easier with Ncore White Paper

This white paper discusses the challenges and solutions in designing cache-coherent System-on-Chip (SoC) architectures, particularly given the increasing complexity of modern SoCs with diverse processing elements.
