Kurt Shuler, VP Marketing at Arteris IP, updates Bernard Murphy (SemiWiki) on some of the interesting ways AI is driving caching in this new SemiWiki blog:
How Should I Cache Thee? Let Me Count the Ways
September 25th, 2019 - By Bernard Murphy
Caching is well-known as a method to increase processing performance and reduce power by reducing the need for repeated accesses to main memory. What may be less well-known is how varied this technique has become, especially in and around AI accelerators.
The intent of caching has largely not changed since we started using the concept: to reduce average latency in memory accesses and to reduce average power consumed by off-chip reads and writes. The architecture started out simple enough: a small memory close to a processor, holding the most recently accessed instructions and data at some level of granularity (e.g. a cache line). Caching is a statistical bet: typical locality of reference in programs and data means that many reads and writes can be served very quickly from that nearby cache memory before a reference falls outside that range. When a reference is out of range, the cache must be updated by a slower access to off-chip main memory. On average a program runs faster because, on average, the locality-of-reference bet pays off.
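To make that statistical bet concrete, here is a minimal sketch in Python (illustrative only, not Arteris code or any specific product's behavior): a direct-mapped cache model that counts hits and misses over an address stream and estimates average memory access time (AMAT). The line size, cache size, and latencies are assumed values chosen purely for illustration.

```python
# Minimal direct-mapped cache model illustrating the locality-of-reference bet.
# All sizes and latencies below are assumptions for illustration only.

import random

LINE_SIZE = 64       # bytes per cache line (assumed)
NUM_LINES = 1024     # 1024 lines x 64 B = 64 KB cache (assumed)
HIT_LATENCY = 1      # cycles for a fast nearby cache access (assumed)
MISS_PENALTY = 100   # extra cycles for a slower off-chip access (assumed)

def simulate(addresses):
    """Return (hit_rate, amat_cycles) for a stream of byte addresses."""
    tags = [None] * NUM_LINES        # one tag per cache line
    hits = 0
    for addr in addresses:
        line = addr // LINE_SIZE     # which memory line this byte lives in
        index = line % NUM_LINES     # direct-mapped: line picks one slot
        tag = line // NUM_LINES
        if tags[index] == tag:
            hits += 1                # locality paid off: fast nearby access
        else:
            tags[index] = tag        # miss: fetch the line from main memory
    hit_rate = hits / len(addresses)
    amat = HIT_LATENCY + (1 - hit_rate) * MISS_PENALTY
    return hit_rate, amat

# Sequential walk over an array: high locality, the bet pays off.
sequential = list(range(0, 1_000_000, 4))
# Random accesses scattered over a 64 MB region: low locality, the bet loses.
rng = random.Random(0)
scattered = [rng.randrange(0, 64_000_000) for _ in range(250_000)]

for name, stream in [("sequential", sequential), ("random", scattered)]:
    hr, amat = simulate(stream)
    print(f"{name:10s} hit rate {hr:6.1%}  AMAT ~{amat:6.1f} cycles")
```

Running this, the sequential stream hits well over 90% of the time and averages only a few cycles per access, while the random stream almost always misses and pays close to the full off-chip penalty, which is exactly the bet the paragraph above describes.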
You can learn more by visiting the Arteris IP Ncore Cache Coherent Interconnect IP webpage (http://www.arteris.com/ncore) and the CodaCache Last Level Cache IP webpage (http://www.arteris.com/codacache-last-level-cache).