SMP, Asymmetric Multiprocessing, and the HSA Foundation

by Kurt Shuler, on Sep 27, 2012

(For more information on SMP’s inability to scale well, read Jack Ganssle’s 2008 embedded.com article, “The Nulticore effect,” or the IEEE Spectrum/Sandia Labs article, “Multicore is Bad News for Supercomputers: Adding cores slows data-intensive applications.”)

Processor companies serving the mobility and consumer electronics markets have avoided purely SMP solutions and have instead implemented asymmetric multiprocessing (AMP) architectures. One example of AMP is a mobile phone modem baseband SoC that contains an ARM processor and a DSP to handle control and signal processing, respectively. We also see AMP architectures in today’s mobile phone application processors, which usually have multiple CPU cores plus discrete graphics, video, audio and imaging cores.

Battery Size and Heat Drive Asymmetric Multiprocessing in Mobility Devices

The mobility world has always been forced to use “the best core for the job” because of the constraints imposed by battery size and heat dissipation. As a result, mobile architectures have always been designed from a baseline expectation of heterogeneous-core AMP.

This is in contrast to the server and PC markets which have relatively unlimited (at least compared to a mobile phone) power consumption and heat dissipation capabilities. In these markets, it has always been easier to add more cores of the same type, connect them using cache coherency, and reuse the legacy software to run on top.

Things are starting to change, though, as the SMP approach starts to wear thin. For example, in the server farms that power the likes of Google and Facebook, power consumption and heat dissipation have become huge cost and environmental issues. And in the PC space, we have run into a “GHz wall” where the only way to achieve a step-function increase in performance is to have different cores optimized for different workload types.

Why hasn’t AMP been implemented in the PC and server markets?

It’s hard.

In mobility designs, each heterogeneous processing core, whether graphics, audio, DSP, etc., usually has a custom firmware and software stack associated with it. This software must be integrated so that it can communicate with the operating system running on the CPU cores, which requires coding work in the OS hardware abstraction layer and drivers.

Furthermore, these heterogeneous cores do not have a single view of system memory, so complicated synchronization schemes are usually implemented in hardware and software. Context switching and preemption are difficult to implement.
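The kind of handoff this forces can be sketched as a small Python model. It assumes a CPU and an accelerator that share nothing but a mailbox: input data must be copied into the accelerator's private memory, a flag raised, completion awaited, and the result copied back. `DeviceMemory`, `Mailbox`, and the doubling "kernel" are hypothetical stand-ins for a device-local buffer, mailbox registers, and real firmware, not any vendor's driver API.

```python
class DeviceMemory:
    """Hypothetical accelerator-private buffer that the CPU cannot address directly."""

    def __init__(self, size):
        self.buf = [0] * size

    def dma_write(self, offset, data):
        # Explicit copy from CPU memory into device memory (stands in for a DMA transfer).
        self.buf[offset:offset + len(data)] = list(data)

    def dma_read(self, offset, length):
        # Explicit copy from device memory back into CPU memory.
        return self.buf[offset:offset + length]


class Mailbox:
    """Hypothetical pair of flags standing in for hardware mailbox/doorbell registers."""

    def __init__(self):
        self.command_ready = False
        self.result_ready = False


def accelerator_service(mem, mbox, length):
    """Hypothetical accelerator firmware: acts only after the CPU rings the mailbox."""
    if not mbox.command_ready:
        return
    for i in range(length):  # works on the device's own copy of the data
        mem.buf[i] *= 2
    mbox.command_ready = False
    mbox.result_ready = True  # signal completion back to the CPU


def offload_double(data, mem, mbox):
    """CPU side without a shared address space: copy in, signal, wait, copy out."""
    mem.dma_write(0, data)                     # 1. copy input into device memory
    mbox.command_ready = True                  # 2. ring the doorbell
    accelerator_service(mem, mbox, len(data))  # 3. simulated accelerator run
    if not mbox.result_ready:                  # 4. a real CPU would poll or take an interrupt here
        raise RuntimeError("accelerator did not signal completion")
    mbox.result_ready = False
    return mem.dma_read(0, len(data))          # 5. copy the result back out
```

With a single shared, pageable view of memory, steps 1 and 5 would disappear: the CPU could simply hand the accelerator a reference to `data` itself, which is what the HSA unified address space aims to allow.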

And most importantly, each of these cores requires an expert programmer to code it, someone conversant in a particular core’s instruction set and tool chains.

As a result, asymmetric multiprocessing has thrived in the relatively closed-to-developers/ISVs mobility and consumer electronics worlds while SMP has flourished in the wide open world of PCs and servers.

The Heterogeneous System Architecture Foundation

The HSA Foundation is a non-profit organization that intends to make it easier for the world to adopt AMP architectures.

Its goals are to:

  • Make heterogeneous programming easy and a first-class pervasive complement to CPU computing
  • Continue to increase the power efficiency of heterogeneous systems (AMP), keeping it the platform of choice from smartphones to the cloud
  • Bring to market strong development solutions (tools, libraries, OS runtimes) to drive innovative advanced content and applications
  • Foster growth of heterogeneous computing talent through HSA developer training and academic programs to drive both learning and innovation

To achieve these goals, HSA will have to innovate by providing a technical framework and architecture to address the following issues:

  • Unified Programming Model – Today, CPU and GPU (or other accelerator) cores are programmed separately, with the GPU treated as a remote processor. HSA will allow developers to target the CPU or GPU by writing in task-parallel languages, like the ones they use today when writing for multicore CPUs.
  • Unified Address Space – HSA supports virtual address translation amongst the heterogeneous cores with an HSA-specific memory management unit (HMMU). HSA compute engines will use the same pageable virtual address space as used by CPUs today.
  • Queuing – CPUs, GPUs and other cores can queue tasks to each other and to themselves through an HSA runtime. Queuing can be managed in hardware to avoid OS system calls and enable very low-latency communication between cores.
  • Preemption and Context Switching – HSA enables job preemption, job scheduling and fault handling capabilities to overcome potential problems created by rogue or faulted processes.
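The queuing bullet describes cores handing work to each other without a trip through the OS. A minimal Python model of that idea is a single-producer, single-consumer ring buffer in shared memory: the producer writes a task packet into a slot and then advances a write index, and the consumer polls the indices and advances a read index. This is an illustrative sketch under stated assumptions, not the HSA runtime API; `TaskPacket`, `UserModeQueue`, `submit`, and `fetch` are hypothetical names, and real hardware would need atomic index updates and memory barriers rather than plain Python attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskPacket:
    """Hypothetical work descriptor: which kernel to run and on what argument."""
    kernel: str
    arg: int


class UserModeQueue:
    """Hypothetical SPSC ring buffer; indices only grow and wrap into slots via modulo."""

    def __init__(self, capacity: int):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.write_index = 0  # advanced only by the producer
        self.read_index = 0   # advanced only by the consumer

    def submit(self, packet: TaskPacket) -> bool:
        """Producer side: plain memory writes, no system call."""
        if self.write_index - self.read_index == self.capacity:
            return False  # queue full; caller retries later
        self.slots[self.write_index % self.capacity] = packet
        # Publishing the new write index acts as the doorbell. On real hardware this
        # store must be ordered after the packet write (release semantics).
        self.write_index += 1
        return True

    def fetch(self) -> Optional[TaskPacket]:
        """Consumer side: poll the indices and take the oldest packet, if any."""
        if self.read_index == self.write_index:
            return None  # queue empty
        slot = self.read_index % self.capacity
        packet = self.slots[slot]
        self.slots[slot] = None
        self.read_index += 1
        return packet
```

In this model a CPU thread would call `submit` while an accelerator's command processor polls with `fetch`. Because the indices only ever increase, `write_index - read_index` is always the number of pending packets, so the full and empty checks need no separate counter and no lock shared between the two sides.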

How will HSA do this?

HSA’s goals and the issues it has chosen to address are admirable but difficult to achieve. In my next article I’ll discuss the means by which the HSA Foundation will simplify heterogeneous asymmetric multiprocessing. Specifically, I’ll introduce the HSA solution stack, comprising the HSA Assembler, Runtime, Finalizer, and Kernel Driver, as well as HSA software libraries and intermediate languages.

—Kurt Shuler is vice president of marketing at Arteris.