How is SSD flash memory like a helicopter?
Feb 03, 2016
It destroys itself as it is operating.*
Bear with me now as I explain how SSD flash memory works much like a helicopter…
First, let’s explain the helicopter part of this analogy: Fixed-wing aviators (i.e. non-helicopter pilots) are keen to remind people that turning a jet engine’s axial rotation through 90 degrees to spin a helicopter rotor requires a sophisticated gearbox that slowly (and sometimes not so slowly) grinds itself to death through friction.
That’s why helicopters require so much preventative maintenance compared to a normal airplane: The helicopter is in essence eating itself to death, and without proper maintenance it will take you with it.**
Now for the second part of this analogy.
A solid state drive’s (SSD) flash memory is in a similar situation to a helicopter: Even when used properly, it still requires a ton of preventative maintenance to ensure endurance and data reliability. This “preventative maintenance” has to be very sophisticated and requires complex processing to meet enterprise standards for data protection. Most of that work goes into flash cell wear-leveling and error correction, but some of it is needed to meet unique environmental requirements (usually thermal) and throughput vs. latency requirements. (A good overview is at Enterprise versus Client SSD)
The famous “Five 9s reliability” is for wimps: Key metrics for enterprise SSD reliability are the Uncorrectable Bit Error Rate (UBER) (around 1 sector per 10^17 bits read in modern systems) and the Mean Time Between Failures (MTBF) (2,000,000 hours or more). These targets are much stricter than the “high availability” metrics we learned about in grad school, and much stricter than what is required for consumer SSDs.
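To put those two numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes a constant failure rate (the usual exponential model behind MTBF figures) and treats the UBER figure as one uncorrectable event per 10^17 bits read. The function names are made up for illustration, and real drives will not follow this simple model exactly.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Chance that one drive fails within a year, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_uncorrectable_events(bits_read: float, bits_per_event: float = 1e17) -> float:
    """Expected number of uncorrectable events after reading `bits_read` bits."""
    return bits_read / bits_per_event


if __name__ == "__main__":
    afr = annualized_failure_rate(2_000_000)
    print(f"Annualized failure rate at 2,000,000 h MTBF: {afr:.2%}")

    one_petabyte_in_bits = 8e15  # 10^15 bytes * 8 bits
    events = expected_uncorrectable_events(one_petabyte_in_bits)
    print(f"Expected uncorrectable events per petabyte read: {events:.2f}")
```

Under these assumptions, a 2,000,000 hour MTBF works out to a bit under half a percent of drives failing per year, and you would expect to read on the order of 12.5 petabytes before hitting one uncorrectable sector.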
How do you get this kind of reliability?
You design it into the hardware.
That’s why so many enterprise SSD controller design teams have chosen Arteris FlexNoC with the FlexNoC Resilience Package as their on-chip interconnect IP. Built-in data protection using ECC, advanced data checking and (when you need it) hardware redundancy are key to ensuring errors are caught before they affect system reliability.
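For readers who have never seen ECC up close, here is a minimal sketch of the idea using the textbook Hamming(7,4) code, which protects 4 data bits with 3 parity bits and can correct any single flipped bit. This is purely illustrative: it is not how FlexNoC or any particular SSD controller implements ECC, and real flash controllers use far stronger codes such as BCH or LDPC. The function names are my own.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit; return (data_bits, error_position or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if clean
    if error_pos:
        c[error_pos - 1] ^= 1  # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], error_pos


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate a single bit flip in storage or in transit
    data, pos = hamming74_decode(word)
    print(data, pos)
```

The three syndrome bits, read together as a binary number, point directly at the position of the flipped bit, which is why the correction step is a single XOR. Two simultaneous bit flips would be miscorrected by this small code, which is one reason production designs use stronger schemes.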
BTW, here’s a parting helicopter joke:
The basic difference between an airplane and a helicopter is that an airplane *wants* to fly.
* The reference to helicopters is based on my experience in my other “job” as a reserve officer in the US Air Force and the times I spent in Sikorsky MH-53 Pave Low special ops helicopters getting hot oil dripped down the back of my flight suit from the leaky overhead gearbox.
** Some trivia: The gas generator turbines in a Sikorsky UH-60 Black Hawk’s turboshaft (“jet”) engines rotate at 44,700 RPM (this is called Ng). Each engine’s power turbine, which drives the output shaft, turns at roughly 20,900 RPM (called Np), and a fancy gearbox reduces that to a main rotor speed of only a few hundred RPM (about 258 RPM, called Nr). In addition, the gearbox drives the tail rotor, power generator and other things. This complicated gearbox is what consumes itself over time, and must be treated with kid gloves (No sand in the gearbox oil, please!). Here’s an excellent animation: https://www.youtube.com/watch?v=F9wnzBaE24s