Better 'lifetime predictions' for NAND flash memory

If you’ve ever used a smartphone, a flash drive or an external SSD for your computer, you’ve probably used NAND flash memory. NAND, shorthand for the Boolean operator and logic gate “NOT AND,” is a popular memory and storage technology for solid-state drives (SSDs) and smartphones; laptop and desktop computers; digital cameras and audio players; video games; and scientific, industrial and medical electronics.

NAND offers fast erase and write times, and delivers density at a low cost per bit, while offering greater endurance than its competitor, NOR flash memory. However, nothing is forever. NAND flash memory storage is prone to wear, erasure, crosstalk and sensitivity that affect performance and reliability. NAND can run only a limited number of program-and-erase cycles before it fails. But when will that happen to your device?

Being able to predict how long a flash memory storage system is likely to last before failure is obviously important. Such estimates are known as “lifetime predictions,” and machine learning models have been introduced to improve their reliability and accuracy. But resource requirements, such as overhead allocation and how often the predictions are run, also must be considered, because running predictions too often consumes device resources unnecessarily.

New research published in IEEE Transactions on Computers by Professor Gang Qu (ECE/ISR) and his colleagues proposes techniques that can minimize redundant prediction operations by exploiting reliability variation.

“ADLPT: Improving 3D NAND Flash Memory Reliability By Adaptive Lifetime Prediction Techniques” was written by Qu; his former student Md Tanvir Arafin (EE Ph.D. 2018), an assistant professor in ECE at George Mason University; Zhaojun Lu, Qu’s former postdoctoral researcher, now an assistant professor at Huazhong University of Science and Technology in Wuhan, China; Yuqian Pan, a Ph.D. student in cyber science and engineering at Huazhong; Haichun Zhang, a Ph.D. student in optical and electronic information at Huazhong; Zhenglin Liu, professor of optical and electronic information at Huazhong; and Haoming Zhang, a specialist in ASIC and large-scale FPGA design in Wuhan.

To effectively reduce the overhead of lifetime prediction, the researchers have developed a method for judging when a prediction is needed, based on erase duration and the number of raw bit errors.

First, to explore reliability variation, the authors investigated the error distribution of different 3D flash chips, analyzing raw bit error rate (RBER) variation under two different kinds of stress: program-and-erase (P/E) cycling and data retention. The researchers noted that the way RBER increases after P/E cycling is similar among flash chips manufactured by the same vendor. They also found that the RBER distribution becomes wider as erase duration increases, and that RBER values measured only under P/E stress could not effectively reflect a flash chip’s data retention capability.

Next, the authors proposed a prediction judgment method called “adaptive lifetime prediction techniques” (ADLPT) that exploits reliability variation to reduce redundant predictions by 90 percent. ADLPT identifies when a prediction is necessary by detecting variation in erase duration and raw bit errors. The ADLPT model is trained for the flash technology of a specific manufacturing vendor, because each vendor’s products differ slightly from its competitors’.
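The idea of running a prediction only when the monitored metrics have shifted can be illustrated with a short Python sketch. This is not the authors’ ADLPT implementation, which relies on a trained, vendor-specific model; it swaps in a simple fixed-threshold check, and the names `BlockState` and `needs_prediction`, the units and the threshold values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BlockState:
    """Metrics for one flash block (hypothetical fields and units)."""
    erase_duration_us: float  # time taken by the most recent erase
    raw_bit_errors: int       # raw bit errors counted on the most recent read


def needs_prediction(last: BlockState, current: BlockState,
                     erase_delta_us: float = 50.0,
                     error_delta: int = 8) -> bool:
    """Return True only if either metric has drifted past its threshold
    since the last time a lifetime prediction was run.

    The thresholds are illustrative placeholders, not values from the paper.
    """
    erase_shift = abs(current.erase_duration_us - last.erase_duration_us)
    error_shift = abs(current.raw_bit_errors - last.raw_bit_errors)
    return erase_shift >= erase_delta_us or error_shift >= error_delta


# Usage: skip the costly prediction while the block looks unchanged.
baseline = BlockState(erase_duration_us=3000.0, raw_bit_errors=10)
reading = BlockState(erase_duration_us=3010.0, raw_bit_errors=12)
if needs_prediction(baseline, reading):
    baseline = reading  # a real system would run the lifetime model here
```

In this sketch, a reading that stays close to the baseline triggers no prediction, which is the kind of redundant work the article describes ADLPT as avoiding.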

The researchers found that their ADLPT model minimizes redundant prediction operations and improves on both prediction frequency and the memory space required for metrics. Using ADLPT can improve the prediction performance of the static model from an F1 score of 0.62 to 0.88. This work will inspire further studies of NAND flash memory reliability and new techniques for building effective storage systems.
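For readers unfamiliar with the F1 score cited above: it is the harmonic mean of precision and recall, ranging from 0 (worst) to 1 (best). The short Python sketch below shows the standard calculation; the counts in the example are made up for illustration and are not results from the paper.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)  # share of alarms that were right
    recall = true_pos / (true_pos + false_neg)     # share of real failures caught
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 8 correct failure predictions, 2 false alarms, 2 misses.
print(round(f1_score(8, 2, 2), 2))  # prints 0.8
```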

Published January 11, 2023