# Partitioned Real-Time NAND Flash Storage

Katherine Missimer and Rich West



## Introduction


















#### Sensors on Boss



# NAND Flash Memory

Advantages:

- Non-volatility
- Shock resistance
- Low power consumption
- Fast access time

Drawbacks:

- No in-place updates
- Reads & writes operate at a different granularity than erasures



#### **Flash Block**









# Observations



#### **NAND Flash Chips**



#### Write latency for 4 pages

| Configuration    | Latency   |
|------------------|-----------|
| Same chip        | 9.70 msec |
| Way interleaving | 2.90 msec |
| Channel striping | 2.37 msec |

#### Erasure latency for 4 blocks

| Configuration    | Latency   |
|------------------|-----------|
| Same chip        | 16.2 msec |
| Way interleaving | 4.06 msec |
| Channel striping | 4.06 msec |

#### Read latency for 4 pages

| Configuration    | Latency    |
|------------------|------------|
| Same chip        | 1.44 msec  |
| Way interleaving | 1.25 msec  |
| Channel striping | 0.382 msec |
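The measurements above can be turned into speedup factors relative to the same-chip case. The sketch below is plain arithmetic on the table values; the dictionary layout and the `speedup` helper are illustrative names, not part of the original slides.

```python
# Latencies in milliseconds, copied from the three tables above.
LATENCY_MS = {
    "write 4 pages": {"same chip": 9.70, "way interleaving": 2.90, "channel striping": 2.37},
    "erase 4 blocks": {"same chip": 16.2, "way interleaving": 4.06, "channel striping": 4.06},
    "read 4 pages": {"same chip": 1.44, "way interleaving": 1.25, "channel striping": 0.382},
}


def speedup(operation: str, layout: str) -> float:
    """Return how many times faster `layout` is than sending everything to one chip."""
    row = LATENCY_MS[operation]
    return row["same chip"] / row[layout]


if __name__ == "__main__":
    for operation in LATENCY_MS:
        for layout in ("way interleaving", "channel striping"):
            print(f"{operation:15s} {layout:17s} {speedup(operation, layout):.2f}x")
```

On these numbers, writes and erasures gain roughly 3.3x to 4.1x from either form of parallelism, while reads gain about 1.15x from way interleaving and about 3.77x from channel striping.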






























Parameters for a task:
$$[(r, T_r), (w, T_w)]$$

- $r$: # pages in a read request
- $T_r$: read request period
- $w$: # pages in a write request
- $T_w$: write request period

A task does not account for CPU computation time. These tasks exist on the flash translation layer (FTL) and utilize the NAND bus.

Read capacity:
$$C_r = r \cdot (t_r + t_d)$$

- $r$: # pages in the read request
- $t_r$: time to read a page
- $t_d$: time to decode a page

Write capacity:
$$C_w = \left\lceil \frac{w}{|F_w|} \right\rceil \cdot t_w$$

- $w$: # pages in the write request
- $t_w$: time to write a page
- $|F_w|$: # chips in $F_w$, the set of chips that serve writes
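A minimal sketch of the two capacity formulas, assuming all times are in the same unit. The function and parameter names are illustrative, and the timing values in the usage note are hypothetical rather than measured.

```python
def read_capacity(r: int, t_r: float, t_d: float) -> float:
    """C_r = r * (t_r + t_d): each page is read and then decoded."""
    return r * (t_r + t_d)


def write_capacity(w: int, num_write_chips: int, t_w: float) -> float:
    """C_w = ceil(w / |F_w|) * t_w: pages are spread over the write chips in rounds."""
    if num_write_chips <= 0:
        raise ValueError("at least one write chip is required")
    rounds = -(-w // num_write_chips)  # integer ceiling division
    return rounds * t_w


if __name__ == "__main__":
    # Hypothetical timings in msec, chosen only to make the arithmetic easy to follow.
    print(read_capacity(r=3, t_r=0.25, t_d=0.125))
    print(write_capacity(w=12, num_write_chips=4, t_w=1.5))
    print(write_capacity(w=13, num_write_chips=4, t_w=1.5))
```

With these made-up timings, a 12-page write over 4 chips needs 3 rounds, while a 13-page write needs a fourth round, which is the effect of the ceiling in $C_w$.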





# Encoding task when w > 0:



# Garbage collection task when w > 0:



1. Write 3 flash pages of Logical Page Numbers (LPN): 0, 1, 2

Page-level Mapping Table

| LPN | PPN  |
|-----|------|
| 0   | 4000 |
| 1   | 4001 |
| 2   | 4002 |
| 3   |      |

#### Block 1000 (data)

| PPN  | data |
|------|------|
| 4000 | X    |
| 4001 | Y    |
| 4002 | Z    |
| 4003 |      |

1. Write 3 flash pages of Logical Page Numbers (LPN): 0, 1, 2
2. Update LPN=0

Page-level Mapping Table

| LPN | PPN  |
|-----|------|
| 0   | 4003 |
| 1   | 4001 |
| 2   | 4002 |
| 3   |      |

#### Block 1000 (data)

| PPN  | data         |
|------|--------------|
| 4000 | <del>X</del> |
| 4001 | Y            |
| 4002 | Z            |
| 4003 | X'           |

1. Write 3 flash pages of Logical Page Numbers (LPN): 0, 1, 2
2. Update LPN=0
3. GC triggered to reclaim Block 1000

Page-level Mapping Table

| LPN | PPN  |
|-----|------|
| 0   | 4003 |
| 1   | 4001 |
| 2   | 4002 |
| 3   |      |

#### Block 1000 (data)

| PPN  | data         |
|------|--------------|
| 4000 | <del>X</del> |
| 4001 | Y            |
| 4002 | Z            |
| 4003 | X'           |

#### Block 2000 (free)

| PPN  | data |
|------|------|
| 8000 |      |
| 8001 |      |
| 8002 |      |
| 8003 |      |

1. Write 3 flash pages of Logical Page Numbers (LPN): 0, 1, 2
2. Update LPN=0
3. GC triggered to reclaim Block 1000
4. Copy valid pages in victim block to a free block

Page-level Mapping Table

| LPN | PPN  |
|-----|------|
| 0   | 8002 |
| 1   | 8000 |
| 2   | 8001 |
| 3   |      |

#### Block 1000 (data)

| PPN  | data         |
|------|--------------|
| 4000 | <del>X</del> |
| 4001 | Y            |
| 4002 | Z            |
| 4003 | X'           |

#### Block 2000 (free)

| PPN  | data |
|------|------|
| 8000 | Y    |
| 8001 | Z    |
| 8002 | X'   |
| 8003 |      |

1. Write 3 flash pages of Logical Page Numbers (LPN): 0, 1, 2
2. Update LPN=0
3. GC triggered to reclaim Block 1000
4. Copy valid pages in victim block to a free block
5. Erase victim block

Page-level Mapping Table

| LPN | PPN  |
|-----|------|
| 0   | 8002 |
| 1   | 8000 |
| 2   | 8001 |
| 3   |      |

#### Block 1000 (data)

| PPN  | data |
|------|------|
| 4000 |      |
| 4001 |      |
| 4002 |      |
| 4003 |      |

#### Block 2000 (free)

| PPN  | data |
|------|------|
| 8000 | Y    |
| 8001 | Z    |
| 8002 | X'   |
| 8003 |      |
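The five steps above can be replayed with a toy page-level FTL. This is only a sketch under simplifying assumptions: blocks hold 4 pages, writes go to an explicitly named block, and the class and method names (`TinyFTL`, `write`, `garbage_collect`) are hypothetical rather than taken from PaRT-FTL.

```python
PAGES_PER_BLOCK = 4  # the walk-through uses 4-page blocks


class TinyFTL:
    """Toy page-level FTL used only to replay the walk-through."""

    def __init__(self, block_bases):
        self.base = dict(block_bases)       # block id -> first PPN in the block
        self.write_ptr = dict(block_bases)  # block id -> next unwritten PPN
        self.mapping = {}                   # LPN -> PPN (page-level mapping table)
        self.flash = {}                     # PPN -> (LPN, data) for programmed pages
        self.valid = set()                  # PPNs holding the current copy of an LPN

    def write(self, lpn, data, block):
        """Program the next free page of `block`; updates are always out of place."""
        ppn = self.write_ptr[block]
        if ppn >= self.base[block] + PAGES_PER_BLOCK:
            raise RuntimeError(f"block {block} is full")
        self.write_ptr[block] = ppn + 1
        old = self.mapping.get(lpn)
        if old is not None:
            self.valid.discard(old)         # the previous copy becomes invalid
        self.flash[ppn] = (lpn, data)
        self.valid.add(ppn)
        self.mapping[lpn] = ppn
        return ppn

    def garbage_collect(self, victim, free_block):
        """Copy valid pages out of `victim` in PPN order, then erase it."""
        first = self.base[victim]
        pages = range(first, first + PAGES_PER_BLOCK)
        for ppn in pages:
            if ppn in self.valid:
                lpn, data = self.flash[ppn]
                self.write(lpn, data, free_block)
        for ppn in pages:                   # erase works on the whole block
            self.flash.pop(ppn, None)
        self.write_ptr[victim] = first


def run_walkthrough():
    ftl = TinyFTL({1000: 4000, 2000: 8000})
    for lpn, data in enumerate(["X", "Y", "Z"]):   # step 1
        ftl.write(lpn, data, block=1000)
    ftl.write(0, "X'", block=1000)                 # step 2
    ftl.garbage_collect(victim=1000, free_block=2000)  # steps 3 to 5
    return ftl


if __name__ == "__main__":
    print(run_walkthrough().mapping)
```

Running the walk-through leaves the mapping table at `{0: 8002, 1: 8000, 2: 8001}`, matching the final table above, with Block 1000 empty again.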

# Garbage collection task when w > 0:

#### Over-provisioning

[Figure: the logical address space is mapped by the FTL address mapping onto the physical address space; the physical space includes a GC victim block containing invalid pages.]

# Garbage collection task when w > 0:

$C_{gc} =$ time spent copying valid pages and erasing the victim block

$T_{gc} =$ time before the next block needs to be reclaimed
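One way to read the definition of $C_{gc}$ is sketched below. It assumes valid pages are copied one at a time with a fixed per-page copy time `t_copy` (a hypothetical parameter covering the read and the rewrite of a page), followed by one block erase of duration `t_e`. The slides do not give a closed form, so this is an illustration rather than the authors' formula, and $T_{gc}$ is not modeled here.

```python
def gc_capacity(valid_pages: int, t_copy: float, t_e: float) -> float:
    """Assumed form of C_gc: copy each valid page in turn, then erase the victim block."""
    if valid_pages < 0:
        raise ValueError("valid_pages must be non-negative")
    return valid_pages * t_copy + t_e


if __name__ == "__main__":
    # Hypothetical timings in msec; a victim block with 3 valid pages.
    print(gc_capacity(valid_pages=3, t_copy=2.0, t_e=4.0))
```

Under this assumption, a victim block with fewer valid pages is cheaper to reclaim, and a block with no valid pages costs only the erase.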

#### Admission Control

Read set:

$$\frac{t_r}{min\_T_r} + \sum_{i=1}^n \frac{C_r^i}{T_r^i} \leq 1$$

Write set:

$$\frac{t_e}{min\_T} + \sum_{i=1}^n \left(\frac{C_w^i}{T_w^i} + \frac{C_{en}^i}{T_{en}^i} + \frac{C_{gc}^i}{T_{gc}^i}\right) \le 1$$

Here $t_r$ is the time to read a page and $t_e$ is the time to erase a block.
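The two admission conditions can be checked mechanically once each task's capacities and periods are known. The sketch below assumes that `min_T_r` is the smallest read period in the read set and that `min_T` is the smallest period among all write, encoding, and GC tasks in the write set; the slides do not spell this out. The dataclass and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadTask:
    C_r: float  # read capacity
    T_r: float  # read request period


@dataclass
class WriteTask:
    C_w: float   # write capacity
    T_w: float   # write request period
    C_en: float  # encoding capacity
    T_en: float  # encoding period
    C_gc: float  # garbage collection capacity
    T_gc: float  # garbage collection period


def admit_read_set(tasks, t_r):
    """Read-set test: t_r / min_T_r + sum(C_r / T_r) <= 1."""
    if not tasks:
        return True
    min_T_r = min(task.T_r for task in tasks)
    utilization = sum(task.C_r / task.T_r for task in tasks)
    return t_r / min_T_r + utilization <= 1.0


def admit_write_set(tasks, t_e):
    """Write-set test: t_e / min_T + sum(C_w/T_w + C_en/T_en + C_gc/T_gc) <= 1."""
    if not tasks:
        return True
    # Assumption: min_T ranges over every write, encoding, and GC period.
    min_T = min(min(task.T_w, task.T_en, task.T_gc) for task in tasks)
    utilization = sum(
        task.C_w / task.T_w + task.C_en / task.T_en + task.C_gc / task.T_gc
        for task in tasks
    )
    return t_e / min_T + utilization <= 1.0
```

For example, with hypothetical values `WriteTask(6, 60, 3, 60, 20, 200)` and `t_e = 4`, three such tasks pass the write-set test while four do not, because each task contributes a utilization of 0.25 on top of the `t_e / min_T` term.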

### Admission Control Simulations

- Number of task sets: 500
- Number of tasks per task set: 10
- Each task makes 1-page read and 3-page write requests per period
- Periods are calculated based on generated utilizations
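A sketch of how such task sets could be generated. The slides do not name the utilization generator, so UUniFast is swapped in here as an assumption; the period of each task then follows from $T = C / U$. The capacity and total utilization values are hypothetical placeholders.

```python
import random


def uunifast(n, total_utilization, rng):
    """UUniFast: draw n utilizations that sum to total_utilization."""
    utilizations = []
    remaining = total_utilization
    for i in range(1, n):
        next_remaining = remaining * rng.random() ** (1.0 / (n - i))
        utilizations.append(remaining - next_remaining)
        remaining = next_remaining
    utilizations.append(remaining)
    return utilizations


def periods_from_utilizations(utilizations, capacity):
    """Derive each period from T = C / U for a fixed per-request capacity C."""
    if any(u <= 0 for u in utilizations):
        raise ValueError("every utilization must be positive")
    return [capacity / u for u in utilizations]


def generate_task_sets(num_sets=500, tasks_per_set=10, total_utilization=0.5,
                       capacity=0.45, seed=0):
    """Return num_sets lists of periods, one period per task."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    return [
        periods_from_utilizations(
            uunifast(tasks_per_set, total_utilization, rng), capacity
        )
        for _ in range(num_sets)
    ]
```

In the setup described above, the same derivation would be applied separately to the 1-page read capacity and the 3-page write capacity of each task.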



#### **OpenSSD Cosmos Board**




### Experimental Results: PaRT-FTL

8 tasks

- 4 tasks make write requests: 12-page writes every 60 msec
- 4 tasks make read requests: 3-page reads every 15 msec



# Experimental Results: WAO-GC

8 tasks

- 4 tasks make write requests: 12-page writes every 60 msec
- 4 tasks make read requests: 3-page reads every 15 msec







#### **Experimental Results**



# Conclusion

Contributions of PaRT-FTL:

- An FTL design that takes advantage of internal parallelism in SSDs
- A real-time task model for read and write requests on multiple flash chips
- Bounded and low-latency read requests that are not blocked by write requests or garbage collection