[H-GEN] Flash drives and wear levelling: user experiences?

Joel Michael joel at gimps-r-us.com
Thu Jul 31 08:06:32 EDT 2008


Benjamin Fowler wrote:
> My question to the HUMBUG collective is; has anybody else gotten any 
> significant hands-on experience with working with cut-down machines 
> running off flash drives, and want to tell us their experiences, and 
> what they think is the best way to arrange the filesystem, etc to work 
> well with flash media?

(Warning: novel inside, but contains detailed explanations of how CF 
works, and how to tune file systems for it)

I have recently set up a couple of machines running off CF drives - one 
using a SATA-CF adapter that sits nicely in an expansion slot, and one 
that has an IDE-CF adapter built in (all-in-one machine based on a VIA 
Epia chipset).

I think most CF cards do internal wear levelling, but you'll never know 
until it actually fails, and you'll know you've got no wear levelling if 
your journal blocks are the ones that fail.  I'll tell you in a couple 
of years.  It is only writes that count towards the wear - you can keep 
reading the same data from flash forever with no ill effects.

However, the one thing to be VERY careful about is write speeds.  CF 
cards (in particular, and all flash drives in general) have real issues 
writing small amounts of data due to the way flash media implements a 
write operation.  In flash-land, a write isn't just a write; it doesn't 
just change the polarity of a little piece of disk.  Instead, a write is 
really a read of the existing data in the affected flash cell (what 
flash datasheets usually call an erase block), an erase of that flash 
cell (or of a different one if the card does wear levelling), and then 
a write of the entire flash cell.  This isn't so bad if you're writing 
the contents of an entire flash cell at once, but if you're doing 
writes smaller than a flash cell then things get slow.  If you write 
half a flash cell at a time, filling one cell takes 2 read-erase-write 
cycles, and your overall transfer rate is cut in half.  At 1/4 of a 
flash cell per write, it takes 4 read-erase-write cycles, and you get 
1/4 of the speed.  It's a linear relationship.
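
To put numbers on that relationship, here is a small sketch.  The 
function name cycles_per_cell is made up for illustration, and the 
128KB cell size is just the figure measured for the 16GB card; the 
model is the simple one described above, not anything taken from a 
datasheet.

```shell
#!/bin/sh
# Hypothetical helper: number of read-erase-write cycles needed to fill
# one flash cell when each write is write_size bytes (sizes in bytes).
cycles_per_cell() {
    cell_size=$1
    write_size=$2
    if [ "$write_size" -ge "$cell_size" ]; then
        # A write covering a whole cell needs only one cycle for that cell.
        echo 1
    else
        # Round up in case write_size does not divide cell_size evenly.
        echo $(( (cell_size + write_size - 1) / write_size ))
    fi
}

# Assumed cell size: 128KB, as measured on the 16GB card.
for ws in 131072 65536 32768 4096 512; do
    echo "$ws bytes per write: $(cycles_per_cell 131072 "$ws") cycle(s) per cell"
done
```

Under this model a 4KB write pattern costs 32 cycles per cell, which is 
why small-block workloads feel so slow on these cards.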

The problem is that manufacturers don't typically specify how big the 
flash cell size is!  On my 16GB (16039018496 byte) CF card, I've figured 
out that the flash cell size is 128KB (131072 bytes).  On my 4GB 
(4009549824 byte) CF card, the cell size is 64KB (65536 bytes).

I figured this out by running a loop of dd commands, writing the same 
known amount of data (from /dev/zero) using varying block sizes, and 
taking note of how long it took to write the data.  I wrote the data 
directly to the raw device (/dev/sdc) to not incur filesystem overheads, 
disabled I/O schedulers for the device (echo noop > 
/sys/block/sdc/queue/scheduler) and made sure that dd was using the 
O_DIRECT flag to write data (dd oflag=direct).
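
A rough sketch of that kind of loop follows.  It is not the exact 
script I ran: the 16MB total, the 512-byte to 1MB range and the DRY_RUN 
switch are assumptions added for illustration.  By default it only 
prints the dd commands, because running them overwrites whatever is on 
the device.

```shell
#!/bin/sh
# WARNING: with DRY_RUN=0 this destroys all data on $DEV.
DEV=${DEV:-/dev/sdc}         # raw device under test (assumed name)
TOTAL=${TOTAL:-16777216}     # bytes written per run (assumed: 16MB)
DRY_RUN=${DRY_RUN:-1}        # 1 = only print the commands

# Build the dd command line for one block size.
dd_command() {
    bs=$1
    echo "dd if=/dev/zero of=$DEV bs=$bs count=$(( TOTAL / bs )) oflag=direct"
}

# Double the block size from 512 bytes up to 1MB, one dd run per size.
run_sweep() {
    bs=512
    while [ "$bs" -le 1048576 ]; do
        cmd=$(dd_command "$bs")
        if [ "$DRY_RUN" = 1 ]; then
            echo "$cmd"
        else
            echo "== block size $bs =="
            # GNU dd prints elapsed time and throughput on stderr.
            # Unquoted on purpose so the string splits into arguments;
            # this assumes $DEV contains no spaces.
            $cmd
        fi
        bs=$(( bs * 2 ))
    done
}

run_sweep
```

Set the scheduler first (the echo noop line above), then run it as root 
with DRY_RUN=0 and note the throughput dd reports for each block size.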

Every time the block size doubled, the write speed doubled, up until the 
cell size was reached.  Once the cell size was exceeded, the write speed 
stayed constant.
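
If you would rather not eyeball the numbers, something like the 
following can pick out where the speed levels off.  The function name 
find_plateau and the 10% threshold are arbitrary choices for 
illustration, and the sample figures are made up (speeds in arbitrary 
units), not my measurements.

```shell
#!/bin/sh
# Hypothetical helper: reads "block_size speed" pairs on stdin, sorted by
# ascending block size, and prints the first block size after which the
# speed stops growing by more than 10% (an arbitrary threshold).
find_plateau() {
    awk '
        NR > 1 && !found && $2 < prev_speed * 1.1 { print prev_bs; found = 1 }
        { prev_bs = $1; prev_speed = $2 }
    '
}

# Made-up sample data: speed doubles with block size, then flattens.
find_plateau <<'EOF'
16384 1
32768 2
65536 4
131072 8
262144 8
524288 8
EOF
```

With the sample input this prints 131072.  Real measurements are 
noisier, so treat the result as a hint and check it against the raw 
figures.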

Now that I knew how big the flash cell size was, I had to lay out the 
disk appropriately, and tune file systems for it.  Laying out the disk 
was pretty easy, just start the first partition at 128KB offset (256 
sectors, due to 512-byte sectors).  Tuning the filesystem was the tricky 
part.  Block sizes don't generally go as high as 128KB, that would be 
too easy.  Instead, I chose XFS for the filesystem, and used the su and 
sw options (stripe unit and stripe width, generally used for RAID 
volumes) to tune I/O operations for 128KB writes.  You can probably tune 
ext[2-4] filesystems to do the same thing by using the option -E 
stride=N,stripe-width=N when creating the file system (note: I have yet 
to do this on my 4GB CF card, which has an ext3 filesystem).  I'm not 
sure how other operating systems handle tuning I/O operations to stripe 
sizes for their file systems, but it will generally be documented under 
tuning for RAID stripe sizes.  The trick is to set the stripe unit (su 
in XFS; "stride" in ext, where it is counted in filesystem blocks) to 
the flash cell size, and to tell the file system there is only 1 data 
disk in the array (sw=1 in XFS; stripe-width equal to stride in ext).
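
For the 128KB card the numbers work out roughly as below.  This is a 
sketch that only prints the values and commands, it does not run 
anything.  /dev/sdc1 and the 4KB ext block size are assumptions.  In 
XFS, su is given in bytes and sw is the number of data disks; in ext, 
stride and stripe-width are both counted in filesystem blocks, so they 
depend on the block size you pick.

```shell
#!/bin/sh
CELL=${CELL:-131072}   # flash cell size in bytes (128KB on the 16GB card)
SECTOR=512             # sector size in bytes
FS_BLOCK=4096          # assumed ext filesystem block size in bytes
PART=/dev/sdc1         # assumed partition name

# First partition starts one full cell into the disk.
start_sector=$(( CELL / SECTOR ))

# ext stride: how many filesystem blocks make up one flash cell.
stride=$(( CELL / FS_BLOCK ))

echo "start first partition at sector $start_sector"
echo "mkfs.xfs -d su=$(( CELL / 1024 ))k,sw=1 $PART"
echo "mkfs.ext3 -b $FS_BLOCK -E stride=$stride,stripe-width=$stride $PART"
```

For the 4GB card, run it with CELL=65536, which gives a start sector of 
128 and a stride of 16.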

After this tuning was applied, my MythTV system started working 
beautifully ;-)

Hope this helps!  Let me know if it was useful.



