[H-GEN] Flash drives and wear levelling: user experiences?
Joel Michael
joel at gimps-r-us.com
Thu Jul 31 08:06:32 EDT 2008
Benjamin Fowler wrote:
> My question to the HUMBUG collective is; has anybody else gotten any
> significant hands-on experience with working with cut-down machines
> running off flash drives, and want to tell us their experiences, and
> what they think is the best way to arrange the filesystem, etc to work
> well with flash media?
(Warning: novel inside, but contains detailed explanations of how CF
works, and how to tune file systems for it)
I have recently set up a couple of machines running off CF drives - one
using a SATA-CF adapter that sits nicely in an expansion slot, and one
that has an IDE-CF adapter built in (all-in-one machine based on a VIA
Epia chipset).
I think most CF cards do internal wear levelling, but you won't know for
sure until one actually fails: if your journal blocks are the ones that
fail, you've got no wear levelling. I'll tell you in a couple
of years. It is only writes that count towards the wear - you can keep
reading the same data from flash forever with no ill effects.
However, the one thing to be VERY careful about is write speeds. CF
cards (in particular, and all flash drives in general) have real issues
writing small amounts of data due to the way flash media implements a
write operation. In flash-land, a write isn't just a write: it doesn't
just change the polarity of a little piece of disk. Instead, a write is
really a read of the existing data in the affected flash cell (strictly
speaking, an erase block), an erase of that cell (or of a different cell
if the card does wear levelling), and
then the write of the entire flash cell. This isn't so bad if you're
just writing the contents of an entire flash cell at once, but if you're
doing a write smaller than an entire flash cell then things get slow.
If you write in chunks of 1/2 a flash cell, filling one cell takes 2
read-erase-write cycles, and your overall transfer rate will be cut in
half. With chunks of 1/4 of a flash cell, it takes 4 read-erase-write
cycles, and you'll get 1/4 of the speed. It's a linear relationship.
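To put numbers on that, here is a rough sketch of the arithmetic in shell. The function names are made up for illustration, and it assumes the write size divides the cell size evenly:

```shell
# cycles_per_cell CELL_BYTES WRITE_BYTES
# Read-erase-write cycles needed to fill one flash cell when every
# write is WRITE_BYTES long (assumes WRITE_BYTES divides CELL_BYTES).
cycles_per_cell() {
    echo $(( $1 / $2 ))
}

# relative_speed_percent CELL_BYTES WRITE_BYTES
# Throughput relative to whole-cell writes, as a percentage.
# Writes of a full cell or larger are treated as full speed.
relative_speed_percent() {
    if [ "$2" -ge "$1" ]; then
        echo 100
    else
        echo $(( 100 * $2 / $1 ))
    fi
}

cycles_per_cell 131072 65536          # half a 128KB cell: prints 2
relative_speed_percent 131072 32768   # quarter of a cell: prints 25
```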
The problem is that manufacturers don't typically specify how big the
flash cell size is! On my 16GB (16039018496 byte) CF card, I've figured
out that the flash cell size is 128KB (131072 bytes). On my 4GB
(4009549824 byte) CF card, the cell size is 64KB (65536 bytes).
I figured this out by running a loop of dd commands, writing the same
known amount of data (from /dev/zero) using varying block sizes, and
taking note of how long it took to write the data. I wrote the data
directly to the raw device (/dev/sdc) to not incur filesystem overheads,
set the I/O scheduler for the device to noop (echo noop >
/sys/block/sdc/queue/scheduler) and made sure that dd was using the
O_DIRECT flag to write data (dd oflag=direct).
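The loop looked roughly like the sketch below. This is a reconstruction, not the exact script: the function name, the 512-byte to 1MB range of block sizes and the optional third argument are assumptions added for illustration. Be warned that pointing it at a raw device such as /dev/sdc destroys whatever is on that device.

```shell
# sweep_writes TARGET TOTAL_BYTES [DD_EXTRA]
# Write TOTAL_BYTES of zeros to TARGET once per block size, doubling the
# block size each pass, and print "<block size> <dd summary line>".
# DD_EXTRA defaults to oflag=direct (O_DIRECT); pass "" to drop it, for
# example when trying the function out on a scratch file on tmpfs.
sweep_writes() {
    target=$1
    total=$2
    extra=${3-oflag=direct}
    bs=512
    while [ "$bs" -le 1048576 ] && [ "$bs" -le "$total" ]; do
        count=$(( total / bs ))
        # dd reports its timing on stderr; keep only the final summary line.
        # $extra is left unquoted on purpose so an empty value disappears.
        summary=$(dd if=/dev/zero of="$target" bs="$bs" count="$count" $extra 2>&1 | tail -n 1)
        echo "$bs $summary"
        bs=$(( bs * 2 ))
    done
}

# Example (as root, after setting the scheduler as described above):
#   sweep_writes /dev/sdc 67108864
```

The 64MB total in the example is arbitrary; anything that is a multiple of the largest block size and big enough to give stable timings should do.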
Every time the block size doubled, the write speed doubled, up until the
cell size was reached. Once the cell size was exceeded, the write speed
stayed constant.
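If you would rather not eyeball the numbers, something like this picks out the knee. It is only a sketch: the function name and the 10% tolerance are assumptions, and the timings in the example are invented to show the shape of the curve, not measurements from my cards.

```shell
# find_knee: read "block_size seconds" pairs on stdin, in ascending
# block size order, where seconds is the time taken to write the same
# total amount of data. Prints the first block size after which the
# time stops improving by more than 10%, i.e. the likely cell size.
find_knee() {
    awk 'NR > 1 && $2 >= prev_t * 0.9 { print prev_bs; found = 1; exit }
         { prev_bs = $1; prev_t = $2 }
         END { if (!found) print "no plateau found" }'
}

# Hypothetical timings: the time halves until 128KB, then stays flat.
printf '%s\n' '16384 80' '32768 40' '65536 20' '131072 10' '262144 10' | find_knee
# prints 131072
```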
Now that I knew how big the flash cell size was, I had to lay out the
disk appropriately, and tune file systems for it. Laying out the disk
was pretty easy: just start the first partition at a 128KB offset (256
sectors, due to 512-byte sectors). Tuning the filesystem was the tricky
part. Block sizes don't generally go as high as 128KB, that would be
too easy. Instead, I chose XFS for the filesystem, and used the su and
sw options (stripe unit and stripe width, generally used for RAID
volumes) to tune I/O operations for 128KB writes. You can probably tune
ext[2-4] filesystems to do the same thing by using the option -E
stride=N,stripe-width=N when creating the file system (note: I have yet
to do this on my 4GB CF card, which has an ext3 filesystem). I'm not
sure how other operating systems will be able to handle tuning I/O
operations to stripe sizes for their file systems, but it will generally
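be under tuning for RAID stripe sizes. The trick is to set the stripe
unit (XFS su, or ext "stride", which is counted in filesystem blocks) to
be equal to the flash cell size, and the number of disks the file system
thinks you have in your array (XFS sw) to 1, so that a full stripe is
exactly one flash cell. For ext that makes stride and stripe-width the
same number: 32 for a 128KB cell with 4KB blocks.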
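Here is a small sketch that works out the numbers for a given cell size. The function name is made up, and it only prints the options rather than running anything. It assumes 512-byte sectors and a 4KB ext block size unless told otherwise, and it goes by the man pages, where mkfs.xfs takes su in bytes and sw as a number of data disks, while mke2fs counts both stride and stripe-width in filesystem blocks.

```shell
# flash_layout CELL_BYTES [FS_BLOCK_BYTES]
# Print the partition start sector and the mkfs options that line
# things up with a flash cell of CELL_BYTES. Nothing is executed.
flash_layout() {
    cell=$1
    fs_block=${2:-4096}
    # 512-byte sectors assumed, as in the 128KB = 256 sectors figure above
    echo "first partition start sector: $(( cell / 512 ))"
    # XFS: stripe unit = one cell, one "disk" in the array
    echo "mkfs.xfs -d su=$(( cell / 1024 ))k,sw=1"
    # ext3: with one disk, stride and stripe-width are the same block count
    stride=$(( cell / fs_block ))
    echo "mke2fs -j -b $fs_block -E stride=$stride,stripe-width=$stride"
}

flash_layout 131072
# first partition start sector: 256
# mkfs.xfs -d su=128k,sw=1
# mke2fs -j -b 4096 -E stride=32,stripe-width=32
```

For the 4GB card with its 64KB cell, flash_layout 65536 gives a start sector of 128 and stride=16. Append the device name to whichever mkfs line you use.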
After this tuning was applied, my MythTV system started working
beautifully ;-)
Hope this helps! Let me know if it was useful.