[H-GEN] partition advice
Martin Pool
martin.pool at mincom.com
Wed Feb 24 19:49:06 EST 1999
Bruce Campbell wrote:
> > Of course it all depends on what function the machine serves. If it
> > were, say, a web server (on a single disk), and the HTML was in /home,
> > I'd put /home in the middle. Obviously this sort of layout would be
> > most beneficial given a SCSI device.
>
> The Netapp Filer uses a filesystem by the name of 'WAFL', which has some
> fairly nifty tricks in it, due to how tightly integrated the controllers
> and the drives are. In short, it writes whatever is required
> wherever the heads are, which makes for higher throughput.
This design comes from a really fascinating observation: modern servers
do many more physical write operations than reads. This sounds
wonderfully perverse, but of course arises because many reads can be
served from cache whereas most writes have to go through cache to the
physical disk. A well-equipped server will have plenty of RAM cache,
and in an NFS setup the client caches magnify the effect.
If I remember correctly, the NFS WRITE RPC can't return to the client
until the write's been committed to disk, so the cache has to be (or at
least behave as if it is) write-through.
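To make the read/write asymmetry concrete, here is a toy sketch in
Python. The WriteThroughCache class, its capacity, and the
90-read/10-write workload are all made up for illustration; it models a
write-through LRU block cache and simply counts how many operations
actually reach the disk.

```python
from collections import OrderedDict


class WriteThroughCache:
    """Toy LRU block cache: reads may hit the cache, writes always go to disk."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # block number -> data, oldest first
        self.physical_reads = 0
        self.physical_writes = 0

    def _remember(self, block, data):
        self.blocks[block] = data
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict least recently used

    def read(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)    # cache hit: no disk access
            return self.blocks[block]
        self.physical_reads += 1              # cache miss: go to disk
        data = b""                            # stand-in for on-disk contents
        self._remember(block, data)
        return data

    def write(self, block, data):
        self.physical_writes += 1             # write-through: always hits disk
        self._remember(block, data)


def simulate():
    """Hypothetical workload: 90 logical reads and 10 logical writes."""
    cache = WriteThroughCache(capacity=8)
    for i in range(90):
        cache.read(i % 4)                     # working set of 4 blocks
    for i in range(10):
        cache.write(i % 4, b"x")
    return cache.physical_reads, cache.physical_writes
```

With these made-up numbers simulate() returns (4, 10): ninety logical
reads cost four physical reads, while every one of the ten writes
reaches the disk.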
So, NetApp say that the important thing is fast write performance: it's
OK to suffer the read performance hit of having files scattered all over
the disk as long as you can write quickly.
The situation isn't quite the same on workstations: there probably will
be a similar proportion of writes to reads (see procinfo sometime), but
writes need not be synchronous. In any case, AFAIK the *ix device model
doesn't tell the fs where the heads are.
Allocation is a really hard problem, like caching and garbage
collection. Having to do defragmentation offline is a terrible kludge.
> www.netapp.com
>
> Another Humbug member is writing a filesystem which incorporates some of
> the features of WAFL, that being filesystem-level snapshots.
I wasn't going to announce anything until it's working, but it is I.
(Thanks for not outing me, Bruce.)
There are three more cool features:
1. No more fscks. If the machine is shut down uncleanly, it simply rolls
back to the last guaranteed-consistent snapshot (typically <5 seconds
old) and resumes from there, which gives near-instantaneous reboots.
There will be a fsck tool to check that the filesystem is working OK,
but you will never be required to run it.
2. God's own undelete facility: you can take snapshots of the filesystem
at any point, and refer back to them later. Typically this is done by a
cron job, giving five-minute, one-hour, and daily snapshots. About 12
(TBD) snapshots can be stored per filesystem at any time.
If I delete a file, or *change it*, or *overwrite it*, I can retrieve it
from the snapshot. This even works if a whole directory tree has been
deleted and replaced. The semantics are not "if you're lucky", as they
are on ext2 and fat: if the snapshot's not been deleted you can
definitely get it all back. Snapshots are implemented by a
copy-on-write mechanism, so they're taken very quickly and take up space
proportional to the delta.
3. Expandable filesystems: just add more space by
RAID/md/repartitioning, and remount the filesystem to use the extra
space. No need to back up and mkfs.
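The copy-on-write idea in point 2 can be sketched in a few lines of
Python. This is only an illustration under assumptions, not the
filesystem's actual design: the CowStore class and its method names are
hypothetical, it works on whole files in memory rather than disk blocks,
and it uses the "save the old version on first change" flavour of
copy-on-write rather than writing new data elsewhere as WAFL does.

```python
_MISSING = object()   # marks "this path did not exist at snapshot time"


class CowStore:
    """Toy copy-on-write store (hypothetical, in-memory, whole files)."""

    def __init__(self):
        self.live = {}        # path -> current contents
        self.snapshots = {}   # snapshot name -> {path: contents at snapshot}

    def snapshot(self, name):
        # Nothing is copied here, so taking a snapshot is O(1).
        self.snapshots[name] = {}

    def _preserve(self, path):
        # Before the first change to a path, stash its old state in every
        # snapshot that has not recorded one yet.
        old = self.live.get(path, _MISSING)
        for saved in self.snapshots.values():
            saved.setdefault(path, old)

    def write(self, path, data):
        self._preserve(path)
        self.live[path] = data

    def delete(self, path):
        self._preserve(path)
        self.live.pop(path, None)

    def read_snapshot(self, name, path):
        saved = self.snapshots[name]
        # Unchanged paths are still shared with the live data.
        value = saved[path] if path in saved else self.live.get(path, _MISSING)
        if value is _MISSING:
            raise FileNotFoundError(path)
        return value


store = CowStore()
store.write("/home/mbp/notes", b"first draft")
store.snapshot("hourly")
store.write("/home/mbp/notes", b"overwritten")
store.delete("/home/mbp/notes")
recovered = store.read_snapshot("hourly", "/home/mbp/notes")
```

Here the snapshot holds one saved entry, for the one file that changed
after it was taken, which is the sense in which space is proportional to
the delta.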
This is technically not a journalling filesystem, although it offers
features often found in journalling systems.
I have some code, but it's not yet complete. I expect that it will not
be inherently slower than ext2, but naturally it will take some time to
tune it to ext2's level of excellence. People willing to run an
alpha-release filesystem will be heartily welcomed in a couple of
months.
It's worth mentioning that this project is not derived from NetApp or
Mincom in any way. The code will be released under the GPL. The vfs
interfaces differ slightly between *ix operating systems and the initial
code is for 2.1.x / 2.2.x Linux, but I imagine it will be portable to
other systems. I don't expect to offer gratis licenses for non-free
kernels.
--
Martin Pool
"GNU, which stands for Gnu's Not Unix, is the name for the complete
Unix-compatible software system which I am writing so that I can give it
away free to everyone who can use it. [...] In particular, we plan to
have longer file names, file version numbers, a crashproof file
system..."
-- The GNU Manifesto