Oracle’s TechCast crew interviewed [.mp3] Chris Mason on Btrfs. A kind-of transcript is available here. Btrfs is a new filesystem for Linux developed by Oracle. It features: “extent based file storage (2^64 max file size); space efficient packing of small files; space efficient indexed directories; dynamic inode allocation; writable snapshots; subvolumes (separate internal filesystem roots); object level mirroring and striping; checksums on data and metadata (multiple algorithms available); strong integration with device mapper for multiple device support; online filesystem check; very fast offline filesystem check; efficient incremental backup and FS mirroring.”
Because what linux needs is another filesystem!
I find that the goals and feature list of btrfs, unlike Reiser4, look really exciting and *useful*. Chris also seems like a really great team player, unlike Hans.
Traditionally, I’m a big advocate of evolutionary improvement, ext3, and ext4. But btrfs really makes me smile.
Edited 2007-07-27 23:13
Because what linux needs is another filesystem!
Yes, Linux needs a new generation of filesystems and an overhauled volume manager more than anything. Storage is clearly the critical path going forward, from the datacenter to the multimedia client. Storage virtualization is weak compared to processing and memory, manageability is not where it should be, and data integrity is the only design consideration that trumps reliability and performance.
The only thing that troubles me is the trend toward hard integration between the filesystem and the volume manager. I think that Linux needs a storage summit where players like Oracle, EMC, and IBM can hash out a common storage layer that exposes a structured logical storage model to client filesystems. This is necessary for virtualization as well as maintainability and quality.
More comments once I read up on this particular filesystem…
What is wrong with Linux’s volume manager? It was modelled pretty much verbatim after HP-UX’s volume manager (even the same command line flags). Linux takes the best things from every version of Unix and combines them into one. Linux.
What is wrong with it? Daniel Phillips wrote the LVMv1 implementation and agreed it sucked. LVMv2 and Device Mapper actually aren’t that bad.
Linux takes the best things from every version of Unix and combines them into one. Linux.
The best joke I ever heard. Bravo!
He didn’t say that we had gotten around to taking *all* the best stuff yet.
At any rate, the thing that took me aback about the statement was the idea that HP/UX *had* a “best”.
1 word… veritas
What do you think needs to be overhauled in LVM2? My big gripe is not so much LVM2 as the lack of good userland tools that bring the administration of LVM2, Ext3, RAID, and disk partitions together. There is no reason, with only the existing layers as they are right now, that ZFS should be able to show us up so badly in manageability. Single, simple commands should be able to accomplish things that now require many invocations of disparate tools.
If I decide to split up an LVM2 lv, I should be able to just say: split this lv into 2 lv’s of these sizes and have everything just happen. I should not have to unmount the filesystem, fsck -f it, resize it with resize2fs, shrink the lv with lvresize, create a new lv with the freed up space, format it, and mount it, all the while hoping that “120G” means exactly the same thing to resize2fs as it does to lvresize. And that is a simple operation that does not involve RAID, physical volumes, and disk partitions, where things get *really* complicated and prone to error.
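To make the length of that sequence concrete, here is a rough sketch in Python that only builds the list of commands for such a split and prints them; it runs nothing. The function name `split_lv_plan`, the volume group `vg0`, the LV names, and the mount points are all made up for illustration. It assumes an ext3 filesystem (so `fsck -f` becomes `e2fsck -f`), and the final remount of the original LV is an added step not spelled out above.

```python
from typing import List


def split_lv_plan(vg: str, lv: str, mountpoint: str, keep_size: str,
                  new_lv: str, new_size: str,
                  new_mountpoint: str) -> List[List[str]]:
    """Return, in order, the commands for splitting one ext3 LV in two.

    Nothing is executed; the caller only gets the argument lists.
    All names passed in are hypothetical examples.
    """
    dev = f"/dev/{vg}/{lv}"
    new_dev = f"/dev/{vg}/{new_lv}"
    return [
        ["umount", mountpoint],                  # take the filesystem offline
        ["e2fsck", "-f", dev],                   # forced check before shrinking
        ["resize2fs", dev, keep_size],           # shrink the filesystem first
        ["lvresize", "-L", keep_size, dev],      # then shrink the LV to match
        ["lvcreate", "-L", new_size, "-n", new_lv, vg],  # LV from freed space
        ["mkfs.ext3", new_dev],                  # format the new LV
        ["mount", dev, mountpoint],              # remount the original
        ["mount", new_dev, new_mountpoint],      # mount the new one
    ]


if __name__ == "__main__":
    for cmd in split_lv_plan("vg0", "home", "/home", "120G",
                             "scratch", "80G", "/scratch"):
        print(" ".join(cmd))
```

Note that the same size string is handed to both `resize2fs` and `lvresize`, which is exactly the spot where the two tools have to agree on what “120G” means.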
All the infrastructure is there to do it. But the userland tools always seem just out of reach.
Edited 2007-07-28 03:12
For your gripes, you should check out EVMS.
My central complaint about all traditional UNIX-style logical volume managers is that they abstract a flat block device. The filesystem has no idea how to allocate physically-contiguous blocks. Block N and block N+1 can be on different disks and different controllers. Furthermore, mirroring is limited to the volume level, whereas filesystems and system administrators might want to have finer control of redundancy. Volumes have nominal sizes and don’t efficiently use available capacity.
Modern filesystems are all moving toward extents, and so the volume manager should follow suit. Allocating a physically-contiguous extent should be like using malloc with a pluggable strategy function. Each extent descriptor should have a set of properties for the LVM and a pluggable set of properties for the filesystem. Instead of seeing a bunch of blocks, the filesystem owns a bunch of extents and can allocate more as quotas permit.
Basically, this is a heap for persistent storage. A volume has quotas instead of a size. Volumes allocate from the same pool of physical storage. Filesystems can specify allocation strategies, and they can have their own structures within extents. The storage layer can stripe and/or mirror extents as the filesystem/user sees fit. Extents can have checksums.
This allows the filesystem developer to work at a higher level while maintaining full control of how they organize their structures and data. It also allows more natural management of storage resources and volumes. Finally, it’s more scalable to large disk arrays and distributed storage architectures.
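As a toy illustration of the "heap for persistent storage" idea above, the following Python sketch models a shared pool that hands out contiguous extents to volumes, with a pluggable strategy function and per-volume quotas. Every name here (`Pool`, `Volume`, `Extent`, `first_fit`, `best_fit`, the property dictionaries) is invented for the example; it is not an existing LVM or kernel interface, and it leaves out freeing, striping, mirroring, and checksums.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

FreeRun = Tuple[int, int]   # (start block, length) of a contiguous free region
Strategy = Callable[[List[FreeRun], int], Optional[int]]  # picks a free-run index


class QuotaExceeded(Exception):
    pass


class PoolExhausted(Exception):
    pass


def first_fit(free: List[FreeRun], length: int) -> Optional[int]:
    """Pick the first free run that is large enough."""
    for i, (_, run_len) in enumerate(free):
        if run_len >= length:
            return i
    return None


def best_fit(free: List[FreeRun], length: int) -> Optional[int]:
    """Pick the smallest free run that is still large enough."""
    fits = [(run_len, i) for i, (_, run_len) in enumerate(free)
            if run_len >= length]
    return min(fits)[1] if fits else None


@dataclass
class Extent:
    start: int
    length: int
    lvm_props: Dict[str, object] = field(default_factory=dict)  # storage layer's
    fs_props: Dict[str, object] = field(default_factory=dict)   # filesystem's own


@dataclass
class Volume:
    name: str
    quota: int                      # maximum blocks; there is no fixed size
    used: int = 0
    extents: List[Extent] = field(default_factory=list)


class Pool:
    """One pool of physical blocks shared by every volume."""

    def __init__(self, total_blocks: int) -> None:
        self.free: List[FreeRun] = [(0, total_blocks)]

    def alloc(self, vol: Volume, length: int,
              strategy: Strategy = first_fit, **lvm_props: object) -> Extent:
        if vol.used + length > vol.quota:
            raise QuotaExceeded(vol.name)
        idx = strategy(self.free, length)
        if idx is None:
            raise PoolExhausted(f"no contiguous run of {length} blocks")
        start, run_len = self.free[idx]
        if run_len == length:
            del self.free[idx]                       # run fully consumed
        else:
            self.free[idx] = (start + length, run_len - length)
        ext = Extent(start, length, lvm_props=dict(lvm_props))
        vol.extents.append(ext)
        vol.used += length
        return ext


if __name__ == "__main__":
    pool = Pool(1000)
    home, var = Volume("home", quota=300), Volume("var", quota=500)
    print(pool.alloc(home, 100, mirror=True))
    print(pool.alloc(var, 200, strategy=best_fit))
```

The point of the sketch is only the shape of the interface: the filesystem asks for a contiguous extent and passes a strategy, while the storage layer enforces the quota and keeps its own properties on each extent.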
I wonder how complex this filesystem is.
Last thing linux needs is a new filesystem that has a code base which dwarfs xfs (which in turn dwarfs all the other filesystems).
“””
I wonder how complex this filesystem is.
Last thing linux needs is a new filesystem that…
“””
The feature list looks pretty much like what an ext3/4 successor would need without a lot of other stuff, of dubious value, thrown in.
Potentially, not all that complex. All these features are designed in from the start rather than retrofitted, which helps a lot. Being new, it won’t have a lot of compatibility cruft.
And note that it *does not need* a journaling layer due to its copy-on-write nature.
http://lwn.net/Articles/238923/
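For anyone wondering why copy-on-write can stand in for a journal, here is a deliberately tiny Python model of the general technique. It is not Btrfs's on-disk format (Btrfs uses copy-on-write B-trees); the class `CowStore` and the `crash_before_commit` flag are invented for the illustration. The idea is that new data always goes to a fresh block, and the only thing ever changed in place is one root pointer, so an interrupted update leaves the old, consistent state reachable.

```python
from typing import Dict, Optional, Tuple

Block = Tuple[Tuple[str, str], ...]   # immutable snapshot of key/value pairs


class CowStore:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    def __init__(self) -> None:
        self.blocks: Dict[int, Block] = {}
        self.next_id = 0
        self.root: Optional[int] = None   # the single pointer updated in place

    def _write_block(self, content: Block) -> int:
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = content
        return block_id

    def read(self) -> Dict[str, str]:
        """Return the state reachable from the current root."""
        if self.root is None:
            return {}
        return dict(self.blocks[self.root])

    def update(self, key: str, value: str,
               crash_before_commit: bool = False) -> None:
        state = self.read()
        state[key] = value
        # Step 1: write the new version to a fresh block; old block untouched.
        new_id = self._write_block(tuple(sorted(state.items())))
        if crash_before_commit:
            return   # simulated crash: new block is orphaned, root is unchanged
        # Step 2: commit by switching the root pointer.
        self.root = new_id


if __name__ == "__main__":
    store = CowStore()
    store.update("name", "old")
    store.update("name", "new", crash_before_commit=True)
    print(store.read())   # still the state from before the interrupted update
```

A real filesystem also has to reclaim the orphaned blocks and make the root switch atomic on disk; the sketch only shows why no replay log is needed to get back to a consistent state.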
Edited 2007-07-28 00:21
btrfs-0.5$ find . -name '*.[ch]' | xargs wc -l
….
11116 total
/linux-2.6/fs/xfs$ find . -name '*.[ch]' | xargs wc -l
106725 total
compared to XFS? it’s pretty damned simple.
“””
compared to XFS? it’s pretty damned simple.
“””
It’s also pretty damned in the early stages of development. Remember when the Mozilla guys were handing out a whole web browser on a single floppy diskette?
the biggest flaw with open development models is this. usually a lot of people with similar ideas reimplement the wheel in different ways. now that itself isn’t all that bad. the bad thing is that without a large company paying developers full time and a vision that stretches far out into the future (both from a financial backing and from a support and development standpoint) these projects usually either fail or don’t live up to their potential.
i personally would love to see something like this or ZFS become the standard file system.
i guess we will see what happens…
the biggest flaw with open development models is this. usually a lot of people with similar ideas reimplement the wheel in different ways. now that itself isn’t all that bad. the bad thing is that without a large company paying developers full time and a vision that stretches far out into the future (both from a financial backing and from a support and development standpoint) these projects usually either fail or don’t live up to their potential.
Oracle isn’t a large enough backer? IBM isn’t a large enough backer for JFS? SGI for XFS? (that might be a bad example).
Yeah, reiser4 got stretched out, but politics is as much to blame as finances. sbergman27 says Hans isn’t a team player. He actually was, except for a larger team. He wanted code that DARPA would like and that would port more easily to other OSs. The Linux (V)FS devs wanted some of the more revolutionary features available to their FSs. I can see both sides, and I’m glad a compromise was finally reached, but unfortunately it was too late. reiser4 was actually a very fast, stable, and reliable filesystem before Namesys started making the review changes for mainline inclusion.
Anyway, they aren’t reimplementing the wheel. Each filesystem differs in ways that may seem subtle at first but can make all the difference, especially for niche uses.
i personally would love to see something like this or ZFS become the standard file system.
You say that as if Linux needs a standard filesystem. As I said above, Linux filesystems all have pros and cons, depending on usage patterns. Also, competition is good in a development community as large as Linux. They have enough developers that trying different solutions is feasible.
“Oracle isn’t a large enough backer? IBM isn’t a large enough backer for JFS? SGI for XFS? (that might be a bad example).”
i should have been more clear. i am thrilled that oracle (a large company with money) is the backer