Google is developing a program to help academics around the world exchange huge amounts of data. The firm’s open source team is working on ways to physically transfer data sets up to 120 terabytes in size. “We have started collecting these data sets and shipping them out to other scientists who want them,” said Google’s Chris DiBona. Google sends scientists a hard drive system, then copies it before passing it on to other researchers.
Beware the USPS compression format for this data transfer!
As the old saying goes, nothing can beat the bandwidth of a truck full of hard drives. Though maybe someday we’ll get something faster.
bandwidth indeed. But the latency sucks.
Would something like Sun’s ZFS aid in this data transfer, or is it strictly about getting a protocol to transfer the files from one site to another? In which case you’d need some hardware to route the packets, like carrier grade or whatever it’s called.
Unrelated problems.
ZFS, being a filesystem, is unrelated to data transfer. I’m sure it could be useful to the project in other ways, though: ext3 filesystems, for example, have a size limit in the neighborhood of 8-16 TB. You could probably use some kind of logical volume manager to concatenate a bunch of filesystems together, but why do that if ZFS can manage such datasets without breaking a metaphorical sweat?
Of course, the article says nothing about the actual technology Google is using for these “hard drive systems,” and I can’t recall off the top of my head what the state of ZFS on Linux is (is it working now through FUSE?).
As for the problem at hand (transfers of enormous datasets), it’s really just a Google implementation of the old proverb “never underestimate the bandwidth of a station wagon full of backup tapes speeding down the highway.”
Correction. You’d use LVM to join a bunch of physical volumes together into a logical volume. You’d still have to put a filesystem on the logical volume.
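To make the layering concrete, here’s a hypothetical sketch of what that LVM setup might look like: several physical drives pooled into one logical volume, with the filesystem still going on top. The device names and volume names are made up, and these commands need root and will destroy any data on the drives.

```shell
pvcreate /dev/sdb /dev/sdc /dev/sdd          # mark each drive as an LVM physical volume
vgcreate bigdata /dev/sdb /dev/sdc /dev/sdd  # pool them into one volume group
lvcreate -l 100%FREE -n datasets bigdata     # carve a single logical volume spanning all of it
mkfs.ext3 /dev/bigdata/datasets              # the filesystem is a separate step on top
```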
You can’t route 120 TB of data from one university to another. You just have to ship it on hardware.
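A quick back-of-envelope check on that claim. The 120 TB figure is from the article; the 48-hour courier run and the 1 Gbit/s link are made-up assumptions for comparison.

```shell
#!/bin/sh
# Sneakernet vs. network, with assumed numbers:
# 120 TB of data, a 48-hour courier run, a 1 Gbit/s link.
BYTES=$((120 * 1000000000000))    # 120 TB in bytes
COURIER_SECS=$((48 * 3600))
LINK_BPS=1000000000               # 1 Gbit/s, in bits per second

# Effective courier bandwidth in bytes/second (~694 MB/s sustained)
echo "courier: $((BYTES / COURIER_SECS)) B/s"
# Days to push the same data through the network link
echo "network: $((BYTES * 8 / LINK_BPS / 86400)) days"
```

Under those assumptions the courier moves data several times faster than the link could, which is the whole point of shipping the hardware.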
Mr DiBona, open source program manager at Google, said the team was inspired by work done by Microsoft researcher Jim Gray, who delivered copies of the Terraserver mapping data to people around the world.
sneakernet for the win.
“sneakernet for the win.”
Both in terms of bandwidth and cost.
http://www.codinghorror.com/blog/archives/000783.html
I think the economic value of a network is based on its latency, not its bandwidth. It’s just much more difficult to measure the former.
The Checksum is in the mail!
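Joking aside, a shipped drive does need an integrity check on arrival. A minimal sketch, assuming GNU coreutils’ sha256sum and using a throwaway directory to stand in for the drive’s mount point:

```shell
#!/bin/sh
set -e
# Stand-in for the shipped drive's contents (hypothetical mount point).
DATASET=$(mktemp -d)
echo "pretend this is 120 TB of science" > "$DATASET/readings.csv"
MANIFEST=$(mktemp)

# Sender, before the drive goes in the mail: build a checksum manifest.
(cd "$DATASET" && find . -type f -exec sha256sum {} + > "$MANIFEST")

# Recipient, after copying the drive: verify every file against it.
(cd "$DATASET" && sha256sum --check "$MANIFEST")
```

The manifest travels separately (email it, or literally put it in the mail), so a corrupted or tampered drive fails the check.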