With the recent (or not so recent, I am a very slow writer) interest in database file systems, I’ve been thinking about what a typical user really wants from such a system. What would they use it for? What would we need to do to help them get the most from it? Are there any precedents that show how useful a database file system could be? If not, could we invent one? This led me to some “gedanken solutions” (like gedanken experiments, just with software) that I thought I’d distract you with.
IMPORTANT NOTE: Most of what is discussed here has been implemented in BeOS since 1996; however, the author has never used BeOS and so was not familiar with its capabilities while writing this article.
As technical people, we can all think of a bunch
of cunning uses for a database filesystem. My personal dream use
would be a superlative code management system; when integrated with a
good editor/IDE, it could provide revision control, tagging,
searchable documentation, name completion, and probably any number of
other things. Imagine being able to search the Doxygen comments for
the function you can vaguely remember provides exactly the feature you
want. Imagine being able to find every place a method is called so
you can tweak its interface. Imagine being able to examine, shuffle
and package changesets, like Bitkeeper.
But that is quite a lot to implement all in one go, even as an
imaginary system, and it doesn’t really
show how a general user would be able to take advantage of the tools
we want to provide.
Instead, I’d like to focus on the humble email client. Email clients
have a number of features that make them interesting here:
- Everyone uses a mail client
- Email messages have a bunch of attributes that can be easily extracted
- Mail clients use a custom database
The last item is particularly intriguing. Email clients have a
custom database, but what do they do with it? They use it to
implement what at first sight appears to be a straightforward
filesystem with folders and files just like the OS-native version.
There are some deviations from this norm like the virtual folders in
Thunderbird and Evolution, and I believe Opera uses a more generic
database in their mail client, but predominately we are still using a
hierarchical structure to organize our email.
This observation inspires the following questions:
- If database filesystems are so good, are there any good reasons why no-one has implemented one for email?
- Can we explore the usefulness of a database filesystem by implementing one within a mail client?
- What “killer features” would such a mail client provide, and would they convince users to switch?
The rest of this article tries to find some answers to these
questions by creating a specification for a database-backed mail
client.
The Mail Database
First of all, we should explore what features a database-backed mail
client would provide the user. In a pure email system, we would only
need to store two different types of objects: email and addressbook
entries. To simplify things, I’m ignoring all the other things, like
task items and diary dates, that some mail clients store.
We can divide the attributes for each object into three different categories:
- Intrinsic attributes – These are defined by the objects themselves, e.g.,
- The sender, date, recipients, subject etc. for an email.
- The name, email address etc. for an addressbook entry.
- Client attributes – These are invented by the mail client to manage the database objects, e.g.,
- Object type
- Unique identifier
- Per-message flags: draft, sent, unread, deleted etc.
- Received date
- User attributes – These are attributes that the user maintains e.g.,
- Per object flags e.g., message has been replied-to, message has been forwarded, message needs response
- Object category attributes, e.g., message is a personal/work message, addressbook entry is a friend/business associate
- Custom attributes e.g., Deal-with-by date
The above is obviously not an exhaustive list of attributes, but I
think they give a feel for the type of things we are talking about.
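To make the three attribute categories concrete, here is a minimal Python sketch of how one stored object might be modelled. The class name StoredObject and its three dictionaries are hypothetical, invented for illustration; they are not taken from any existing mail client.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict


@dataclass
class StoredObject:
    """One database object: an email or an addressbook entry (hypothetical model)."""
    intrinsic: Dict[str, Any]                            # defined by the object itself
    client: Dict[str, Any]                               # maintained by the mail client
    user: Dict[str, Any] = field(default_factory=dict)   # maintained by the user

    def get(self, name: str, default: Any = None) -> Any:
        """Look an attribute up by name: user values win, then client, then intrinsic."""
        for layer in (self.user, self.client, self.intrinsic):
            if name in layer:
                return layer[name]
        return default


message = StoredObject(
    intrinsic={"sender": "alice@example.com", "subject": "Budget", "date": date(2005, 3, 1)},
    client={"type": "email", "id": 1, "unread": True},
)
# The user can attach a new attribute at any time, with no schema change up front.
message.user["deal_with_by"] = date(2005, 3, 8)
```

Keeping the three layers separate is only one possible design; it makes it easy to tell which attributes the user is allowed to edit.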
We want to use the message attributes to help a user organize their
email in ways that weren’t possible with the old folder paradigm.
For example, the user might want to
- Set a “Needs reply” flag so that the user can see which messages need to be responded to.
- Set a “Deal with by” date to record any deadline imposed by the message, and a “Completed” flag to set once the task is done.
- Set flags indicating that the message is work/personal/etc.
or any other attributes that the user might think of. The important
thing is the user should be able to modify the set of attributes
whenever he wants; it might be difficult to get a user to maintain a
set of attributes that we impose on him, but he is bound to be keen to
use attributes that he defines himself.
The user can use these new attributes to manage his email in lots of new and interesting ways, for example,
- The user can find all messages that have been waiting for a reply for longer than a week
- The user can find all messages with imminent deadlines
- The user can find all work messages from a particular sender
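The three example queries above could be sketched in Python roughly as follows. The attribute names (needs_reply, deal_with_by, category) and the sample records are assumptions made up for this illustration, and "today" is fixed so the example is reproducible.

```python
from datetime import date, timedelta

# Hypothetical message records; each is just a dictionary of attributes.
messages = [
    {"id": 1, "sender": "boss@example.com", "received": date(2005, 3, 1),
     "needs_reply": True, "category": "work"},
    {"id": 2, "sender": "mum@example.com", "received": date(2005, 3, 9),
     "needs_reply": True, "category": "personal"},
    {"id": 3, "sender": "boss@example.com", "received": date(2005, 3, 8),
     "needs_reply": False, "category": "work", "deal_with_by": date(2005, 3, 11)},
]

today = date(2005, 3, 10)  # fixed date, purely for the example


def waiting_longer_than(msgs, days):
    """Ids of messages flagged needs_reply that arrived more than `days` days ago."""
    cutoff = today - timedelta(days=days)
    return [m["id"] for m in msgs if m.get("needs_reply") and m["received"] < cutoff]


def imminent_deadlines(msgs, days):
    """Ids of messages whose deal_with_by date falls within the next `days` days."""
    limit = today + timedelta(days=days)
    return [m["id"] for m in msgs
            if "deal_with_by" in m and today <= m["deal_with_by"] <= limit]


def work_messages_from(msgs, sender):
    """Ids of work messages sent by one particular correspondent."""
    return [m["id"] for m in msgs
            if m.get("category") == "work" and m["sender"] == sender]
```

Note that optional attributes are read with `get` or an `in` test, since most messages will not carry every user attribute.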
Creating a Message Hierarchy
One attribute type that I haven’t mentioned is an explicit message
folder. Instead we can produce a folderlike hierarchy using any set
of attributes. But will the user want to sort his email into a
hierarchy? Considering the precedents – current mail clients,
hierarchical databases and filesystems, DNS, taxonomy and any number
of other examples – I think we can safely assume that the need to
categorize objects into a hierarchy is hardwired into the human brain.
I can think of two approaches to producing a hierarchy from object
attributes. First of all, we can categorize objects using a subset of the
available attributes. At each level of the hierarchy, we choose an
attribute, and assign messages into subcategories using that
attribute.
This hierarchy is very simple to achieve but its usefulness is
probably limited. Most attributes aren’t suitable. Who would want to
categorize their messages using the message ID? How would we use a
multi-valued attribute such as recipients? Even the originator will
only be useful under limited circumstances.
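A minimal sketch of this first approach, assuming messages are plain dictionaries: the function build_hierarchy and the attribute names are hypothetical. It groups the objects by one chosen attribute per level.

```python
from collections import defaultdict


def build_hierarchy(objects, attributes):
    """Recursively group objects by each attribute in turn.

    Returns nested dicts keyed by attribute value, with a list of objects
    at the leaves. Objects lacking an attribute are grouped under None.
    """
    if not attributes:
        return list(objects)
    first, rest = attributes[0], attributes[1:]
    groups = defaultdict(list)
    for obj in objects:
        groups[obj.get(first)].append(obj)
    return {value: build_hierarchy(members, rest) for value, members in groups.items()}


messages = [
    {"id": 1, "kind": "work", "sender": "boss@example.com"},
    {"id": 2, "kind": "work", "sender": "colleague@example.com"},
    {"id": 3, "kind": "personal", "sender": "mum@example.com"},
]
tree = build_hierarchy(messages, ["kind", "sender"])
```

The sketch also shows the limitation discussed above: it only handles single-valued attributes, so something like the recipient list would need extra treatment.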
The second option is to use a specific user-defined category
attribute. The user enumerates all possible values of this attribute
and assigns messages to their appropriate categories as he sees fit.
To produce a hierarchy, we divide the category attribute into fields,
with each field used to categorize objects at a given level in the
hierarchy.
The most useful solution would probably be a combination of these two.
At the highest level, the user would want to see their messages
categorized using the message flags to produce categories like unread
and uncategorized messages, messages waiting to be sent, deleted
messages etc. Afterwards, it is probably sufficient to arrange
messages according to the single category attribute.
Note that with this scheme, we no longer guarantee that the message
categorization is disjoint – a given message can exist in more
than one category. In fact it might be useful to make the category
attribute multivalued. After all, not every message is easy to
pigeonhole.
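As an illustration of a multivalued category attribute divided into fields, here is a small Python sketch. The "/" separator, the "categories" attribute name and the "_messages" key are all assumptions chosen for the example.

```python
def category_tree(messages, separator="/"):
    """Build a nested folder-like tree from multivalued category attributes.

    Each category string is split into fields; each field names one level.
    Message ids are collected under the special key "_messages".
    """
    root = {}
    for msg in messages:
        for category in msg.get("categories", []):
            node = root
            for field_name in category.split(separator):
                node = node.setdefault(field_name, {})
            node.setdefault("_messages", []).append(msg["id"])
    return root


messages = [
    {"id": 1, "categories": ["work/projects/alpha", "personal/soccer"]},
    {"id": 2, "categories": ["work/projects/alpha"]},
    {"id": 3, "categories": []},   # uncategorized: appears nowhere in the tree
]
tree = category_tree(messages)
```

Message 1 shows up under two branches, which is exactly the non-disjoint categorization described above.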
Addressbook Entries
As mail messages and addressbook entries are stored in the same
database, the same tools are available for both types of objects
– addressbook entries can be flagged and categorized just like
email. Personally, I would identify my contacts as friends,
colleagues, managers, people I play soccer with, etc. These
categories are definitely not disjoint; surprisingly enough, your
colleagues and fellow footballers can be friends too.
Once we have assigned attributes to addressbook entries, we can use
them in lots of interesting ways. One very powerful use would be to
generate lists of recipients, for example to
- Send a message to all people marked as soccer players to organize a match.
- All people who are both soccer players and colleagues to organize a game against another company.
More interesting possibilities arise when we allow database “joins”
between addressbook entries and email messages. This would allow us
to do things like
- Find messages from your managers that need a response.
- Find a list of friends that we haven’t emailed recently.
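These recipient lists and joins map naturally onto SQL. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table layout, column names and sample data are hypothetical, chosen only to illustrate the idea.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- One row per (contact, tag) pair, so a contact can carry several tags.
    CREATE TABLE contact_tags (email TEXT, tag TEXT);
    CREATE TABLE messages (id INTEGER PRIMARY KEY, sender TEXT, recipient TEXT,
                           sent_on TEXT, needs_reply INTEGER);
""")
db.executemany("INSERT INTO contact_tags VALUES (?, ?)", [
    ("ann@example.com", "manager"), ("ann@example.com", "colleague"),
    ("bob@example.com", "colleague"), ("bob@example.com", "soccer"),
    ("carl@example.com", "friend"), ("carl@example.com", "soccer"),
    ("dee@example.com", "friend"),
])
db.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?)", [
    (1, "ann@example.com", "me@example.com", "2005-03-01", 1),
    (2, "ann@example.com", "me@example.com", "2005-03-02", 0),
    (3, "bob@example.com", "me@example.com", "2005-03-03", 1),
    (4, "me@example.com", "carl@example.com", "2005-03-04", 0),
    (5, "me@example.com", "dee@example.com", "2004-11-20", 0),
])


def column(sql, *params):
    """Run a query and return its first column as a list."""
    return [row[0] for row in db.execute(sql, params)]


# Recipient list: people who are both soccer players and colleagues.
soccer_colleagues = column("""
    SELECT email FROM contact_tags WHERE tag = 'soccer'
    INTERSECT
    SELECT email FROM contact_tags WHERE tag = 'colleague'
    ORDER BY email""")

# Join: messages from managers that still need a response.
from_managers = column("""
    SELECT m.id FROM messages m
    JOIN contact_tags t ON t.email = m.sender
    WHERE t.tag = 'manager' AND m.needs_reply = 1
    ORDER BY m.id""")

# Anti-join: friends we have not written to since a given date.
neglected_friends = column("""
    SELECT t.email FROM contact_tags t
    WHERE t.tag = 'friend' AND NOT EXISTS (
        SELECT 1 FROM messages m
        WHERE m.recipient = t.email AND m.sent_on >= ?)
    ORDER BY t.email""", "2005-02-01")
```

Dates are stored as ISO strings here so that plain string comparison orders them correctly.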
Issues and Omissions
There are some unfortunate consequences to storing messages in the
way described above. For example, as objects are not uniquely
categorized, a given object can exist in multiple places in our
hierarchy. If a user is used to the old folder paradigm, he will be
surprised when “copies” of an object are updated when he updates one
instance. This is not much of a problem for immutable objects like
email messages, but may be a problem in other cases.
There is a related problem that appears when we want to delete an
object. Does the user want to delete the object from the database, or
does he just want to remove it from the current categorization? In
the old paradigm, these two were synonymous. Perhaps we want to
provide both mechanisms to the user, but he may find this confusing.
Taking these thoughts further, does this approach mean that the user is
less likely to delete messages? Will we get enormous mail databases
with items squirreled away in obscure categories? Do we need to give
the user tools to help him clean up the database?
Finally, there is one major piece missing from the above. What do we
do if the user wants to search for a word or phrase in the subject or
body of one of the messages? This is arguably the most useful
feature, yet it is not obviously provided by a database based message
store; it can just as easily be provided by a traditional email
datastore. So is the whole complex edifice described above useful
enough to warrant working on an implementation? If we can’t convince
the user to maintain and use the user attributes, the answer is
probably no.
Mail Client UI
In my opinion, defining the database and inventing some examples of
how it could be used is the easy bit. The UI is another matter
entirely. After all, we can have the most advanced database in the
world, but it will not be used unless the interface to it is simple
and intuitive. User interface design is not my strong point, so I won’t belabor this
issue, but I’d like to briefly suggest some approaches that I think
might be fruitful.
First of all, we should define the problem: There are three aspects of the UI to a database message store:
- Schema modifications – introducing new attributes and attribute values.
- Attribute maintenance – viewing/modifying an object’s attributes.
- Specifying queries – using the attributes to find messages in the database.
Schema modifications are relatively straightforward. For example,
when adding an attribute to the schema, the user just has to specify
an attribute name and type, and perhaps a set of allowed attribute
values for enumerated types. This is ideal material for a simple
wizard.
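Behind such a wizard, the schema change itself could be as small as the following Python sketch. The Schema and AttributeDef classes are hypothetical; they record a name, a type and, for enumerated attributes, the allowed values.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class AttributeDef:
    name: str
    value_type: type
    allowed: Optional[Tuple[Any, ...]] = None   # only set for enumerated attributes


class Schema:
    """The user-editable set of attribute definitions."""

    def __init__(self) -> None:
        self.attributes: Dict[str, AttributeDef] = {}

    def add_attribute(self, name, value_type, allowed=None):
        """What the wizard would call once the user has filled in the form."""
        if name in self.attributes:
            raise ValueError(f"attribute {name!r} already exists")
        allowed_values = tuple(allowed) if allowed is not None else None
        self.attributes[name] = AttributeDef(name, value_type, allowed_values)

    def validate(self, name, value):
        """Return True if `value` is acceptable for the named attribute."""
        definition = self.attributes[name]
        if not isinstance(value, definition.value_type):
            return False
        return definition.allowed is None or value in definition.allowed


schema = Schema()
schema.add_attribute("needs_reply", bool)
schema.add_attribute("kind", str, allowed=["work", "personal"])
```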
Attribute maintenance is slightly more tricky. Most attributes are
intrinsic to the email messages or maintained by the mail client itself,
so will never need to be updated by the user. Most of the rest can be
modified using straightforward UI widgets – message flag
attributes can be set via buttons or tickboxes, enumerated attributes
via selection boxes etc.
The message category is another matter. A typical user may want to
use many different categories, forming a deeply nested category tree.
In addition, he may want to assign two or three categories to a
message. This means that the usual solution – a drop-down
selection box – is probably not useful.
One option would be to allow the user to drag a message from a
“message list” panel to a “folder list” panel in analogy to the way he
would move a message into a folder in current mail clients. This is a
good solution, but does not make it obvious that multiple categories
can be assigned to the message. Also, many of the “folders” would be
like the virtual folders in current mail systems, so could not be used
as a target. So this interface may not be intuitive.
Another option would be a tree view of the available categories with
tick boxes adjacent to each category. Boxes would be ticked for each
category that the message was assigned to. This is the most complete
representation of the category information, but would be rather
tedious to use.
Another option would be a text entry box with autocompletion, like the
addressbar in a web browser. This would be good for typists, but
perhaps not so good for everyone else. I’m sure that there are many
other solutions too. This is one area where some experimentation may
be required to discover which one is best.
Finally, we come to the UI used to specify queries into the
database. Essentially, we want to create a UI component to replace
the “folder view” found in current email systems. This would be the
primary tool for finding messages in the database, so is the most
important component of the new mail client UI.
We can conceptually divide message location techniques into the following:
- Find messages based on a few fundamental message properties, e.g.,
- Recently arrived and uncategorized messages, i.e., an Inbox folder.
- Messages queued up to be sent, i.e., an Outbox folder.
- Messages currently being prepared i.e., a Drafts folder.
- Messages marked as sent i.e., a Sent messages folder.
- Find messages based on their category. As discussed above, this is the equivalent of the traditional folder hierarchy, and can be displayed as such.
- Find messages based on message thread or other intrinsic or client attributes.
In addition we would want to include more DB-like interfaces for finding messages, e.g.,
- Find messages using a query language.
- Virtual folders that remember useful queries.
A fully featured query interface may be too difficult for a typical
user, but current DB query tools can be used as a guide.
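A virtual folder is essentially a saved query. As a rough Python sketch (the VirtualFolder class and the attribute names are assumptions for illustration), the fundamental folders listed above can all be expressed as stored predicates:

```python
class VirtualFolder:
    """A named, saved query; its contents are recomputed each time it is opened."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate

    def contents(self, messages):
        """Ids of the messages that currently match the saved query."""
        return [m["id"] for m in messages if self.predicate(m)]


messages = [
    {"id": 1, "unread": True, "categories": []},
    {"id": 2, "unread": False, "categories": ["work"], "needs_reply": True},
    {"id": 3, "draft": True, "categories": []},
]

# Inbox: recently arrived and still uncategorized.
inbox = VirtualFolder("Inbox", lambda m: m.get("unread", False) and not m["categories"])
drafts = VirtualFolder("Drafts", lambda m: m.get("draft", False))
needs_reply = VirtualFolder("Needs reply", lambda m: m.get("needs_reply", False))
```

Because nothing is copied into the folder, categorizing message 1 would make it disappear from the Inbox without any explicit "move" operation.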
One simple query interface hinted at above is the query tree; here
we build a hierarchy using simple attributes.
At each level of the hierarchy, the user selects which attribute to
use by, for example, clicking on an icon where the "[+]" and "[-]" icons
usually are in current folder browsers. Each time the icon is
clicked, the tree is expanded using a different attribute and the icon
is changed accordingly. Each entry in the expanded subtree
corresponds to one value of the chosen attribute.
Not all attributes are suitable for such a query tree, particularly
attributes with many distinct values, e.g., dates and the message
subject. Also the user would only be able to do very simple queries
in this way. But this might be a simple enough interface that it
might be used by everyday people. Again, more research is required.
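One way to picture the query tree is as nodes that each hold a set of attribute filters; expanding a node with a chosen attribute creates one child per distinct value. The following Python sketch makes that assumption, and the QueryNode class is hypothetical.

```python
class QueryNode:
    """One node in a query tree: a list of attribute=value filters."""

    def __init__(self, store, filters=()):
        self.store = store
        self.filters = tuple(filters)

    def matches(self):
        """All messages that satisfy every filter on the path to this node."""
        return [m for m in self.store
                if all(m.get(attr) == value for attr, value in self.filters)]

    def expand(self, attribute):
        """Return one child node per distinct value of `attribute` among the matches."""
        values = sorted({m[attribute] for m in self.matches() if attribute in m})
        return {v: QueryNode(self.store, self.filters + ((attribute, v),))
                for v in values}


messages = [
    {"id": 1, "kind": "work", "sender": "boss@example.com"},
    {"id": 2, "kind": "work", "sender": "colleague@example.com"},
    {"id": 3, "kind": "personal", "sender": "mum@example.com"},
]
root = QueryNode(messages)
by_kind = root.expand("kind")                 # user clicks the icon, picks "kind"
by_sender = by_kind["work"].expand("sender")  # then expands "work" by sender
```

Unlike a fixed hierarchy, the attribute used at each level is chosen at the moment the user expands the node.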
Database Implications
You have probably gathered that the database described in this article
is not a typical relational database. I like to think of it as an "ad
hoc" database; we can use it to store any old junk.
In our favour we don’t have too much information to store. Even
the most sociable of users will not have millions of emails. In
addition, the attributes we use are pretty straightforward: flags,
enumerated values, strings, a category attribute with fields, perhaps
dates and numbers.
Having said that, this database may be difficult to implement
efficiently. The user can add new attributes to any object at will.
Most attributes are optional and some are multivalued. Together this
will make it difficult to find an efficient storage scheme, though
something like Reiser4 would suffice. Further, we will want to key
every attribute, as well as creating a “string search” key into the
message subject and body.
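To give a feel for what "keying every attribute" plus a string search key means, here is a toy in-memory sketch in Python. It is not an efficient storage scheme, and the AttributeStore class is hypothetical; it only shows the two kinds of index side by side.

```python
import re
from collections import defaultdict


class AttributeStore:
    """Toy store with an index on every attribute and a word index on the text."""

    def __init__(self):
        self.objects = {}                      # id -> attribute dict
        self.attr_index = defaultdict(set)     # (name, value) -> set of ids
        self.word_index = defaultdict(set)     # lowercase word -> set of ids

    def add(self, obj_id, attributes, text=""):
        self.objects[obj_id] = attributes
        for name, value in attributes.items():
            # Multivalued attributes are indexed once per value.
            values = value if isinstance(value, (list, set, tuple)) else [value]
            for v in values:
                self.attr_index[(name, v)].add(obj_id)
        for word in re.findall(r"\w+", text.lower()):
            self.word_index[word].add(obj_id)

    def find(self, **criteria):
        """Ids of objects matching every name=value pair given."""
        sets = [self.attr_index.get((n, v), set()) for n, v in criteria.items()]
        return set.intersection(*sets) if sets else set(self.objects)

    def search(self, phrase):
        """Ids of objects whose text contains every word of `phrase` (in any order)."""
        words = re.findall(r"\w+", phrase.lower())
        sets = [self.word_index.get(w, set()) for w in words]
        return set.intersection(*sets) if sets else set()


store = AttributeStore()
store.add(1, {"kind": "work", "flags": ["unread", "needs_reply"]},
          text="Quarterly budget meeting")
store.add(2, {"kind": "personal", "flags": ["unread"]},
          text="Soccer meeting on Sunday")
```

Optional and user-defined attributes cost nothing here because absent attributes simply have no index entry; a real on-disk implementation would have to work much harder to get the same flexibility.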
These things may make the database difficult to implement. These
difficulties will only get worse as we add more object types to our
database, and may be prohibitive in an unconstrained system like a
full database file system. Perhaps this is why Microsoft has failed
to implement one for the last 10 years.
Taking it Further
Even though this system is imaginary, it is useful to think about how it
will develop over time. I can think of a few directions that could be
explored:
Accessing databases remotely
Just as the MAPI and IMAP protocols provide remote access to traditional
mail databases, we would need to create a network protocol to provide
remote access to our database.
If we were using a DBFS, this could be the filesystem’s standard
network protocol, if one existed. Conversely, creating a network
protocol to access our database would serve as a good prototype for a
network DBFS.
Linking together multiple databases
Quite often, people have access to more than one mail database.
For example, a company employee typically has access to two: their own
database containing personal messages and a shared public database
containing messages of general interest.
It is interesting to consider how we could integrate multiple
databases in this framework. Will the user see a combined view or
will they see two distinct databases? How will access control be
implemented? Can the user add private attributes to public entries?
Setting attributes automatically
Some user attributes, e.g., attributes indicating that the message
has been forwarded or replied-to, can be automatically set by the mail
client. It would be useful to provide some JavaScript-like scripting
language to allow the user to automate the maintenance of such
attributes.
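The scripting hooks might look something like the following sketch. It is written in Python rather than a JavaScript-like language, and the event names ("received", "replied", "forwarded") and the rule decorator are hypothetical.

```python
from collections import defaultdict

rules = defaultdict(list)   # event name -> list of handler functions


def rule(event):
    """Decorator that registers a function to run when `event` occurs."""
    def register(handler):
        rules[event].append(handler)
        return handler
    return register


@rule("replied")
def mark_replied(message):
    message["replied_to"] = True
    message["needs_reply"] = False


@rule("forwarded")
def mark_forwarded(message):
    message["forwarded"] = True


@rule("received")
def flag_questions(message):
    # A user-written rule: a subject that ends in "?" probably needs a reply.
    if message.get("subject", "").rstrip().endswith("?"):
        message["needs_reply"] = True


def fire(event, message):
    """Called by the (hypothetical) mail client whenever something happens to a message."""
    for handler in rules[event]:
        handler(message)
    return message
```

The first two rules are the kind the client would ship with; the third is the kind a user might add for himself.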
Taking this one step further, we could use the same techniques used
to identify spam – Bayesian filtering for
example – to place messages into other categories. Although I’m
not sure if users would be willing to allow their messages to be
categorised by a machine.
Adding attributes to outbound messages
A user might want to include attributes within messages sent to
other users of database-backed mail clients. For example, the sender
might like to set a reply-required flag or a reply-by date. The
recipient might like it if the message indicated whether it was a work
or a personal email.
The RFC822 standard is flexible enough that it would be quite easy
to add these attributes to a message. The difficult bit would be
creating a shared schema for all users to use.
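As a sketch of how such attributes could travel with a message, the following uses Python's standard email package to add extra header fields and read them back. The header names (X-Reply-Required, X-Reply-By, X-Category) are invented for this example; no shared schema for them exists, which is precisely the hard part.

```python
from email import message_from_string
from email.message import EmailMessage

# Sender side: attach user attributes as extra header fields.
msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Quarterly budget"
msg["X-Reply-Required"] = "yes"      # hypothetical attribute headers
msg["X-Reply-By"] = "2005-03-17"
msg["X-Category"] = "work"
msg.set_content("Please send me your figures.")

wire = msg.as_string()               # what would actually be transmitted

# Recipient side: parse the message and pull the attributes back out.
received = message_from_string(wire)
user_attributes = {name: value for name, value in received.items()
                   if name.startswith("X-")}
```

Clients that do not understand the extra headers would simply ignore them, so the message stays readable everywhere.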
Where Do DB Filesystems Fit?
You may have noticed that database filesystems weren’t mentioned
much in the above. So now it is time to ask what would such a
filesystem give us?
It is not obvious that a database filesystem would implement what
we require. For example, would we be able to add attributes at will
to database objects? Would the database filesystem allow us to do
string searches into the body of objects? Perhaps the database
filesystem would just provide an efficient storage layer (e.g.,
Reiser4) and we would have to do all the indexing ourselves.
Assuming that we have access to a database filesystem that fulfills
our requirements, implementing a mail client on top might be little
more than implementing a UI. Unfortunately, as I attempted to convey
above, I think that this would be one of the harder problems to
solve.
In addition to easing the implementation of our mail client, the
database filesystem would allow users to manipulate messages using the
standard filesystem tools. For example, users could view and edit
message attachments with standard utilities – the attachments
would appear as if they were just another file in an ordinary file
hierarchy. Of course, we don’t need a database filesystem for this;
we could achieve the same result by exporting the contents of our
database as a userspace filesystem using tools like FUSE.
Perhaps most importantly of all, a database filesystem would allow
us to unify the way we handle all filesystem objects. For example, if
we extended the database to extract intrinsic attributes from word
documents or music files, then these attributes would automatically be
available in our mail client.
So it seems that a database filesystem does not buy us very much.
On the other hand, many of the ideas and issues outlined in the
previous sections apply to a database filesystem just as much as to
our database-backed mail client. In the former, we still need tools
for administering attributes and specifying queries. Solving these
issues in the simple email case should give us a good insight into
more general solutions for a full database filesystem.
Conclusion
The above was a fairly undirected ramble through some ideas, and I
must apologize for inflicting it upon you. I think the point I am
trying to illustrate is that a database filesystem is not a solution
in itself. Like all databases, it is only as useful as the
applications that use it. The converse is not true – we can start
implementing the applications straight away using a custom
database.
I think this suggests that it is worthwhile starting to implement
the applications now. We can start using the database filesystems
when they become available.
If I get some time, I plan on experimenting with some of the above
ideas in a prototype mail client. But considering it took me a month
to write this essay, don’t hold your breath.
Further Reading
DB Mail Clients:
- Opera’s mail client is based on a database and includes many of the features described here.
- The new Mac mail client is also database-like. Its search UI is based on Spotlight, so it integrates with the Mac filesystem.
DB filesystems
- Reiser4 is an efficient database for storing lots of small objects.
- Hans Reiser’s vision for a database filesystem
- Real soon now, Microsoft will unleash WinFS onto the world and make all other database filesystems obsolete. Though details are still a little vague.
If their filesystem is based on a database, I presume people would use it implicitly. The point is – would *developers* use it to create cooler apps? Probably. Why should finding files quickly (or content for that matter) be left to the individual applications or application layer? I think database technology has advanced far enough for these systems to be integrated in some way.
BeOS has tried two different “database” filesystem approaches. The BFS, with its extended attributes, indexes and query functionality, and the early BeOS storage database.
http://www.theregister.co.uk/2002/03/29/windows_on_a_database_slice…
Be’s BFS was years ahead of the market and such a pleasure to use. I think the actual dynamic of a file system won’t change so much in concept, but in its implementation and usability. Useful for Vfolders and the like, though.
If I understand databases well, what makes them great tools is their set-theoretical approach to data. Data get stuffed into nice, two-dimensional packages that can be indexed and managed with relative efficiency.
Reality often is more of a graph of arbitrary complexity, with gnarly cycles.
I submit that the challenge of managing the things you’d like to do with a filesystem may be similar to the challenge that a cartographer has when figuring out how to map a three-dimensional object to a two-dimensional space: there will be some ‘fibs’ told about scale, or the relationship of things within the map to each other to make it work.
If the task was not hard, it would have long since been done.
Oh.. do I miss BeOS for the file system. That was the one thing I loved the most about BeOS. I’m not sure how Mac OS X works, but in the latest few demo clips for the next version (forgot the name) it seems that the Mac filesystem might have some sort of DB too?!
Man I only wish there was someone out there that was able to make BFS or something similar in Windows. Ya Ya I know.. why would you want that. Sorry, but I still use windows more than any other OS. For many reasons, work and gaming being on the top of the list. Addon at good price, I am very very very afraid of WinFS. Very afraid.
> I’m not sure how Mac OSX works
Mac OS X Tiger is BeFS-plus. It supports everything BeOS did back then, plus the ability to search *inside* 20 file formats.
>I am very very very afraid of WinFS.
Why? It was cut by Longhorn anyway in order to make the deadlines.
Opera M2 mail does almost all of the things that he talks about, and he links to at the end of the article. Anyone interested in data storage and retrieval should try it out… it is very tough to go back to an ordinary email program. Of course some of the others are getting closer.
On a related note, the Yahoo Desktop Search (a free version of X1), along with other desktop searches to varying degrees, can search all your files almost instantly, and integrates into Outlook. You can save searches, with many custom parameters, for immediate retrieval. Of course it builds an index separately from the file, but I think that may be safer, because problems with the index would never damage the file itself. I think the ability to search the contents of a file basically negates the lack of file attributes in windows.
Doesn’t Reiser 4 add a lot of this stuff?
One example is Pick. It is an OS, an FS and a DB all in one. It is still in use in healthcare systems, and others. If I recall correctly, IBM also made an FS and DB in one.
Matt
The IBM AS400 operating system has a file system that was a database circa 1980.
Why must each attribute have a name and a value? Forcing this structure is theoretically ugly, since you can not assign categories to the attributes. But more importantly, it’s extra, tangential, work on the user’s end. When creating a grouping, I don’t want to think of a classification for this grouping — I just want to create the group!
So why do we need attributes to have names? Why not just allow me to create the group “friends” the group “colleagues” and the group “football players”?
I would hate a database file system.
I can never remember what my files are called or what they were supposed to be. But I can always remember which directory I put them in.
AS/400 (alias iSeries, alias i5) has a built in database system. If you can live with 10 character table names and 10 character database names (we call them libraries) you get a fairly robust RDBMS. This system existed for years without a name, and is used by all of the other operating system components that need persistent data.
IBM got tired of Oracle kicking sand in their face (anyone remember Charles Atlas ads?) finally named it DB2/400. This database is actually built on top of a single level store, a style of virtual memory where requested 4K byte pages from the disk are brought into memory and can be shared by all authorized users, and updated pages get copied back to the disk. This provides many advantages of an in memory database, but IBM doesn’t push that in marketing.
The database is ANSI compliant and has most of the modern features – triggers, referential integrity constraints, stored procedures, BLOBS, CLOBS, SQL interface as well as VSAM like record at a time navigation for traditional RPG/COBOL pgms. Most shops do not have DBA staff – the system is dirt simple to administer.
In addition, the AS/400 has a large set of alternative file systems, called IFS (Integrated File System) which treats the RDBMS as another file system. The “root” system is UNIX/Windows like, ie, hierarchical, long names, and allows UNICODE encodings, whereas the database is classic mainframe EBCDIC. You can link a stream filename to a field in a database record, which locks it down. Useful for linking a resume to a personnel record, etc.
Rufus I sent you a gmail invite. Check out their filters, search, views, etc. Nothing really gets deleted in gmail unless you try very hard. Gmail is very similar to what you are proposing – the folder’s aren’t real they are simply views on the database.
Also check out Reiser4. http://www.namesys.com/
There are good white papers there. Hans is building a system almost exactly like you describe. One company has implemented a complete XML database on Reiser using plugins.
The last line of the article reads:
“Real soon now, Microsoft will unleash WinFS onto the world and make all other database filesystems obselete. Though details are still a little vague.”
What is the author considering to be “real soon?” Longhorn? Ha!
Umm…hate to break it to you, but Longhorn/Longwait will NOT have WinFs.
Here is just one article among many that talk about how WinFS has been cut from Longhorn.
http://www.microsoft-watch.com/article2/0,1995,1640454,00.asp
Even Microsoft has admitted having difficulties with WinFS and has no idea when it will be ready.
http://news.com.com/New+file+system+has+long+road+to+Windows/2100-1…
The link above points out, and includes comments from a Microsoft exec, how WinFS may not be ready until Blackcomb, which won’t be for another decade, even though WinFS has been in the works for a decade already. The earliest WinFS will be available will be as a test version in late 2006. I’d hardly call that “real soon.”
Hi all,
OS/2 had Extended Attributes in its HPFS file system long before BeOS, and still has.
OS/2 implements EAs natively on HPFS and JFS, while it supports them on FAT with a non-native approach.
What is nice in BeOS is the “query” concept that resembles DBs much more, even though it has always been possible to do EA-based searches in OS/2.
On the “performance” side, BeOS added indexes, which made queries perform a lot faster.
Bye
Cris
The author says he never tried BeOS… maybe it would be nice to do it now so next time he has some background
> “… BeOS was …”
BeOS *IS*: http://yellowtab.com – http://haiku-os.org
> WinFS
ugh ? AFAIK that has been withdrawn from Longhorn btw :p
> reiser4
The only problem is the upper layer.
The Linux VFS doesn’t have any call to use the extra features.
It barely has attribute read/write calls.
Yes, Linux now (finally !) supports POSIX eattrs… the only problem of those being they are *untyped*.
just a name-value pair.
How do you know it’s a string, an int or whatever ???
Of course the application that created it knows, but the other ones will stay ignorant.
In BeOS, attributes have a 32 bit type field associated, which tells if they are int, int64, string, mimetype, … or some app-specific stuff.
That allows Tracker (the file mgr) to display them (ok, the mime db describes those attributes), but also any app to at least read them someway. (and queries to search for them correctly)
Microsoft Outlook, part of their office suite, uses a Jet database to store its email messages. You can use it to search for attributes, and you can even store normal files in the folder hierarchy.
Their mailserver, Exchange, builds on the same database design. The whole WinFS idea is derived from the Exchange Storage.
Personally I prefer Maildir over the Exchange Storage because Maildir is less vulnerable to disk errors and is easier to rsync / backup, and Maildir stored messages are easier to manage / edit / troubleshoot with standard (filemanager) tools.
But the DB capabilities of the Exchange Server are often useful, that’s for sure.
> Mac OS X Tiger is BeFS-plus. It supports everything BeOS did back then, plus the ability to search *inside* 20 file formats.
Only at first glance, not if you look more closely:
BFS keeps all the indices in the filesystem itself, and the implementation is part of the file system. On Tiger, it’s just an additional process that’s running. In effect, your indices on the drive are not necessarily in sync with the actual data, as you may have changed data with no add-on running (e.g. when having the same drive mounted in 10.3). There are a few more differences, but the ADC NDA tells me that I have to keep them for myself. Furthermore, all attributes are read-only from the Finder, you can’t simply change an mp3’s rating in the file manager like you can in BeOS.
Oracle’s Internet File System has been around for 5 years or so now (under a number of names). Although designed more for use on file servers that individual PC’s.
http://www.oracle.com/technology/documentation/ifs_arch.html
As it uses an Oracle RDBMS, you can use much of Oracle’s technology; it’s part of Internet Application Server these days.
A .pst file is really a Jet database?
Does that mean it’s really an .mdb, under the hood?
This bears experimentation, because no glance at the COM interface, or experience with the software Winnebago that is Outleak ever hinted such.
I’d just put my files on a webhost, and let Google index them for me, then I’d just use googlefs (no it’s *not* gmailfs)
http://clapcrest.free.fr/revol/beos/shot_googlefs_006.png
>HFS+
So it seems BFS is still better :p
> Oracle’s Internet File System
I’ve read a bit about AFS and Coda, but never saw that thing; will have a look.
You have email attributes in a filesystem.
Nice. You can store From, To, Subject, multipart messages, attachments… the whole shebang.
Now, I write an app that marks messages as spam, or does something that you didn’t think about when you designed the attributes. No matter how intuitive the design is, how nicely it treats extra data, or how well thought out the fall-back compatibility is… I will not read much of the docs and will save it into the subject field, breaking all other clients.
That didn’t happen to BeFS with its email attributes because there weren’t many clients.
Also, it’s the same thing that happened with HTML and is happening with XML.
If we had consistent data, Apple’s naive solution of simply reading inside the common formats would suffice for that purpose.
read subject.
Any chance I could see some screenshots?
exchange uses a jet db for sure
i’m less sure about outlook,
http://groups.google.nl/groups?q=outlook+pst+file+jet+database&hl=n…
states it, but
From: Sue Mosher [MVP] ([email protected])
Subject: Re: PST file format
Newsgroups: microsoft.public.outlook.program vba
Date: 2002-04-15 10:18:30 PST
It’s undocumented, and it’s not JET. Since only Outlook users can open
PST files and use the data, why not use Outlook to make your
modifications?
--
Sue Mosher, Outlook MVP
Outlook and Exchange Solutions
at http://www.slipstick.com
anyway, .mdb != .pst
So it seems BFS is still better :p
It’s different. Much better attributes (since OS X is not doing it on file-system level), but no full-text search.
Indeed. Not only does it return better results than any other e-mail client, it’s done faster than other clients (e.g., Outlook) return their first result. And while it took me some time to break from the folder method, labels work really well. It’s a pity that Opera (as of 8.0b) still restricts the maximum number of labels to such a small value.
The idea of a DB filesystem is fairly old. Until recently, it was largely impractical because of the resource requirements (CPU and disk). There are various experimental and even production-level implementations of such things for virtually every platform, some a dedicated filesystem, others simply glue that maps a virtual filesystem to an object store / RDBMS. Microsoft’s original intent was to remove the whole hierarchical filesystem concept and have all data stored in an object store instead.
The only problem these things have is the explosion of attributes and indices. There are lots of ways to implement the filesystem database concept. They all have the problem that if the metadata storage permits arbitrary (application-defined) attributes, you soon have so many that some attributes become redundant (CorelDraw uses ‘CreationDate’ and Adobe uses ‘DateCreated’, for example) or ambiguous. Indexing the various attributes, maintaining the indices and syncing metadata with object data becomes prohibitively expensive and difficult to schedule properly (particularly in a multiprocessing environment).
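One way a metadata layer might cope with redundant names is an alias table that folds application-specific names into a canonical one before indexing. This is only a sketch: the `ALIASES` table, the canonical name `created` and the `normalize` function are hypothetical, and someone still has to maintain the table.

```python
# Hypothetical alias table: maps app-specific names onto one canonical name.
ALIASES = {
    "CreationDate": "created",
    "DateCreated": "created",
}

def normalize(attrs):
    """Return a copy of attrs with known aliases folded into canonical names."""
    result = {}
    for name, value in attrs.items():
        canonical = ALIASES.get(name, name)
        if canonical in result and result[canonical] != value:
            # Two aliases disagree: this is the ambiguity described above.
            raise ValueError(f"conflicting values for {canonical!r}")
        result[canonical] = value
    return result
```

Only one index on `created` is then needed, but every new application-defined name is another entry to add, which is the maintenance cost in question.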
More to the point, once there’s sufficient structure and normalization of the metadata, why maintain the notion of a file at all? Simply move all data into the attributes (perhaps have a special attribute that’s a graph of relations between other attributes). That eliminates the concept of the file and completely changes the way everything is stored. It’s conceptually so abstract that Joe Average will never understand it, it’s fragile (disk errors, etc.), and it would open up entirely new horizons for potential abuse by the implementor as far as platform-data lock-in and hidden proprietary features go. What could be better from a commercial OS provider’s point of view?
RE: Francois
Long term, file systems may be replaced by semantic nets that are stored in databases. For now, relational databases don’t manage large objects (BLOBs and CLOBs) as well as file systems do. A transitional period is therefore required in which metadata and other attributes are stored in databases while file content is stored in the file system. In fact, it is conceivable that databases will eventually import some file system characteristics/mechanisms for managing large data items.
Where Do DB Filesystems Fit?
If messages were stored in plain files (either ASCII or some other document format: DOC, PDF, …) while the attributes (from, to, …) were stored with the files in the DB FS, developers could create many different tools for e-mail. We would no longer need a single massive monolithic app (Outlook) that has to include every command anyone could possibly use.
Instead, we could have a simple reader/writer app, another app to filter spam, another app to search, etc. All of them would use the standard FS e-mail attribute names, and hence would work together without any special effort. That way we would no longer be locked into a specific client, but could easily change it, customize it by adding/removing other apps, etc.
The actual management of attributes (from, to, …) should be left to the DB FS because it is common to all apps. E-mail apps use from, to, subject, …; doc apps use author, title, …; and so on. Today each app has to create its own format, search functions, etc. Instead of replicating this effort and reinventing the wheel, the common commands (read, write, search) ought to be available to all apps from the DB FS.
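A toy sketch of that shared read/write/search layer is below. The `AttributeStore` class is an in-memory stand-in for a DB FS, not any real API, and the attribute is called `sender` only because `from` is a reserved word in Python.

```python
class AttributeStore:
    """Toy stand-in for a DB filesystem: file path -> attribute dict."""

    def __init__(self):
        self._attrs = {}

    def write(self, path, **attrs):
        """Set or update attributes on a path, leaving others untouched."""
        self._attrs.setdefault(path, {}).update(attrs)

    def read(self, path, name):
        """Return one attribute of a path, or None if it is not set."""
        return self._attrs[path].get(name)

    def search(self, **wanted):
        """Return sorted paths whose attributes match every name=value pair."""
        return sorted(
            path for path, attrs in self._attrs.items()
            if all(attrs.get(k) == v for k, v in wanted.items())
        )

store = AttributeStore()
# A mail reader files two messages...
store.write("/mail/1", sender="bob@example.com", subject="lunch")
store.write("/mail/2", sender="eve@example.com", subject="cheap pills")
# ...a separate spam filter tags one of them...
store.write("/mail/2", spam=True)
# ...and a third tool searches without knowing about either app.
hits = store.search(sender="bob@example.com")
```

None of the three "apps" needs its own storage format or search code; they only agree on attribute names.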
See Dekk at http://www.dekksoft.com/index.html
What about:
http://www.sqldesktop.com/ ?
I am afraid a database system is what it is all about. Keeping the “file” abstraction in a database-based OS is somewhat weird.
Yeah, I just confirmed that there was no easy way into the .pst.
Sure is an easy way out, though: Gnus.
Sorry, but you lost me. I filter my email and Usenet using BeOS’s queries all the time. First, the way I avoided your problem is that everything that matches the filter just goes to the trash and is deleted. True, I have to use loose filtering so as not to lose anything I want, but that first pass deletes 90% of the junk, and I don’t even need to run a program; the query window is always open on screen 2.
Second, what problem with marking spam? Just create an additional SPAM flag attribute for email. Programs that don’t know about it will not touch it, so it is set only if you want it to be. You are not stuck with only the attributes that come with the system; you can create and add as many new ones as you want with no problems or conflicts. BeOS is that great!
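The "programs that don't know about it will not touch it" point can be sketched with a plain dictionary standing in for a file's attributes. The attribute names and the function `old_client_mark_read` are hypothetical, and this is not BeOS code.

```python
def old_client_mark_read(attrs):
    """An older client: changes only the one attribute it knows about."""
    updated = dict(attrs)          # everything else is carried over untouched
    updated["status"] = "read"
    return updated

message = {"subject": "hello", "status": "new"}
message["spam"] = False            # added later by a separate spam filter
message = old_client_mark_read(message)
```

The spam flag survives the older client's update because attributes are addressed by name rather than packed into a field the client owns.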
http://www.bebits.com/app/454 has always worked for me.
http://www.bebits.com/app/3637 looks interesting, but not finished.
http://www.bebits.com/app/3782 I have never used, but Linux people probably would like it better.
Don’t forget that even the classic MacOS HFS filesystem had files which each had a resource fork and a data fork, allowing the storage of all kinds of interesting metadata (including icons, creating application tag, and other info) along with each file. I don’t know if the initial incarnations of the MacOS had it, but I know it certainly predated OS/2’s usage of EAs.
Earl Colby Pottinger
BeIndexed is going in that direction, but the others are all slow brute-force searches.
Rich Steiner
It’s a common misconception that type/creator info is stored in HFS’ resource fork – it isn’t. It’s stored in a regular file attribute, just like a file name or date. The resource fork was used for all the things that OS X wants you to put in separate files now, application resources like images, strings or sounds.
Lotus Notes has been doing *all* this … and more …. for years.
While we are all waiting for DB FS, here is the
DB mail application the author is yearning for.
http://www.dbmail.org/
I liked the style of the article. As far as details go there are others more qualified than I, but some observations I have made are as follows:
1) Any file system can arguably be called a database, depending on how strictly you define the term “database”.
2) Entropy rules.
3) One of the most popular computer applications is e-mail.
4) E-mail has become the de-facto file system for many clueless end-users, despite our pain.
5) Sym links are fun for a while.
6) Context people, context.
7) Task oriented file systems, smoke ’em if you got ’em.
8) 0100010001101111011101110110111000100000011001000110010101100101011100 0000101100
0010000001110100011010000110100101110011001000000110100101110011001000 0001110111
0110100001100001011101000010000001110100011001010111100001110100001000 0001110010
0110010101100001011011000110110001111001001000000110110001101111011011 1101101011
01110011001000000110110001101001011010110110010100101110
9) What did he just say?
10) “There’s a message for you.”
I remembered seeing icons, various strings, and other things living in the resource fork when viewing them via ResEdit on a classic Mac, but I was admittedly guessing on the type/creator string. I sit corrected!
VMS has had it for years. Do we need another wheel? Come on.