TIP #17 Version 1.1: Redo Tcl's filesystem

This is not necessarily the current version of this TIP.


TIP: 17
Title: Redo Tcl's filesystem
Version: $Revision: 1.1 $
Author: Vince Darley <vince at santafe dot edu>
State: Draft
Type: Project
Tcl-Version: 8.4.0
Vote: Pending
Created: Friday, 17 November 2000

Abstract

Many of the most exciting recent developments in Tcl have involved putting virtual file systems in a file (e.g. Prowrap, Freewrap, Wrap, TclKit), but these have been largely ad hoc hacks of various internal APIs. This TIP seeks to replace these hacks with a common underlying API that will, in addition, make porting Tcl to new platforms a simpler task.

Overview

There are two current drawbacks to Tcl's filesystem implementation:

Prowrap (http://sourceforge.net/projects/tclpro), Freewrap (http://home.nycap.rr.com/dlabelle/freewrap/freewrap.html), Wrap (http://members1.chello.nl/~j.nijtmans/wrap.html), TclKit (http://www.equi4.com/jcw/wiki.cgi/19.html), ... are all attempts to provide an ability to place Tcl scripts and other data inside a single file (or just a small number of files). The best and simplest way to achieve that task (and many other useful tasks) is to let Tcl handle the contents of a single 'wrapped document' as if it were a filesystem: the contents may be opened, sourced, stat'd, copied, globbed, etc.

This TIP suggests that Tcl's core be modified to allow non-native filesystems to be plugged in to the core, and hence allow perfect virtual filesystems to exist. The implementations provided by all of the above tools are very far from perfect. The most obvious types of virtual filesystem which should be supported are:

but the main point is that all filesystem access should occur through a hookable interface, so that Tcl neither knows nor cares what type of filesystem it is dealing with.

Furthermore this hookable interface should be Tcl_Obj based, providing a new 'Path' object type, which should be designed with two goals in mind:

If all of these goals are achieved, Tcl will have a new filesystem which is both more efficient and more powerful than the existing implementation.

Technical discussion

1. Virtual filesystems

An examination of the core shows that very limited support was added to tclIOUtil.c in June 1998 (presumably by Scriptics to support Prowrap) so that the TclStat, TclAccess and Tcl_OpenFileChannel functions could be intercepted. (See http://cvs.sourceforge.net/cgi-bin/cvsweb.cgi/tcl/generic/tclIOUtil.c?rev=1.2&content-type=text/x-cvsweb-markup&cvsroot=tcl)

This TIP seeks to provide a complete implementation of virtual file system support, rather than these piecemeal functions.

Fortunately, since Tcl is already abstracted across three different filesystem types (through the Tclp... functions), it is not that big a task to abstract away to any generic filesystem.

One goal of this TIP is to allow an extension to be written so that one can implement a virtual filesystem entirely in Tcl: i.e. to provide sufficient hooks into Tcl's core so that an extension can capture all filesystem requests and divert them if desired. The goal is not to provide Tcl-level hooks in Tcl's core. Such hooks will only be at the C level, and an extension would be required to expose them to the Tcl level.

2. Objectified filesystem interface.

A filesystem access in Tcl's core usually involves several calls to 'access', 'stat', etc.

For example, 'file atime $path' requires two calls to 'stat' and one call to 'utime', all with the same $path argument. Each of these requires a conversion from the same UTF path to the same native string representation. No caching is performed, so each of these goes through Tcl_UtfToExternal. Often Tcl code will use the same $path objects for an entire sequence of Tcl 'file' operations. Clearly a representation which cached the native path would speed up all of these operations (except the first).
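The saving can be illustrated with a minimal sketch. It assumes a hypothetical PathObj structure and a stand-in ConvertToNative function in place of Tcl_UtfToExternal; neither is part of Tcl's actual API, and the real Path object type proposed here may look quite different.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical path object (NOT Tcl's real Tcl_Obj layout): it holds the
 * UTF string and a lazily computed native form. */
typedef struct PathObj {
    const char *utf;    /* string representation of the path */
    char *native;       /* cached native form, or NULL if not yet computed */
} PathObj;

/* Counts conversions so the effect of the cache can be observed. */
static int conversionCount = 0;

/* Stand-in for Tcl_UtfToExternal: this sketch simply copies the string. */
static char *ConvertToNative(const char *utf) {
    size_t len = strlen(utf) + 1;
    char *copy = malloc(len);
    conversionCount++;
    if (copy != NULL) {
        memcpy(copy, utf, len);
    }
    return copy;
}

/* Return the native form, converting only if nothing is cached yet. */
const char *PathGetNative(PathObj *p) {
    if (p->native == NULL) {
        p->native = ConvertToNative(p->utf);
    }
    return p->native;
}

/* Discard the cached native form. */
void PathFree(PathObj *p) {
    free(p->native);
    p->native = NULL;
}

/* Mimics 'file atime $path': two 'stat' calls and one 'utime' call, each
 * needing the native path.  Returns how many conversions were performed. */
int SimulateFileAtime(PathObj *p) {
    int before = conversionCount;
    PathGetNative(p);   /* first stat */
    PathGetNative(p);   /* second stat */
    PathGetNative(p);   /* utime */
    return conversionCount - before;
}
```

With the cache in place, the first simulated 'file atime' on a path performs one conversion instead of three, and later operations on the same object perform none.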

The second reason why objectification is desirable is that in a pluggable-fs environment we must determine, for each file operation, which filesystem to use (whether native, a mounted .zip file, a remote ftp site, etc.). If this information can be cached for a particular path, again we will not need to recalculate it at every step. A technique similar to that used by Tcl's bytecode compilation will be used: each cached object will have a 'filesystemEpoch' counter, so that we can tell with each access whether the filesystem has been modified (in which case we must discard the cached information). Mounting/unmounting filesystems will obviously modify the filesystemEpoch.
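The epoch check might look roughly like the following sketch. Only the name filesystemEpoch comes from the text above; the PathCache structure, the single zip mount point, and the prefix-matching rule are illustrative assumptions, not the proposed implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal filesystem descriptor for this sketch only. */
typedef struct Filesystem {
    const char *typeName;
} Filesystem;

static Filesystem nativeFs = { "native" };
static Filesystem zipFs = { "zip" };

static int filesystemEpoch = 0;        /* bumped on every mount/unmount */
static int lookupCount = 0;            /* counts full lookups, for illustration */
static const char *zipMount = NULL;    /* the sketch supports one mount point */

/* Hypothetical per-path cache of the owning filesystem. */
typedef struct PathCache {
    const char *path;
    const Filesystem *fs;   /* cached owner, or NULL if never computed */
    int epoch;              /* value of filesystemEpoch when fs was computed */
} PathCache;

void MountZip(const char *prefix) {
    zipMount = prefix;
    filesystemEpoch++;      /* invalidates every cached lookup */
}

void UnmountZip(void) {
    zipMount = NULL;
    filesystemEpoch++;
}

/* The expensive step: decide which filesystem owns the path.  A naive
 * prefix comparison is used here purely as a placeholder. */
static const Filesystem *PickFilesystem(const char *path) {
    lookupCount++;
    if (zipMount != NULL && strncmp(path, zipMount, strlen(zipMount)) == 0) {
        return &zipFs;
    }
    return &nativeFs;
}

/* Return the cached filesystem unless the epoch shows it may be stale. */
const Filesystem *PathGetFilesystem(PathCache *pc) {
    if (pc->fs == NULL || pc->epoch != filesystemEpoch) {
        pc->fs = PickFilesystem(pc->path);
        pc->epoch = filesystemEpoch;
    }
    return pc->fs;
}

int GetLookupCount(void) {
    return lookupCount;
}
```

Repeated accesses to the same path cost a single integer comparison until a mount or unmount changes the epoch, at which point the next access recomputes the owner.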

A partial implementation of this TIP and a sample 'vfs' extension now exist, and have been tested through TclKit. On the 'virtual' half of this TIP, the implementation is known to be reasonably stable and complete: TclKit can operate through this new vfs implementation without the need to override a single Tcl core command at the script level. Commands which operate on files (image, source, etc.) and extensions like Image, Winico can be made to work in a TclKit automatically! The part of this TIP which has not yet been implemented is the objectification of the filesystem. This will bring additional efficiency gains for vfs's implemented at the script level, since the same Path objects can be passed through the entire process without an intermediate conversion (and the string duplication which is currently required). The combination of caching and objectification will change the existing sequence of steps from:

Tcl_Obj -> string -> filesystem -> convert-to-native -> native-call

or (with vfs hooked in):

Tcl_Obj -> string -> vfilesystem -> pick-filesystem -> convert-to-native -> native-call

and

Tcl_Obj -> string -> vfilesystem -> pick-filesystem -> Tcl_NewStringObj -> Tcl-vfs-call

to:

Tcl_Obj -> vfilesystem -> native-call

and

Tcl_Obj -> vfilesystem -> Tcl-vfs-call

A final side-benefit of this proposal would be that it further modularises the core of Tcl, so that one could, in principle:

However these final two points are explicitly not the goal of this TIP! I simply want to improve Tcl to add vfs support, and the best way to do that seems (to me) to be along the lines of this TIP.

Proposal

The changes to Tcl's core for virtual filesystem support are actually very minor (these have been implemented). Every occurrence of a Tclp-filesystem call must be replaced by a call to a hookable procedure. The current hookable procedure list is as follows:

/*
 * struct Tcl_Filesystem:
 *
 * One such structure exists for each type (kind) of filesystem.
 * It collects together in one place all the functions that are
 * part of the specific filesystem.  Tcl always accesses the
 * filesystem through one of these structures.
 * 
 * Not all entries need be non-NULL; any which are NULL are simply
 * ignored.  However, a complete filesystem should provide all of
 * these functions.
 */

typedef struct Tcl_Filesystem {
    CONST char *typeName;   /* The name of the filesystem. */
    int structureLength;    /* Length of this structure, so future
                             * binary compatibility can be assured */
    Tcl_FilesystemVersion version;  
                            /* Version of the filesystem type. */
    TclStatProc_ *statProc; /* Function to process a 'Tcl_Stat()' call */
    TclAccessProc_ *accessProc;            
                            /* Function to process a 'Tcl_Access()' call */
    TclOpenFileChannelProc_ *openFileChannelProc; 
                            /* Function to process a 'Tcl_OpenFileChannel()' call */
    TclMatchFilesTypesProc_ *matchFilesTypesProc;  
                            /* Function to process a 'Tcl_MatchFilesTypes()' */
    TclGetCwdProc_ *getCwdProc;     
                            /* Function to process a 'Tcl_GetCwd()' call */
    TclChdirProc_ *chdirProc;            
                            /* Function to process a 'Tcl_Chdir()' call */
    TclLstatProc_ *lstatProc;            
                            /* Function to process a 'Tcl_Lstat()' call */
    TclCopyFileProc_ *copyFileProc; 
                            /* Function to process a 'Tcl_CopyFile()' call */
    TclDeleteFileProc_ *deleteFileProc;            
                            /* Function to process a 'Tcl_DeleteFile()' call */
    TclRenameFileProc_ *renameFileProc;            
                            /* Function to process a 'Tcl_RenameFile()' call */
    TclCreateDirectoryProc_ *createDirectoryProc;            
                            /* Function to process a 'Tcl_CreateDirectory()' call */
    TclCopyDirectoryProc_ *copyDirectoryProc;            
                            /* Function to process a 'Tcl_CopyDirectory()' call */
    TclRemoveDirectoryProc_ *removeDirectoryProc;            
                            /* Function to process a 'Tcl_RemoveDirectory()' call */
    TclLoadFileProc_ *loadFileProc; 
                            /* Function to process a 'Tcl_LoadFile()' call */
    TclUnloadFileProc_ *unloadFileProc;            
                            /* Function to unload a previously successfully
                             * loaded file */
    TclReadlinkProc_ *readlinkProc; 
                            /* Function to process a 'Tcl_Readlink()' call */
    TclListVolumesProc_ *listVolumesProc;            
                            /* Function to list any filesystem volumes added
                             * by this filesystem */
    TclFileAttrStringsProc_ *fileAttrStringsProc;
                            /* Function to list all attributes strings which
                             * are valid for this filesystem */
    TclFileAttrsGetProc_ *fileAttrsGetProc;
                            /* Function to process a 'Tcl_FileAttrsGet()' call */
    TclFileAttrsSetProc_ *fileAttrsSetProc;
                            /* Function to process a 'Tcl_FileAttrsSet()' call */
    TclUtimeProc_ *utimeProc;       
                            /* Function to process a 'Tcl_Utime()' call */
    TclNormalizePathProc_ *normalizePathProc;       
                            /* Function to normalize a path */
} Tcl_Filesystem;

Once that is done, almost no more changes need be made to Tcl's core. We must simply add code to tclIOUtil.c (and declarations to tclInt.h) to implement the hookable functions and to provide a simple API by which extensions can hook into the new filesystem support.

The last remaining issue we are currently aware of is that cross-filesystem copy and rename operations would otherwise fail. A patch was added so that Tcl can fall back on 'open r/open w/fcopy/file mtime' as a copying method.
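As a rough illustration of how such hookable functions could dispatch, the sketch below walks a list of registered filesystem tables, skips NULL entries, and keeps the first registered (native) filesystem at the tail. All names here (SketchFilesystem, SketchRegister, SketchAccess, SKETCH_PASS) are hypothetical, the table is cut down to a single entry, and the "pass" return convention is an assumption made for the example rather than the behaviour of the actual patch.

```c
#include <stddef.h>
#include <string.h>

/* Assumed convention for this sketch: a proc returns SKETCH_PASS when the
 * path does not belong to its filesystem, so the next one is tried. */
#define SKETCH_PASS   (-2)
#define SKETCH_MAX_FS 8

typedef int (SketchAccessProc)(const char *path, int mode);

/* Cut-down, hypothetical analogue of the Tcl_Filesystem table above. */
typedef struct SketchFilesystem {
    const char *typeName;
    SketchAccessProc *accessProc;   /* may be NULL, in which case it is ignored */
} SketchFilesystem;

typedef struct FsRecord {
    const SketchFilesystem *fs;
    struct FsRecord *next;
} FsRecord;

static FsRecord records[SKETCH_MAX_FS];
static int recordCount = 0;
static FsRecord *fsList = NULL;

/* Register a filesystem at the head of the list, so the one registered
 * first (the native filesystem) stays at the tail. */
int SketchRegister(const SketchFilesystem *fs) {
    if (recordCount >= SKETCH_MAX_FS) {
        return -1;
    }
    FsRecord *r = &records[recordCount++];
    r->fs = fs;
    r->next = fsList;
    fsList = r;
    return 0;
}

/* Hookable 'access': the first filesystem that does not pass decides. */
int SketchAccess(const char *path, int mode) {
    for (FsRecord *r = fsList; r != NULL; r = r->next) {
        if (r->fs->accessProc == NULL) {
            continue;               /* NULL entries are simply ignored */
        }
        int result = r->fs->accessProc(path, mode);
        if (result != SKETCH_PASS) {
            return result;
        }
    }
    return -1;                      /* nobody claimed the path */
}

/* Stand-in native filesystem: pretends that only /etc/hosts exists. */
static int NativeAccess(const char *path, int mode) {
    (void) mode;
    return strcmp(path, "/etc/hosts") == 0 ? 0 : -1;
}

/* Stand-in virtual filesystem: claims everything under /vfs/. */
static int VirtualAccess(const char *path, int mode) {
    (void) mode;
    if (strncmp(path, "/vfs/", 5) != 0) {
        return SKETCH_PASS;
    }
    return strcmp(path, "/vfs/hello.tcl") == 0 ? 0 : -1;
}

const SketchFilesystem sketchNativeFs  = { "native",  NativeAccess };
const SketchFilesystem sketchVirtualFs = { "virtual", VirtualAccess };
const SketchFilesystem sketchEmptyFs   = { "empty",   NULL };
```

Registering sketchNativeFs first and the others afterwards means paths under /vfs/ are answered by the virtual filesystem, while everything else falls through to the native stand-in at the tail.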
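The copying fallback is described above in terms of Tcl commands; a loose C analogue of the same sequence (open for reading, open for writing, copy the bytes, carry over the modification time) is sketched below. It assumes a POSIX system for stat and utime, the function name SketchCrossFsCopy is hypothetical, and it is not the code of the actual patch.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <utime.h>

/* Copy src to dst by reading and writing bytes, then give dst the same
 * access and modification times as src.  Returns 0 on success, -1 on error. */
int SketchCrossFsCopy(const char *src, const char *dst) {
    FILE *in = fopen(src, "rb");            /* 'open r' */
    if (in == NULL) {
        return -1;
    }
    FILE *out = fopen(dst, "wb");           /* 'open w' */
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buf[4096];
    size_t n;
    int ok = 1;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {   /* 'fcopy' */
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;
            break;
        }
    }
    if (ferror(in)) {
        ok = 0;
    }
    fclose(in);
    if (fclose(out) != 0) {
        ok = 0;
    }
    if (!ok) {
        return -1;
    }

    struct stat st;                          /* 'file mtime' */
    if (stat(src, &st) != 0) {
        return -1;
    }
    struct utimbuf times;
    times.actime = st.st_atime;
    times.modtime = st.st_mtime;
    return utime(dst, &times) == 0 ? 0 : -1;
}
```

In a real cross-filesystem copy the reads and writes would go through each filesystem's own channel implementation rather than stdio; the sketch only shows the order of the steps.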

Finally to support Extension/Tcl-scripted vfs's in a more robust, clean fashion, we have added four other small changes to this TIP:

Note that no existing code in the win/mac/unix directories was changed at all.

As mentioned above an implementation of all of this now exists. The modified Tcl core passes all Tcl tests, and works with TclKit.

The second half of the proposal is to objectify all of Tcl's filesystem API, to provide caching of both native file representation and filesystem to use. This does not yet exist, although some efforts have been made in this direction.

Philosophy

This TIP is influenced by the thoughts behind the TkGS project (http://sourceforge.net/projects/tkgs/). Whereas TkGS provides a general and efficient graphics system, the aim of this TIP is to provide a similarly general and efficient filesystem.

Alternatives

1. Alternatives to adding vfs support

TclKit does a pretty good job of vfs support, but it is limited by the inadequacy of overriding at the Tcl level. Prowrap is limited by the inability to glob, load, cd, pwd, etc.

There are currently no better alternatives: if Tcl's C core calls C functions directly (as it does), or if extensions call C functions directly (as they do), then complete vfs support requires a patch like this to Tcl's core.

2. Alternatives to objectification

The existing patch adds VFS support to Tcl's core, and requires very few core changes at all. It could be adopted instead of an objectified filesystem. This would make Tcl's filesystem more complete, but would not make it any more efficient.

Objections

Won't all these hooks slow down Tcl's core a lot?

There are actually remarkably few changes required, so the only slowdown would occur if additional filesystems are hooked into the core. This is similar to the impact of the 'stacked channels' implementation. Once the filesystem is objectified, this will actually speed up Tcl's core.

Won't this break backwards compatibility ("The Tcl question")?

Not at all. With the current 'string-based' vfs patch, the entire test suite passes as before, even with an extra 'reporting' filesystem activated.

Won't this make Tcl's core more complex?

Adding a Tcl_Obj interface is definitely a bit more complex than the existing string-based system. However one result will be that Tcl's filesystem is properly abstracted away, which conceptually simplifies the core (there will be 10-15 functions which are called for all filesystem access, whether it is native or virtual).

Future thoughts

This section contains items which are outside the scope of this TIP, but which it was thought useful to raise and document for the record.

This patch still places the native filesystem in a preferential position, and it is hard-coded as the tail of the fs-lookup list. There are two changes which could be made in the future:

Also,

Copyright

This document has been placed in the public domain.


