Filesystem Library Design

Introduction
Requirements
Realities
Rationale
Abandoned Designs
References

Introduction

The primary motivation for beginning work on the Filesystem Library was frustration with Boost administrative tools.  Scripts were written in Python, Perl, Bash, and Windows command languages.  There was no single scripting language familiar and acceptable to all Boost administrators. Yet they were all skilled C++ programmers - why couldn't C++ be used as the scripting language?

The key feature C++ lacked for script-like applications was the ability to perform portable filesystem operations on directories and their contents. The Filesystem Library was developed to fill that void.

The intent is not to compete with traditional scripting languages, but to provide a solution for situations where C++ is already the language of choice.

Requirements

Realities

Rationale

The Requirements and Realities above drove much of the C++ interface design.  In particular, the desire to make script-like code straightforward caused a great deal of effort to go into ensuring that apparently simple expressions like exists( "foo" ) work as expected.
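As a minimal sketch of the script-like style described above, the fragment below classifies a path with a few one-line queries. It assumes the C++17 std::filesystem library as a stand-in, so that it builds without a Boost installation; the helper name describe is hypothetical and is not part of either library.

```cpp
#include <filesystem>
#include <string>

// Assumption: the C++17 standard library is used in place of Boost here.
namespace fs = std::filesystem;

// Hypothetical helper: report what kind of entry, if any, a path names.
std::string describe(const fs::path& p)
{
    if (!fs::exists(p))         return "missing";    // nothing at this path
    if (fs::is_directory(p))    return "directory";
    if (fs::is_regular_file(p)) return "file";
    return "other";                                  // socket, device, and so on
}
```

A string literal converts implicitly to a path, so a call such as describe( "foo" ) reads much like the exists( "foo" ) expression mentioned above.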

See the FAQ for the rationale behind many detailed design decisions.

Several key insights went into the path class design:

Error checking was a particularly difficult area. One key insight was that with file and directory names, portability isn't a universal truth.  Rather, the programmer must think through the question "What operating systems do I want this path to be portable to?"  By providing support for several answers to that question, the Filesystem Library alerts programmers to the need to ask it in the first place.

Abandoned Designs

operations.hpp

Dietmar Kühl's original dir_it design and implementation supported wide-character file and directory names. It was abandoned after extensive discussions among Library Working Group members failed to identify portable semantics for wide-character names on systems not providing native support. See FAQ.

Previous iterations of the interface design used explicitly named functions providing a large number of convenience operations, with no compile-time or run-time options. There were so many function names that they were very confusing to use, and the interface was much larger. Any benefits seemed theoretical rather than real.

Designs based on compile-time (rather than run-time) flag and option selection (via policy, enum, or int template parameters) became so complicated that they were abandoned, often after investing quite a bit of time and effort. The need to qualify attribute or option names with namespaces, or even namespace aliases, made use in template parameters ugly; that wasn't fully appreciated until actually writing real code.

Yet another set of convenience functions (for example, remove with permissive, prune, recurse, and other options, plus predicate and possibly other filtering features) was abandoned because the details became both complex and contentious.

What is left is a toolkit of low-level operations from which the user can create more complex convenience operations, plus a very small number of convenience functions which were found to be useful enough to justify inclusion.

path.hpp

There were so many abandoned path designs, I've lost track. Policy-based class templates in several flavors, constructor-supplied runtime policies, and operation-specific runtime policies were all considered, often implemented, and ultimately abandoned as far too complicated for any small benefits observed.

Additional design considerations apply to Internationalization.

error checking

A number of designs for the error checking machinery were abandoned, some after experiments with implementations. In particular, totally automatic error checking was attempted, but it tended to make the overall library design much more complicated.

Some designs associated error checking mechanisms with paths, others with operations functions.  A policy-based error checking template design was partially implemented, then abandoned as too complicated for everyday script-like programs.

The final design, which depends partially on explicit error checking function calls, is much simpler and more straightforward, although it does depend to some extent on programmer discipline.  But it should allow programmers who are concerned about portability to be reasonably sure that their programs will work correctly on their choice of target systems.
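The idea of an explicit check can be sketched as follows. This is not the library's own interface; the function name is_portable_posix_name is hypothetical, and the rule it applies is the POSIX portable filename character set (letters, digits, period, underscore, and hyphen, with the hyphen not allowed as the first character).

```cpp
#include <string>

// Hypothetical check, not the library's API: true if name uses only the
// POSIX portable filename character set and does not start with a hyphen.
bool is_portable_posix_name(const std::string& name)
{
    if (name.empty() || name.front() == '-')
        return false;

    for (char c : name) {
        const bool ok = (c >= 'A' && c <= 'Z') ||
                        (c >= 'a' && c <= 'z') ||
                        (c >= '0' && c <= '9') ||
                        c == '.' || c == '_' || c == '-';
        if (!ok)
            return false;   // space, colon, slash, non-ASCII, and so on
    }
    return true;
}
```

A program that cares about a particular set of target systems would call a check like this on each name it creates; a program that does not care simply omits the call, which is where the programmer discipline comes in.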

References

[IBM-01] IBM Corporation, z/OS V1R3.0 C/C++ Run-Time Library Reference, SA22-7821-02, 2001, www-1.ibm.com/servers/eserver/zseries/zos/bkserv/
[ISO-9660] International Organization for Standardization, 1988
[Kuhn] UTF-8 and Unicode FAQ for Unix/Linux, www.cl.cam.ac.uk/~mgk25/unicode.html
[MSDN] Microsoft Platform SDK for Windows, Storage Start Page, msdn.microsoft.com/library/en-us/fileio/base/storage_start_page.asp
[POSIX-01] IEEE Std 1003.1-2001, ISO/IEC 9945:2002, and The Open Group Base Specifications, Issue 6. Also known as The Single Unix® Specification, Version 3. Available from each of the organizations involved in its creation. For example, read online or download from www.unix.org/single_unix_specification/. The ISO JTC1/SC22/WG15 - POSIX homepage is www.open-std.org/jtc1/sc22/WG15/
[URI] RFC-2396, Uniform Resource Identifiers (URI): Generic Syntax, www.ietf.org/rfc/rfc2396.txt
[UTF-16] Wikipedia, UTF-16, en.wikipedia.org/wiki/UTF-16
[Wulf-Shaw-73] William Wulf, Mary Shaw, Global Variable Considered Harmful, ACM SIGPLAN Notices, 8, 2, 1973, pp. 23-34

Revised 18 February, 2010

© Copyright Beman Dawes, 2002

Use, modification, and distribution are subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at www.boost.org/LICENSE_1_0.txt)