Path Name Portability Guide

Introduction
name_check functions
File and directory name recommendations

Introduction

As with any other C++ program that performs I/O operations, there is no guarantee that a program using the Filesystem Library will be portable between operating systems. Critical aspects of I/O, such as how the operating system interprets paths, are unspecified by the C and C++ Standards.

It is not possible to know if a file or directory name will be valid (and thus portable) for an unknown operating system. There is always the possibility that an operating system could use names which are unusual (numbers less than 4096, for example) or very limited in size (maximum of six character names, for example). In other words, portability is never absolute; it is always relative to specific operating or file systems.

It is possible to know in advance if a directory or file name is likely to be valid for a particular operating system. It is also possible to construct names which are likely to be portable to a large number of modern and legacy operating systems.

Almost all modern operating systems support multiple file systems. At a minimum, they support a native file system plus a CD-ROM file system (generally ISO-9660, often with Joliet extensions).

Each file system may have its own naming rules. For example, modern versions of Windows support NTFS, FAT, FAT32, and ISO-9660 file systems, among others, and the naming rules for some of those file systems differ a great deal. Each file system may have differing rules for overall path validity, such as a maximum length or number of sub-directories.

As a result, the Boost Filesystem Library's name_check mechanism cannot guarantee directory and file name portability. Rather, it is intended to give the programmer a "fighting chance" to achieve portability by early detection of common naming problems.

name_check functions

A name_check function returns true if its argument is a valid name for a particular operating or file system. A number of these functions are supplied, and user-supplied name_check functions are also allowed.

The portable_name function is of particular interest because it is the initial default name_check function and is carefully chosen to provide wide portability yet without severe restrictions on expressiveness.

The native function is of particular interest because it is often used when the source of the path is operator input or other sources which are formatted according to operating system rules.
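
As an illustration of what a user-supplied name_check function might look like, here is a minimal sketch. It assumes such a function takes the name as a const std::string& and returns bool. The function name not_reserved_device_name is hypothetical and is not part of the library. It rejects the Windows reserved device names that, according to the note in the windows_name entry of the table below, the supplied functions do not detect. Matching the names without regard to case is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical user-supplied check (not part of the library).
// Returns false for CON, PRN, AUX, CLOCK$, NUL, COM1-COM9, LPT1-LPT9,
// and for any of those names followed by an extension.
bool not_reserved_device_name(const std::string& name)
{
    // Look only at the part before the first period; find() returns npos
    // when there is no period, in which case substr() keeps the whole name.
    std::string stem = name.substr(0, name.find('.'));

    // Assumption: device names are matched without regard to case.
    std::transform(stem.begin(), stem.end(), stem.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });

    if (stem == "CON" || stem == "PRN" || stem == "AUX" ||
        stem == "CLOCK$" || stem == "NUL")
        return false;

    // COM1-COM9 and LPT1-LPT9
    if (stem.size() == 4 &&
        (stem.compare(0, 3, "COM") == 0 || stem.compare(0, 3, "LPT") == 0) &&
        stem[3] >= '1' && stem[3] <= '9')
        return false;

    return true;
}
```

A check like this could be combined with one of the supplied functions when portability to Windows matters and names such as NUL.txt must be rejected early.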

Library Supplied name_check Functions
Name Description
portable_posix_name Returns true for names containing only the characters specified in the Portable Filename Character Set rules as defined by POSIX (www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap03.html).
The only characters allowed are 0-9, a-z, A-Z, '.', '_', and '-'.

Use: applications which must be portable to any POSIX system.

windows_name Returns true for names containing only the characters specified by the Windows platform SDK as valid regardless of the file system. Allows any character except 0x0-0x1F, '<', '>', ':', '"', '/', '\', and '|'. Furthermore, names must not end with a trailing space or period.

Use: applications which must be portable to Windows.

Note: Reserved device names are not valid as file names, but are not being detected because they are still valid as a path. Specifically, CON, PRN, AUX, CLOCK$, NUL, COM[1-9], LPT[1-9], and these names followed by an extension (for example, NUL.tx7).

portable_name windows_name(name) && portable_posix_name(name), and first character not period or hyphen.

Note: This is the initial default name_check.

Use: applications which must be portable to a wide variety of modern operating systems, large and small, and to some legacy O/S's.

portable_directory_name portable_name(name), and no periods.

Use: applications which must be portable to a wide variety of platforms, including OpenVMS.

portable_file_name portable_name(name), except that it allows at most one period, and only if the period is followed by one to three additional characters.

Use: applications which must be portable to a wide variety of platforms, including OpenVMS and other systems which have a concept of "file extension" but limit its length.

no_check Returns true.

Use: When the generic grammar is desired, but name checking is not desired. For example, a program which traffics in names created elsewhere may have no choice but to accept those names. Another example is an application which prefers to use the Filesystem Library and its generic grammar, but is uninterested in portability. An alternative to no_check might be native, but native has the side effect of altering the grammar accepted.

native Implementation defined name_check. Guaranteed to return true for all names considered valid by the operating system.

Side effect: Syntax for path constructor src string is implementation defined according to the path syntax rules for the operating system.

Use: In path constructors, when the source is operator input or other sources which are formatted according to operating system rules. Note that default_name_check( native ) causes all path src strings to be treated according to the path syntax rules for the operating system, which may or may not be desirable.

Note: May return true for some names not considered valid by the operating system under all conditions (particularly on operating systems which support multiple file systems).
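
The rules in the table above can be sketched in standard C++ as follows. This is an illustrative reimplementation written from the table's descriptions, not the library's source code. The namespace sketch is hypothetical and is used only to keep these functions distinct from the library's own. Treating an empty name as invalid is an assumption of this sketch, since the table does not say how empty names are handled.

```cpp
#include <string>

namespace sketch {  // hypothetical namespace, not part of the library

// Only 0-9, a-z, A-Z, '.', '_', and '-' are allowed.
bool portable_posix_name(const std::string& name)
{
    const std::string valid =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ._-";
    // Assumption: an empty name is not valid.
    return !name.empty() && name.find_first_not_of(valid) == std::string::npos;
}

// No characters 0x00-0x1F, none of < > : " / \ |, and no trailing
// space or period. Reserved device names are not detected.
bool windows_name(const std::string& name)
{
    if (name.empty()) return false;  // assumption, as above
    const std::string invalid = "<>:\"/\\|";
    for (unsigned char c : name) {
        if (c <= 0x1F || invalid.find(static_cast<char>(c)) != std::string::npos)
            return false;
    }
    return name.back() != ' ' && name.back() != '.';
}

// Valid for both Windows and POSIX, and not starting with '.' or '-'.
bool portable_name(const std::string& name)
{
    return windows_name(name) && portable_posix_name(name) &&
           name.front() != '.' && name.front() != '-';
}

// A portable name with no periods at all.
bool portable_directory_name(const std::string& name)
{
    return portable_name(name) && name.find('.') == std::string::npos;
}

// A portable name with at most one period, followed by one to three characters.
bool portable_file_name(const std::string& name)
{
    if (!portable_name(name)) return false;
    const std::string::size_type pos = name.find('.');
    if (pos == std::string::npos) return true;                       // no period
    if (name.find('.', pos + 1) != std::string::npos) return false;  // second period
    const std::string::size_type ext_length = name.size() - pos - 1;
    return ext_length >= 1 && ext_length <= 3;
}

}  // namespace sketch
```

Under these rules, "readme.txt" passes sketch::portable_file_name, while "index.html" (four characters after the period) and "archive.tar.gz" (two periods) do not.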

File and directory name recommendations

Recommendation Rationale
Limit file and directory names to the characters A-Z, a-z, 0-9, period, hyphen, and underscore.

Use any of the "portable_" name check functions to enforce this recommendation.

These are the characters specified by the POSIX standard for portable directory and file names, and are also valid for Windows, Mac, and many other modern filesystems.
Do not use a period or hyphen as the first character of a name. Do not use period as the last character of a name.

Use portable_name, portable_directory_name, or portable_file_name to enforce this recommendation.

Some operating systems, POSIX for example, have special rules for the first character of names. Windows does not permit a period as the last character.
Do not use periods in directory names.

Use portable_directory_name to enforce this recommendation.

Requirement for ISO-9660 without Joliet extensions, OpenVMS native filesystem, and other legacy systems.
Do not use more than one period in a file name, and limit the portion after the period to three characters.

Use portable_file_name to enforce this recommendation.

Requirement for ISO-9660 level 1, OpenVMS native filesystem, and other legacy systems.
Do not assume names are case sensitive. For example, do not expect a directory to be able to hold separate elements named "Foo" and "foo". Some filesystems are case insensitive. For example, Windows NTFS is case preserving in the way it stores names, but case insensitive in searching for names (unless running under the POSIX sub-system, in which case it does case sensitive searches).
Do not assume names are case insensitive. For example, do not expect a file created with the name of "Foo" to be opened successfully with the name of "foo". Some filesystems are case sensitive. For example, POSIX.
Don't use hyphens in names. ISO-9660 level 1, and possibly some legacy systems, do not permit hyphens.
Limit the length of the string returned by path::string() to 255 characters. Note that ISO 9660 has an explicit directory tree depth limit of 8, although this depth limit is removed by the Joliet extensions. Some operating systems place limits on the total path length. For example, Windows 2000 limits paths to 260 characters total length.
Limit the length of any one name in a path. Pick the specific limit according to the operating systems and/or file systems to which you want to be portable:
   Not a concern: POSIX, Windows, Mac OS X.
   31 characters: Classic Mac OS
   8 characters + period + 3 characters: ISO 9660 level 1
   32 characters: ISO 9660 level 2 and 3
   128 characters (64 if Unicode): ISO 9660 with Joliet extensions
Limiting name length can markedly reduce the expressiveness of file names, yet placing only very high limits on lengths inhibits widest portability.
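
The length and case recommendations above are not enforced by the "portable_" name check functions described in this guide, so an application may want its own checks. The following is a minimal sketch in standard C++. The helper names within_length_limits and has_case_collision are hypothetical, the path is assumed to be joined with a single '/' between names, and case folding is done for ASCII letters only.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: true if every name has at most max_name_length
// characters and the names joined by single separators have at most
// max_path_length characters in total.
bool within_length_limits(const std::vector<std::string>& names,
                          std::size_t max_name_length,
                          std::size_t max_path_length = 255)
{
    // One separator between each pair of adjacent names.
    std::size_t total = names.empty() ? 0 : names.size() - 1;
    for (const std::string& name : names) {
        if (name.size() > max_name_length) return false;
        total += name.size();
    }
    return total <= max_path_length;
}

// Hypothetical helper: true if any two names intended for the same
// directory compare equal once ASCII case is ignored, for example
// "Foo" and "foo".
bool has_case_collision(const std::vector<std::string>& names)
{
    std::set<std::string> seen;
    for (std::string name : names) {  // copy, so it can be lower-cased
        std::transform(name.begin(), name.end(), name.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (!seen.insert(name).second) return true;  // already present
    }
    return false;
}
```

For instance, calling within_length_limits with a max_name_length of 31 would apply the Classic Mac OS limit from the list above, together with the 255 character limit on the whole path.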

Revised 01 December, 2003

© Copyright Beman Dawes, 2002, 2003

Use, modification, and distribution are subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at www.boost.org/LICENSE_1_0.txt)