path.hpp

Introduction
Grammar for generic path strings
Canonical form
Header synopsis
Class path
    Native path representation
    Representation example
    Caution for POSIX and UNIX programmers
    Good programming practice: relative paths
    Path equality vs path equivalence
Member functions
Non-member functions
Default name_check mechanism
Rationale
Path decomposition examples

Introduction

Filesystem Library functions traffic in objects of class path, provided by this header. The header also supplies non-member functions for error checking.

For actual operations on files and directories, see boost/filesystem/operations.hpp documentation.

For file I/O stream operations, see boost/filesystem/fstream.hpp documentation.

The Filesystem Library's Common Specifications apply to all member and non-member functions supplied by this header.

The Portability Guide discusses path naming issues which are important when portability is a concern.

Class path provides a portable mechanism for representing paths in C++ programs, using a portable generic path string grammar. Class path is concerned with the lexical and syntactic aspects of a path. The path does not have to exist in the operating system's filesystem, and may contain names which are not even valid for the current operating system.

Rationale: If Filesystem functions trafficked in std::strings or C-style strings, the functions would provide only an illusion of portability since the function calls would be portable but the strings they operate on would not be portable.

Conceptual model of a path

An object of class path can be conceptualized as containing a sequence of strings. Each string is said to be an element of the path. Each element represents the name of a directory, or, in the case of the string representing the element farthest from the root in the directory hierarchy, the name of a directory or file. The names ".." and "." are reserved to represent the concepts of parent-directory and directory-placeholder.

This conceptual path representation is independent of any particular representation of the path as a single string.

There is no requirement that an implementation of class path actually contain a sequence of strings, but conceptualizing the contents as a sequence of strings provides a completely portable way to reason about paths.

So that programs can portably express paths as a single string, class path defines a grammar for a portable generic path string format, and supplies constructor and append operations taking such strings as arguments. Because user input or third-party library functions may supply path strings formatted according to operating system specific rules, an additional constructor is provided which takes a system-specific format as an argument.

Access functions are provided to retrieve the contents of an object of class path formatted as a portable path string, a directory path string using the operating system's format, and a file path string using the operating system's format. Additional access functions retrieve specific portions of the contained path.

Grammar for portable generic path strings

The grammar is specified in extended BNF, with terminal symbols in quotes:

path ::= [root] [relative-path]  // an empty path is valid

root ::= [root-name] [root-directory]

root-directory ::= "/"

relative-path ::= path-element { "/" path-element } ["/"]

path-element ::= name | parent-directory | directory-placeholder

name ::= char { char }

directory-placeholder ::= "."

parent-directory ::= ".."

root-name grammar is implementation-defined. root-name must not be present in generic input. It may be part of the strings returned by path member functions, and may be present in the src argument to path constructors when the native name check is in effect.

char may not be slash ('/') or '\0'.

Although implementation-defined, it is desirable that root-name have a grammar which is distinguishable from other grammar elements, and follow the conventions of the operating system.

The optional trailing "/" in a relative-path is allowed as a notational convenience. It has no semantic meaning and is simply discarded.

Whether or not a generic path string is actually portable to a particular operating system will depend on the names used. See the Portability Guide.
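
As a rough illustration of the grammar, the following sketch splits a generic path string into elements. It is not the library's parser. parse_generic is a hypothetical helper, root-name is not handled (generic input must not contain one), the root-directory is stored as the element "/", the check for '\0' is omitted, std::invalid_argument stands in for the library's filesystem_error, and no canonicalization is applied.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of the library: splits a generic path string
// into elements following the grammar above (root-name is not supported).
std::vector<std::string> parse_generic(const std::string& src)
{
    std::vector<std::string> elements;
    std::string::size_type pos = 0;

    if (!src.empty() && src[0] == '/') {   // optional root-directory
        elements.push_back("/");
        pos = 1;
    }

    while (pos < src.size()) {
        std::string::size_type next = src.find('/', pos);
        if (next == std::string::npos) next = src.size();
        if (next == pos)                   // empty element, as in "a//b"
            throw std::invalid_argument("empty path element in: " + src);
        elements.push_back(src.substr(pos, next - pos));
        pos = next + 1;                    // step over the separator
    }
    return elements;                       // a trailing "/" adds no element
}
```

Under these assumptions, "foo/bar/" and "foo/bar" both yield the elements foo and bar, and an empty string yields no elements.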

Canonical form

All operations modifying path objects leave the path object in canonical form.

An empty path is in canonical form.

A non-empty path is converted to canonical form as if by first converting it to the conceptual model, and then:

Repeatedly replacing any leading root-directory, parent-directory elements with a single root-directory element. Rationale: Both POSIX and Windows specify this reduction; specifying it for canonical form ensures portable semantics for other operating systems.
Removing each directory-placeholder element.
If the path is now empty, adding a single directory-placeholder element.
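
The three steps can be sketched on a plain vector of element strings. This is a minimal illustration under stated assumptions, not the library's implementation: to_canonical and elements_t are hypothetical names, the root-directory is modelled as the element "/", and root-name elements are not handled.

```cpp
#include <algorithm>
#include <string>
#include <vector>

typedef std::vector<std::string> elements_t;   // hypothetical element sequence

// Hypothetical helper applying the canonical-form steps in the order listed.
elements_t to_canonical(elements_t e)
{
    if (e.empty()) return e;               // an empty path is already canonical

    // Step 1: collapse a leading "/", ".." pair into "/" until none remains.
    while (e.size() >= 2 && e[0] == "/" && e[1] == "..")
        e.erase(e.begin() + 1);

    // Step 2: remove every directory-placeholder element.
    e.erase(std::remove(e.begin(), e.end(), std::string(".")), e.end());

    // Step 3: a path that became empty is represented by a single ".".
    if (e.empty()) e.push_back(".");
    return e;
}
```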

Normalized form

Normalized form is the same as canonical form, except that adjacent name, parent-directory elements are recursively removed.

Thus a non-empty path in normalized form either has no directory-placeholders, or consists solely of one directory-placeholder. If it has parent-directory elements, they precede all name elements.
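
One way to picture the recursive removal is a single pass that treats the output as a stack. The sketch below is an illustration only: to_normalized and elements_t are hypothetical names, the input is assumed to be in canonical form already, the root-directory is modelled as the element "/", and root-name elements are not handled.

```cpp
#include <string>
#include <vector>

typedef std::vector<std::string> elements_t;   // hypothetical element sequence

// Hypothetical helper: removes each adjacent name, ".." pair, repeating
// until no such pair is left.
elements_t to_normalized(const elements_t& canonical)
{
    elements_t out;
    for (const std::string& e : canonical) {
        if (e == ".") continue;                      // re-added below if needed
        if (e == ".." && !out.empty()) {
            if (out.back() == "/") continue;         // "/", ".." stays "/"
            if (out.back() != "..") {                // a name precedes ".."
                out.pop_back();                      // drop the pair
                continue;
            }
        }
        out.push_back(e);
    }
    // A non-empty path that reduced to nothing becomes a single ".".
    if (out.empty() && !canonical.empty()) out.push_back(".");
    return out;
}
```

With these assumptions, the elements foo, bar, .., .. reduce to a single ".", while .., foo, .., .. reduce to .., .. so that the parent-directory elements come first.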

Header boost/filesystem/path.hpp synopsis

namespace boost
{
  namespace filesystem
  {
    class path
    {
    public:
      typedef bool (*name_check)( const std::string & name );

      // compiler generates copy constructor,
      // copy assignment, and destructor

      // constructors:
      path();
      path( const std::string & src );
      path( const char * src );
      path( const std::string & src, name_check checker );
      path( const char * src, name_check checker );

      // append operations:
      path & operator /= ( const path & rhs );
      path   operator /  ( const path & rhs ) const;

      // conversion functions:
      const std::string & string() const;
      std::string native_file_string() const;
      std::string native_directory_string() const;
      
      // modification functions:
      path &      normalize();

      // decomposition functions:
      path        root_path() const;
      std::string root_name() const;
      std::string root_directory() const;
      path        relative_path() const;
      std::string leaf() const;
      path        branch_path() const;
      
      // query functions: 
      bool empty() const;
      bool is_complete() const;
      bool has_root_path() const;
      bool has_root_name() const;
      bool has_root_directory() const;
      bool has_relative_path() const;
      bool has_leaf() const;
      bool has_branch_path() const;
      
      // iteration:
      typedef implementation-defined iterator;
      iterator begin() const;
      iterator end() const;

      // default name_check mechanism:
      static bool default_name_check_writable(); 
      static void default_name_check( name_check new_check );
      static name_check default_name_check();

      // relational operators:
      bool operator==( const path & that ) const;
      bool operator!=( const path & that ) const;
      bool operator<( const path & that ) const;
      bool operator<=( const path & that ) const;
      bool operator>( const path & that ) const;
      bool operator>=( const path & that ) const;

    private:
      std::vector<std::string> m_name;  // for exposition only
    };

    path operator / ( const char * lhs, const path & rhs );
    path operator / ( const std::string & lhs, const path & rhs );

    // name_check functions
    bool portable_posix_name( const std::string & name );
    bool windows_name( const std::string & name );
    bool portable_name( const std::string & name );
    bool portable_directory_name( const std::string & name );
    bool portable_file_name( const std::string & name );
    bool no_check( const std::string & name );
    bool native( const std::string & name );
  }
}

For the sake of exposition, class path member functions are described as if the class contains a private member std::vector<std::string> m_name. Actual implementations may differ.

Class path member functions, and the non-member operator/ functions, may throw a filesystem_error exception if the path does not follow the syntax specified by the grammar.

Note: There is no guarantee that a path object represents a path which is considered valid by the current operating system. A path might be invalid to the operating system because it contains invalid names (too long, invalid characters, and so on), or because it is a partial path which the program has not yet finished building. An invalid path will normally be detected at time of use, such as by one of the Filesystem Library's operations or fstream functions.

Portability Warning: There is no guarantee that a path object represents a path which would be portable to another operating system. A path might be non-portable because it contains names which another operating system considers too long or which contain characters that are invalid there. A default name_check mechanism is provided to aid in the detection of non-portable names, or a name_check function can be specified in path constructors. The library supplies several name_check functions, or users can supply their own.
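
A user-supplied check only has to match the name_check signature from the synopsis. The following sketch shows what such a function might look like. lowercase_name and all_names_pass are hypothetical names chosen for illustration, and the acceptance rule is arbitrary. all_names_pass merely mimics the per-name test described for the constructors and is not a library function.

```cpp
#include <string>
#include <vector>

// Same signature as the path::name_check typedef in the synopsis.
typedef bool (*name_check)(const std::string& name);

// Hypothetical user-supplied checker: accepts only non-empty names made of
// lowercase ASCII letters, digits, '_' and '.'.
bool lowercase_name(const std::string& name)
{
    if (name.empty()) return false;
    for (char c : name) {
        const bool ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                        || c == '_' || c == '.';
        if (!ok) return false;
    }
    return true;
}

// Hypothetical helper: true only if the checker accepts every name.
bool all_names_pass(const std::vector<std::string>& names, name_check checker)
{
    for (const std::string& n : names)
        if (!checker(n)) return false;
    return true;
}
```

Going by the synopsis, such a function would be passed as the second constructor argument, for example path( "data/report.txt", lowercase_name ).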

Native path representation

Several path member functions return representations of m_name in formats specific to the operating system. These formats are implementation defined. If an m_name element contains characters which are invalid under the operating system's rules, and there is an unambiguous translation between the invalid character and a valid character, the implementation is required to perform that translation. For example, if an operating system does not permit lowercase letters in file or directory names, these letters will be translated to uppercase if unambiguous. Such translation does not apply to generic path string format representations.

Representation example

The rule-of-thumb is to use string() when a generic string representation of the path is required, and use either native_directory_string() or native_file_string() when a string representation formatted for the particular operating system is required.

The differences between the representations returned by string(), native_directory_string(), and native_file_string() are illustrated by the following code:

path my_path( "foo/bar/data.txt" );
std::cout
  << "string------------------: " << my_path.string() << '\n'
  << "native_directory_string-: " << my_path.native_directory_string() << '\n'
  << "native_file_string------: " << my_path.native_file_string() << '\n';

On POSIX systems, the output would be:

string------------------: foo/bar/data.txt
native_directory_string-: foo/bar/data.txt
native_file_string------: foo/bar/data.txt

On Windows, the output would be:

string------------------: foo/bar/data.txt
native_directory_string-: foo\bar\data.txt
native_file_string------: foo\bar\data.txt

On classic Mac OS, the output would be:

string------------------: foo/bar/data.txt
native_directory_string-: foo:bar:data.txt
native_file_string------: foo:bar:data.txt

On a hypothetical operating system using OpenVMS format representations, it would be:

string------------------: foo/bar/data.txt
native_directory_string-: [foo.bar.data.txt]
native_file_string------: [foo.bar]data.txt

Note that because OpenVMS uses period as both a directory separator character and as a separator between filename and extension, native_directory_string() in the example produces a useless result. On this operating system, the programmer should only use this path as a file path. (There is a portability recommendation to not use periods in directory names.)
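
The POSIX, Windows, and classic Mac OS outputs above differ only in the separator placed between elements. A minimal sketch of that idea follows, assuming a relative path with no root elements. join_elements is a hypothetical helper. Real native formatting is implementation defined and can involve more than a separator change, as the OpenVMS case shows.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins path elements with the given separator.
// Illustration only; handles neither root-name nor root-directory.
std::string join_elements(const std::vector<std::string>& elements,
                          char separator)
{
    std::string result;
    for (std::size_t i = 0; i < elements.size(); ++i) {
        if (i != 0) result += separator;   // separator goes between elements
        result += elements[i];
    }
    return result;
}
```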

Caution for POSIX and UNIX programmers

POSIX and other UNIX-like operating systems have a single root, while most other operating systems have multiple roots. Multi-root operating systems require a root-name such as a drive, device, disk, volume, or share name for a path to be resolved to an actual specific file or directory. Because of this, the root_path() and root_directory() functions return identical results on UNIX and other single-root operating systems, but different results on multi-root operating systems. Thus use of the wrong function will not be apparent on UNIX-like systems, but will result in non-portable code which will fail when used on multi-root systems. UNIX programmers are cautioned to use particular care in choosing between root_path() and root_directory(). If undecided, use root_path().

The same warning applies to has_root_path() and has_root_directory().

Good programming practice: relative paths

It is usually bad programming practice to hard-code complete paths into programs. Such programs tend to be fragile because they break when directory trees get reorganized or the programs are moved to other machines or operating systems.

The most robust way to deal with path completion is to hard-code only relative paths. When a complete path is required, it can be obtained in several ways:

Implicitly. Allow the operating system to complete the path according to the operating system's path completion algorithm. For example:
```
    create_directory( "foo" ); // operating system will complete path
```
User input. The path is often best constructed with the native name check, so that the user input follows the operating system's native path format, which will usually be what the program user expects. For example:
```
    path foo( argv[1], native );
    foo /= "foo";
```
initial_path(). Particularly for command line programs, specifying paths relative to the current path at the time the program is started is a common practice. For example:
```
    path foo( initial_path() / "foo" );
```
Algorithmically. See complete() and system_complete() functions.

Path equality vs path equivalence

Are paths "abc" and "ABC" equal? No, never, if you determine equality via class path's operator==, which considers only the two paths' lexical representations.

Do paths "abc" and "ABC" resolve to the same file or directory? The answer is "yes", "no", or "maybe" depending on the external file system. The (pending) operations function equivalent() is the only way to determine if two paths resolve to the same external file system entity.

Programmers wishing to determine if two paths are "the same" must decide if that means "the same representation" or "resolve to the same actual file or directory", and choose the appropriate function accordingly.

Member functions

constructors

path();
Effects: Default constructs an object of class path.

Postcondition: path().empty()
path( const std::string & src, name_check checker );
path( const char * src, name_check checker );
path( const std::string & src );
path( const char * src );
For the single-argument forms, default_name_check() is used as checker.

Precondition: src != 0.

Effects: Select the grammar as follows:

If checker == native, the operating system's implementation defined grammar for paths.

else if checker == no_check, the generic path string grammar with optional root-name.

else the generic path string grammar without root-name.

Parse src into a sequence of names, according to the grammar, then, for each name in src, m_name.push_back( name ).

Throws: For each name in src, throw if checker( name ) returns false.

Postcondition: m_name is in canonical form. For the single-argument forms only, !default_name_check_writable().

Rationale: The single-argument constructors are not explicit because an intended use is automatic conversion of strings to paths.

operator /=

path & operator/=( const path & rhs );
Effects: If any of the following conditions are met, then m_name.push_back("/").

has_relative_path().

!is_complete() && has_root_name(), and the operating system requires the system-specific root be absolute.

Then append rhs.m_name to m_name.

(Footnote: Thus on Windows, (path("//share") /= "foo").string() is "//share/foo")

Returns: *this

Postcondition: m_name is in canonical form.

Rationale: It is not considered an error for rhs to include a root-directory because m_name might be relative or empty, and thus it is valid for rhs to supply root-directory. For example, on Windows, the following must succeed:
path p( "c:", native );
p /= "/foo";
assert( p.string() == "c:/foo" );

operator /

path operator/ ( const path & rhs ) const;
Returns: path( *this ) /= rhs

Rationale: Operator / is supplied because, together with operator /=, it provides a convenient way for users to supply paths with a variable number of elements. For example, initial_path() / "src" / test_name. Operator+ and operator+= were considered as alternatives, but deemed too easy to confuse with those operators for std::string. Operator<< and operator<<= were used originally, until Dave Abrahams pointed out during public review that / and /= match the generic path syntax.

Note: Also see non-member operator/ functions.

normalize

path & normalize();

Postcondition: m_name is in normalized form.

Returns: *this

string

const std::string & string() const;
Returns: The contents of m_name, formatted according to the rules of the generic path string grammar.

Note: The returned string must be unambiguous according to the grammar. That means that for an operating system with root-names indistinguishable from relative-path names, names containing "/", or allowing "." or ".." as directory or file names, escapes or other mechanisms will have to be introduced into the grammar to prevent ambiguities. This has not been done yet, since no current implementations are on operating systems with any of those problems.

See: Representation example above.

native_file_string

std::string native_file_string() const;
Returns: The contents of m_name, formatted in the system-specific representation of a file path.

See: Representation example above.

Naming rationale: The name is deliberately ugly to warn users that this function yields non-portable results.

native_directory_string

std::string native_directory_string() const;
Returns: The contents of m_name, formatted in the system-specific representation of a directory path.

See: Representation example above.

Naming rationale: The name is deliberately ugly to warn users that this function yields non-portable results.

root_path

path root_path() const;
Returns: root_name() / root_directory()

Portably provides a copy of a path's full root path, if any. See Path decomposition examples.

root_name

std::string root_name() const;
Returns: If !m_name.empty() && m_name[0] is a root-name, returns m_name[0], else returns a null string.

Portably provides a copy of a path's root-name, if any. See Path decomposition examples.

root_directory

std::string root_directory() const;
Returns: If the path contains root-directory, then string("/"), else string().

Portably provides a copy of a path's root-directory, if any. The only possible results are "/" or "". See Path decomposition examples.

relative_path

path relative_path() const;
Returns: A new path containing only the relative-path portion of the source path.

Portably provides a copy of a path's relative portion, if any. See Path decomposition examples.

leaf

std::string leaf() const;
Returns: empty() ? std::string() : m_name.back()

A typical use is to obtain a file or directory name without path information from a path returned by a directory_iterator. See Path decomposition examples.

branch_path

path branch_path() const;
Returns: m_name.size() <= 1 ? path("") : x, where x is a path constructed from all the elements of m_name except the last.

A typical use is to obtain the parent path for a path supplied by the user. See Path decomposition examples.
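
In terms of the exposition-only m_name vector, the Returns clauses for leaf() and branch_path() can be sketched as follows. leaf_of, branch_of, and elements_t are hypothetical names, and the functions operate on a plain element vector rather than on a path object.

```cpp
#include <string>
#include <vector>

typedef std::vector<std::string> elements_t;   // stands in for m_name

// Hypothetical helper: the last element, or an empty string for an empty path.
std::string leaf_of(const elements_t& m_name)
{
    return m_name.empty() ? std::string() : m_name.back();
}

// Hypothetical helper: every element except the last; empty if there is at
// most one element.
elements_t branch_of(const elements_t& m_name)
{
    if (m_name.size() <= 1) return elements_t();
    return elements_t(m_name.begin(), m_name.end() - 1);
}
```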

empty

bool empty() const;
Returns: string().empty().

The path::empty() function determines if a path string itself is empty. To determine if the file or directory identified by the path is empty, use the operations.hpp is_empty() function.

Naming rationale: C++ Standard Library containers use the empty name for the equivalent functions.

is_complete

bool is_complete() const;
Returns: For single-root operating systems, has_root_directory(). For multi-root operating systems, has_root_directory() && has_root_name().

Naming rationale: The alternate name, is_absolute(), causes confusion and controversy because on multi-root operating systems some people believe root_name() should participate in is_absolute(), and some don't. See the FAQ.

Note: On most operating systems, a complete path always unambiguously identifies a specific file or directory. On a few systems (classic Mac OS, for example), even a complete path may be ambiguous in unusual cases because the OS does not require unambiguousness.
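
The Returns clause can be restated as a small function. This is a sketch for reasoning only: is_complete_sketch and its multi_root_os parameter are hypothetical, since the library determines the kind of operating system itself.

```cpp
// Hypothetical restatement of the is_complete() rule.
// multi_root_os is an illustrative parameter, not part of the library.
bool is_complete_sketch(bool has_root_name, bool has_root_directory,
                        bool multi_root_os)
{
    return multi_root_os ? (has_root_directory && has_root_name)
                         : has_root_directory;
}
```

Under this rule, "/foo" is complete on a single-root system but not on a multi-root system, where a path such as "c:/foo" supplies both the root-name and the root-directory.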

has_root_path

bool has_root_path() const;
Returns: has_root_name() || has_root_directory()

has_root_name

bool has_root_name() const;
Returns: !root_name().empty()

has_root_directory

bool has_root_directory() const;
Returns: !root_directory().empty()

has_relative_path

bool has_relative_path() const;
Returns: !relative_path().empty()

has_leaf

bool has_leaf() const;
Returns: !leaf().empty()

has_branch_path

bool has_branch_path() const;
Returns: !branch_path().empty()

iterator

typedef implementation-defined iterator;

A const iterator meeting the C++ Standard Library requirements for bidirectional iterators (24.1). The iterator is a class type (so that operator++ and -- will work on temporaries). The value, reference, and pointer types are std::string, const std::string &, and const std::string *, respectively.

begin

iterator begin() const;

Returns: m_name.begin()

end

iterator end() const;

Returns: m_name.end()

default_name_check_writable

static bool default_name_check_writable();

Returns: True, unless a default_name_check function has been previously called.

default_name_check

static void default_name_check( name_check new_check );

Precondition: new_check != 0

Postcondition: default_name_check() == new_check && !default_name_check_writable()

Throws: if !default_name_check_writable()

static name_check default_name_check();

Returns: the default name_check.

Postcondition: !default_name_check_writable()

operator ==

bool operator==( const path & that ) const;
Returns: !(*this < that) && !(that < *this)

See Path equality vs path equivalence.

operator !=

bool operator!=( const path & that ) const;
Returns: !(*this == that)

See Path equality vs path equivalence.

operator <

bool operator<( const path & that ) const;
Returns: std::lexicographical_compare( begin(), end(), that.begin(), that.end() )

See Path equality vs path equivalence.

Rationale: Relational operators are provided to ease uses such as specifying paths as keys in associative containers. Lexicographical comparison is used because:

Even though not a full-fledged standard container, paths are enough like containers to merit meeting the C++ Standard Library's container comparison requirements (23.1 table 65).

The alternative is to return this->string() < that.string(). But path::string() as currently specified can yield non-unique results for differing paths. The case (from Peter Dimov) is path("first/")/"second" and path("first")/"second" both returning a string() of "first//second".
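
The element-wise comparison, and the equality derived from it, can be sketched on plain element vectors. less_than, equal_to, and elements_t are hypothetical names, and the element sequences stand in for what begin() and end() would iterate over.

```cpp
#include <algorithm>
#include <string>
#include <vector>

typedef std::vector<std::string> elements_t;   // hypothetical element sequence

// Hypothetical helper: element-wise lexicographical ordering, as in operator<.
bool less_than(const elements_t& a, const elements_t& b)
{
    return std::lexicographical_compare(a.begin(), a.end(),
                                        b.begin(), b.end());
}

// Hypothetical helper: equality derived from the ordering, as in operator==.
bool equal_to(const elements_t& a, const elements_t& b)
{
    return !less_than(a, b) && !less_than(b, a);
}
```

Because the comparison is purely lexical, the element sequences for "abc" and "ABC" compare unequal whatever the underlying file system would do with them.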

operator <=

bool operator<=( const path & that ) const;
Returns: !(that < *this)

See Path equality vs path equivalence.

operator >

bool operator>( const path & that ) const;
Returns: that < *this

See Path equality vs path equivalence.

operator >=

bool operator>=( const path & that ) const;
Returns: !(*this < that)

See Path equality vs path equivalence.

Non-member functions

Non-member operator /

path operator / ( const char * lhs, const path & rhs );
path operator / ( const std::string & lhs, const path & rhs );

Returns: path( lhs ) /= rhs

Default name_check mechanism

It is difficult or impossible to write portable programs without some way to verify that directory and file names are portable. Without automatic name checking, verification is tedious, error prone, and ugly. Yet no single name check function can serve all applications, and within an application different paths or portions of paths may require different name check functions. Sometimes there should be no checking at all.

Those needs are met by providing a default name check function to meet an application's most common needs, and then providing path constructors which override the default name check function to handle less common needs. The default name check function can be set by the application, allowing the most common case for the particular application to be handled by the default check.

Dangers

The default name check function is set and retrieved by path static member functions, and as such is similar to a global variable. Since global variables are considered harmful [Wulf-Shaw-73], class path allows the default name check function to be set only once, and only before the first use. This turns a dangerous global variable into a safer global constant. Even with this protection, the ability to set the default name check function is still a powerful feature, and is still dangerous in that it can change the behavior of code buried out-of-sight in libraries or elsewhere. Thus changing the default name check function should only be done when explicitly specifying the function via the two-argument path constructors is not reasonable.
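
The set-once, freeze-on-first-use behavior can be sketched with a pair of function-local statics. This is an illustration of the idea, not the library's code: default_check_holder, accept_any, and non_empty are hypothetical names, and std::logic_error stands in for whatever exception the library throws.

```cpp
#include <stdexcept>
#include <string>

typedef bool (*name_check)(const std::string& name);

// Hypothetical checkers used only for this illustration.
bool accept_any(const std::string&) { return true; }
bool non_empty(const std::string& name) { return !name.empty(); }

// Hypothetical holder mimicking the default name_check mechanism.
class default_check_holder
{
public:
    static bool writable() { return writable_flag(); }

    // May be called once, and only before the first call to get().
    static void set(name_check new_check)
    {
        if (!writable_flag())
            throw std::logic_error("default name check is already fixed");
        current() = new_check;
        writable_flag() = false;
    }

    // The first retrieval freezes the value.
    static name_check get()
    {
        writable_flag() = false;
        return current();
    }

private:
    static bool& writable_flag() { static bool w = true; return w; }
    static name_check& current() { static name_check c = accept_any; return c; }
};
```

Once set() or get() has been called, any later call to set() throws, which is what turns the global variable into a global constant.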

Rationale

Also see the FAQ for additional rationale.

Function naming: Class path member function names and operations.hpp non-member function names were chosen to be somewhat distinct from one another. The objective was to avoid cases like foo.empty() and empty( foo ) both being valid, but with completely different semantics. At one point path::empty() was renamed path::is_null(), but that caused many coding typos because std::string::empty() is often used nearby.

Decomposition functions: Decomposition functions are provided because without them it is impossible to write portable path manipulations. Convenience is also a factor.

Const vs non-const returns: In some earlier versions of the library, member functions returned values as const rather than non-const. See Scott Meyers, Effective C++, Item 21. The const qualifiers were eliminated (1) to conform with C++ Standard Library practice, (2) because non-const returns allow occasionally useful expressions, and (3) because the coding errors eliminated were deemed rare. A requirement that path::iterator not be a non-class type was added to eliminate non-const iterator errors.

Path decomposition examples

It is often useful to extract specific elements from a path object. While any decomposition can be achieved by iterating over the elements of a path, convenience functions are provided which are easier to use, more efficient, and less error prone.

The first column of the table gives the example path, formatted by the string() function. The second column shows the values which would be returned by dereferencing each element iterator. The remaining columns show the results of various expressions.

p.string()	Elements	p.root_path()	p.root_name()	p.root_directory()	p.relative_path()	p.root_directory() / p.relative_path()	p.root_name() / p.relative_path()	p.branch_path()	p.leaf()
All systems
`/`	`/`	`/`	`""`	`/`	`""`	`/`	`""`	`""`	`/`
`foo`	`foo`	`""`	`""`	`""`	`foo`	`foo`	`foo`	`""`	`foo`
`/foo`	`/,foo`	`/`	`""`	`/`	`foo`	`/foo`	`foo`	`/`	`foo`
`foo/bar`	`foo,bar`	`""`	`""`	`""`	`foo/bar`	`foo/bar`	`foo/bar`	`foo`	`bar`
`/foo/bar`	`/,foo,bar`	`/`	`""`	`/`	`foo/bar`	`/foo/bar`	`foo/bar`	`/foo`	`bar`
`.`	`.`	`""`	`""`	`""`	`.`	`.`	`.`	`""`	`.`
`..`	`..`	`""`	`""`	`""`	`..`	`..`	`..`	`""`	`..`
`../foo`	`..,foo`	`""`	`""`	`""`	`../foo`	`../foo`	`../foo`	`..`	`foo`
Windows
`c:`	`c:`	`c:`	`c:`	`""`	`""`	`""`	`c:`	`""`	`c:`
`c:/`	`c:,/`	`c:/`	`c:`	`/`	`""`	`/`	`c:`	`c:`	`/`
`c:..`	`c:,..`	`c:`	`c:`	`""`	`..`	`..`	`c:..`	`c:`	`..`
`c:foo`	`c:,foo`	`c:`	`c:`	`""`	`foo`	`foo`	`c:foo`	`c:`	`foo`
`c:/foo`	`c:,/,foo`	`c:/`	`c:`	`/`	`foo`	`/foo`	`c:foo`	`c:/`	`foo`
`//shr`	`//shr`	`//shr`	`//shr`	`""`	`""`	`""`	`//shr`	`""`	`//shr`
`//shr/`	`//shr,/`	`//shr/`	`//shr`	`/`	`""`	`/`	`//shr`	`//shr`	`/`
`//shr/foo`	`//shr,/,foo`	`//shr/`	`//shr`	`/`	`foo`	`/foo`	`//shr/foo`	`//shr/`	`foo`
`prn:`	`prn:`	`prn:`	`prn:`	`""`	`""`	`""`	`prn:`	`""`	`prn:`
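
The root-related columns of the table can be reproduced from the element sequence with a short sketch. It rests on assumptions that go beyond the text: decompose, parts, and looks_like_root_name are hypothetical names, and the root-name test simply recognizes the Windows-style examples in the table (a trailing colon or a leading "//"), whereas the real root-name grammar is implementation-defined.

```cpp
#include <string>
#include <vector>

// Hypothetical result type for this illustration.
struct parts
{
    std::string root_name;
    std::string root_directory;
    std::vector<std::string> relative;
};

// Assumption: root-names look like "c:", "prn:" or "//shr".
bool looks_like_root_name(const std::string& e)
{
    return (e.size() >= 2 && e[e.size() - 1] == ':')
        || (e.size() > 2 && e.compare(0, 2, "//") == 0);
}

// Hypothetical helper: splits an element sequence into root-name,
// root-directory and the remaining relative elements.
parts decompose(const std::vector<std::string>& m_name)
{
    parts p;
    std::vector<std::string>::const_iterator it = m_name.begin();
    if (it != m_name.end() && looks_like_root_name(*it)) p.root_name = *it++;
    if (it != m_name.end() && *it == "/") p.root_directory = *it++;
    p.relative.assign(it, m_name.end());
    return p;
}
```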

Revised 14 March, 2004

Use, modification, and distribution are subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at www.boost.org/LICENSE_1_0.txt)