Filesystem Tutorial

Introduction
Preliminaries
Reporting the size of a file - (tut1.cpp)
Using status queries to determine file existence and type - (tut2.cpp)
Directory iteration plus catching exceptions - (tut3.cpp)
Using path decomposition, plus sorting results - (tut4.cpp)
Class path: Constructors, including Unicode - (tut5.cpp)
Class path: Generic format vs. Native format
Class path: Iterators, observers, composition, decomposition, and query - (path_info.cpp)
Error reporting

Introduction

This tutorial develops a little command line program to list information about files and directories - essentially a much simplified version of the POSIX ls or Windows dir commands. We'll start with the simplest possible version and progress to more complex functionality. Along the way we'll digress to cover topics you'll need to know about to understand Boost.Filesystem.

Source code for each of the tutorial programs is available, and you are encouraged to compile, test, and experiment with it. To conserve space, we won't always show boilerplate code here, but the provided source is complete and ready to build.

Preliminaries

Install the Boost distribution if you haven't already done so. See the Boost Getting Started docs.

This tutorial assumes you are going to compile and test the examples using the provided scripts. That's highly recommended.

If you are planning to compile and test the examples but not use the scripts, make sure your build setup knows where to locate or build the Boost library binaries.

Fire up your command line interpreter, and type the following commands:

Ubuntu Linux

$ cd boost-root/libs/filesystem/example/test
$ ./setup
$ ./bld
$ ./tut1
Usage: tut1 path

Microsoft Windows

>cd boost-root\libs\filesystem\example\test
>setup
>bld
>tut1
Usage: tut1 path

If the tut1 command outputs "Usage: tut1 path", all is well. A set of tutorial programs has been copied (by setup) to boost-root/libs/filesystem/example/test and then built. You are encouraged to modify and experiment with them as the tutorial progresses. Just invoke the bld script again to rebuild.

If something didn't work right, here are troubleshooting suggestions:

If the bjam executable isn't being found, check your PATH environment variable if it should have been found; otherwise, see Boost Getting Started.
Look at bjam.log to try to spot an indication of the problem.

Reporting the size of a file - (tut1.cpp)

Let's get started. One of the simplest things we can do is report the size of a file.

tut1.cpp

#include <iostream>
#include <boost/filesystem.hpp>
using namespace boost::filesystem;

int main(int argc, char* argv[])
{
  if (argc < 2)
  {
    std::cout << "Usage: tut1 path\n";
    return 1;
  }
  std::cout << argv[1] << " " << file_size(argv[1]) << '\n';
  return 0;
}

The Boost.Filesystem file_size function returns a uintmax_t containing the size of the file named by the argument. The declaration looks like this:

uintmax_t file_size(const path& p);

For now, all you need to know is that class path has constructors that take const char * and many other useful types. (If you can't wait to find out more, skip ahead to the class path section of the tutorial.)

Please take a minute to try out tut1 on your system, using a file that is known to exist, such as tut1.cpp. Here is what the results look like on two different operating systems:

Ubuntu Linux

$ ./tut1 tut1.cpp
tut1.cpp 569

$ ls -l tut1.cpp
-rwxrwxrwx 1 root root 569 2010-02-01 07:31 tut1.cpp

Microsoft Windows

>tut1 tut1.cpp
tut1.cpp 592
>dir tut1.cpp
...
01/30/2010 10:47 AM 592 tut1.cpp
...

So far, so good. The reported Linux and Windows sizes are different because the Linux tests used "\n" line endings, while the Windows tests used "\r\n" line endings.

Now try again, but give a path that doesn't exist:

Ubuntu Linux

$ ./tut1 foo
terminate called after throwing an instance of 'boost::exception_detail::
clone_impl<boost::exception_detail::error_info_injector<boost::
filesystem::filesystem_error> >'
  what(): boost::filesystem::file_size: No such file or directory: "foo"
Aborted

Microsoft Windows

>tut1 foo

An exception is thrown; the exact form of the response depends on Windows system options.

What happens? There's no file named foo in the current directory, so an exception is thrown.

Try this:

Ubuntu Linux

$ ./tut1 .
terminate called after throwing an instance of 'boost::exception_detail::
clone_impl<boost::exception_detail::error_info_injector<boost::
filesystem::filesystem_error> >'
  what(): boost::filesystem::file_size: Operation not permitted "."
Aborted

Microsoft Windows

>tut1 .

An exception is thrown; the exact form of the response depends on Windows system options.

The current directory exists, but file_size() works on regular files, not directories, so again, an exception is thrown.

We'll deal with those situations in tut2.cpp.

Using status queries to determine file existence and type - (tut2.cpp)

Boost.Filesystem includes status query functions such as exists, is_directory, and is_regular_file. These return bool values: true if the condition described by their name is met, and false otherwise, including when any element of the path argument can't be found.
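As a rough sketch of the same idea, the hypothetical helper describe() below classifies a path with a single status() call and then queries the returned file_status, rather than calling the path-based queries one after another. It assumes a C++17 compiler and uses std::filesystem, the standard library's descendant of Boost.Filesystem V3, so that it builds without Boost. It also uses the error_code overload of status() (see Error reporting), so a failure such as a permission problem is reported in the result instead of being thrown.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Assumption: C++17 std::filesystem stands in for boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical helper: describe what kind of file p names, using one status() call.
std::string describe(const fs::path& p)
{
  std::error_code ec;
  fs::file_status s = fs::status(p, ec);   // non-throwing overload

  if (s.type() == fs::file_type::none)     // status could not be determined at all
    return "status unknown: " + ec.message();
  if (!fs::exists(s))                      // file_type::not_found
    return "does not exist";
  if (fs::is_regular_file(s))
    return "regular file";
  if (fs::is_directory(s))
    return "directory";
  return "exists, but is neither a regular file nor a directory";
}
```

For example, describe(".") reports "directory", because the current directory always exists.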

tut2.cpp uses several of the status query functions to cope with non-existent files and with different kinds of files:

tut2.cpp

#include <iostream>
#include <boost/filesystem.hpp>
using namespace std;
using namespace boost::filesystem;

int main(int argc, char* argv[])
{
  if (argc < 2)
  {
    cout << "Usage: tut2 path\n";
    return 1;
  }

  path p (argv[1]);   // p reads clearer than argv[1] in the following code

  if (exists(p))    // does p actually exist?
  {
    if (is_regular_file(p))        // is p a regular file?
      cout << p << " size is " << file_size(p) << '\n';

    else if (is_directory(p))      // is p a directory?
      cout << p << " is a directory\n";

    else
      cout << p << " exists, but is neither a regular file nor a directory\n";
  }
  else
    cout << p << " does not exist\n";

  return 0;
}

Give it a try:

Ubuntu Linux

$ ./tut2 tut2.cpp
tut2.cpp size is 1037
$ ./tut2 foo
foo does not exist
$ ./tut2 .
. is a directory

Microsoft Windows

>tut2 tut2.cpp
tut2.cpp size is 1079
>tut2 foo
foo does not exist
>tut2 .
. is a directory

Although tut2 works OK in these tests, the output is less than satisfactory for a directory. We'd typically like to see a list of the directory's contents. In tut3.cpp we will see how to iterate over directories.

But first, let's try one more test:

Ubuntu Linux

$ ls /home/jane/foo
ls: cannot access /home/jane/foo: Permission denied
$ ./tut2 /home/jane/foo
terminate called after throwing an instance of 'boost::exception_detail::
clone_impl<boost::exception_detail::error_info_injector<boost::
filesystem::filesystem_error> >'
   what(): boost::filesystem::status: Permission denied:
     "/home/jane/foo"
Aborted

Microsoft Windows

>dir e:\
The device is not ready.
>tut2 e:\

An exception is thrown; the exact form of the response depends on Windows system options.

On the Linux system, the test was being run from an account that did not have permission to access /home/jane/foo. On the Windows system, e: was a Compact Disc reader/writer that was not ready. End users shouldn't have to interpret cryptic exception reports, so as we move on to tut3.cpp we will increase the robustness of the code, too.

Directory iteration plus catching exceptions - (tut3.cpp)

Boost.Filesystem's directory_iterator class is just what we need here. It follows the general pattern of the standard library's istream_iterator. Constructed from a path, it iterates over the contents of the directory. A default constructed directory_iterator acts as the end iterator.

The value type of directory_iterator is directory_entry. A directory_entry object contains a path and file_status information. A directory_entry object can be used directly, but can also be passed to path arguments in function calls.
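Here is a small sketch of both ideas: a loop in the istream_iterator style that stops at a default-constructed iterator, and reads each directory_entry's status. The Counts struct and count_entries() are hypothetical names, and the code assumes C++17 std::filesystem (descended from Boost.Filesystem V3) in place of boost::filesystem so that it builds without Boost.

```cpp
#include <cstddef>
#include <filesystem>

// Assumption: C++17 std::filesystem stands in for boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical tally of what a directory directly contains (no recursion).
struct Counts
{
  std::size_t files = 0;
  std::size_t directories = 0;
  std::size_t other = 0;
};

Counts count_entries(const fs::path& dir)
{
  Counts c;
  // Constructed from a path, the iterator visits each entry;
  // the default-constructed iterator `end` acts as the end iterator.
  for (fs::directory_iterator it(dir), end; it != end; ++it)
  {
    const fs::directory_entry& entry = *it;   // value_type is directory_entry
    fs::file_status s = entry.status();       // status information for this entry
    if (fs::is_regular_file(s))
      ++c.files;
    else if (fs::is_directory(s))
      ++c.directories;
    else
      ++c.other;
  }
  return c;
}
```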

The other need is increased robustness in the face of the many kinds of errors that can affect file system operations. We could do that at the level of each call to a Boost.Filesystem function (see Error reporting), but it is easier to supply an overall try/catch block.

tut3.cpp

#include <iostream>
#include <iterator>
#include <algorithm>
#include <boost/filesystem.hpp>
using namespace std;
using namespace boost::filesystem;

int main(int argc, char* argv[])
{
  if (argc < 2)
  {
    cout << "Usage: tut3 path\n";
    return 1;
  }

  path p (argv[1]);   // p reads clearer than argv[1] in the following code

  try
  {
    if (exists(p))    // does p actually exist?
    {
      if (is_regular_file(p))        // is p a regular file?
        cout << p << " size is " << file_size(p) << '\n';

      else if (is_directory(p))      // is p a directory?
      {
        cout << p << " is a directory containing:\n";

        copy(directory_iterator(p), directory_iterator(), // directory_iterator::value_type
          ostream_iterator<directory_entry>(cout, "\n")); // is directory_entry, which is
                                                          // converted to a path by the
                                                          // path stream inserter
      }

      else
        cout << p << " exists, but is neither a regular file nor a directory\n";
    }
    else
      cout << p << " does not exist\n";
  }

  catch (const filesystem_error& ex)
  {
    cout << ex.what() << '\n';
  }

  return 0;
}

Give tut3 a try, passing it a path to a directory as a command line argument. Here is a run on a checkout of the Boost Subversion trunk, followed by a repeat of the test cases that caused exceptions on Linux and Windows:

Ubuntu Linux

$ ./tut3 ~/boost/trunk
/home/beman/boost/trunk is a directory containing:
  /home/beman/boost/trunk/tools
  /home/beman/boost/trunk/boost-build.jam
  /home/beman/boost/trunk/dist
  /home/beman/boost/trunk/doc
  /home/beman/boost/trunk/bootstrap.sh
  /home/beman/boost/trunk/index.html
  /home/beman/boost/trunk/bootstrap.bat
  /home/beman/boost/trunk/boost.css
  /home/beman/boost/trunk/INSTALL
  /home/beman/boost/trunk/rst.css
  /home/beman/boost/trunk/boost
  /home/beman/boost/trunk/people
  /home/beman/boost/trunk/wiki
  /home/beman/boost/trunk/boost.png
  /home/beman/boost/trunk/LICENSE_1_0.txt
  /home/beman/boost/trunk/more
  /home/beman/boost/trunk/Jamroot
  /home/beman/boost/trunk/.svn
  /home/beman/boost/trunk/libs
  /home/beman/boost/trunk/index.htm
  /home/beman/boost/trunk/status
  /home/beman/boost/trunk/CMakeLists.txt

Microsoft Windows

>tut3 c:\boost\trunk
c:\boost\trunk is a directory containing:
   c:\boost\trunk\.svn
   c:\boost\trunk\boost
   c:\boost\trunk\boost-build.jam
   c:\boost\trunk\boost.css
   c:\boost\trunk\boost.png
   c:\boost\trunk\bootstrap.bat
   c:\boost\trunk\bootstrap.sh
   c:\boost\trunk\CMakeLists.txt
   c:\boost\trunk\dist
   c:\boost\trunk\doc
   c:\boost\trunk\index.htm
   c:\boost\trunk\index.html
   c:\boost\trunk\INSTALL
   c:\boost\trunk\Jamroot
   c:\boost\trunk\libs
   c:\boost\trunk\LICENSE_1_0.txt
   c:\boost\trunk\more
   c:\boost\trunk\people
   c:\boost\trunk\rst.css
   c:\boost\trunk\status
   c:\boost\trunk\tools
   c:\boost\trunk\wiki

>tut3 e:\
boost::filesystem::status: The device is not ready: "e:\"

Not bad, but we can make further improvements:

The listing would be much easier to read if only the filename was displayed, rather than the full path.
The Linux listing isn't sorted. That's because the ordering of directory iteration is unspecified. Ordering depends on the underlying operating system API and file system specifics. So we need to sort the results ourselves.

Move on to tut4.cpp to see how those changes play out!

Using path decomposition, plus sorting results - (tut4.cpp)

tut4.cpp

#include <iostream>
#include <vector>
#include <algorithm>
#include <boost/filesystem.hpp>
using namespace std;
using namespace boost::filesystem;

int main(int argc, char* argv[])
{
  if (argc < 2)
  {
    cout << "Usage: tut4 path\n";
    return 1;
  }

  path p (argv[1]);   // p reads clearer than argv[1] in the following code

  try
  {
    if (exists(p))    // does p actually exist?
    {
      if (is_regular_file(p))        // is p a regular file?
        cout << p << " size is " << file_size(p) << '\n';

      else if (is_directory(p))      // is p a directory?
      {
        cout << p << " is a directory containing:\n";

        typedef vector<path> vec;             // store paths,
        vec v;                                // so we can sort them later

        for (directory_iterator it (p); it != directory_iterator(); ++it)
        {
          path fn = it->path().filename();   // extract the filename from the path
          v.push_back(fn);                   // push into vector for later sorting
        }

        sort(v.begin(), v.end());             // sort, since directory iteration
                                              // is not ordered on some file systems

        for (vec::const_iterator it (v.begin()); it != v.end(); ++it)
        {
          cout << "   " << *it << '\n';
        }
      }

      else
        cout << p << " exists, but is neither a regular file nor a directory\n";
    }
    else
      cout << p << " does not exist\n";
  }

  catch (const filesystem_error& ex)
  {
    cout << ex.what() << '\n';
  }

  return 0;
}

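The key difference between tut3.cpp and tut4.cpp is what happens to each directory entry. Instead of writing each directory_entry, and thus its full path, straight to cout:

copy(directory_iterator(p), directory_iterator(), ostream_iterator<directory_entry>(cout, "\n"));

tut4.cpp loops over the directory, keeping only the filename of each entry so it can be sorted later: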

path fn = it->path().filename();   // extract the filename from the path
v.push_back(fn);                   // push into vector for later sorting

path() is a directory_entry observer function. filename() is one of several path decomposition functions. It extracts the filename portion ("index.html") from a path ("/home/beman/boost/trunk/index.html"). These decomposition functions are explored more fully in the "Class path: Iterators, observers, composition, decomposition, and query" section of this tutorial.

The above was written as two lines of code for clarity. It could have been written more concisely as:

v.push_back(it->path().filename()); // we only care about the filename
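The same filename-then-sort approach can be packaged as a reusable function. The hypothetical sorted_filenames() below is a sketch that assumes C++17 std::filesystem (descended from Boost.Filesystem V3) rather than boost::filesystem. It sorts the names as std::string, so the ordering is a plain character-by-character comparison in which uppercase ASCII letters sort before lowercase ones, as in the listings below.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <vector>

// Assumption: C++17 std::filesystem stands in for boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical helper: filenames (not full paths) of dir's entries, sorted,
// because the order of directory iteration is unspecified.
std::vector<std::string> sorted_filenames(const fs::path& dir)
{
  std::vector<std::string> names;
  for (const fs::directory_entry& entry : fs::directory_iterator(dir))
    names.push_back(entry.path().filename().string());
  std::sort(names.begin(), names.end());   // byte-wise std::string ordering
  return names;
}
```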

Here is the output from a test of tut4.cpp:

Ubuntu Linux

$ ./tut4 ~/boost/trunk
/home/beman/boost/trunk is a directory containing:
  .svn
  CMakeLists.txt
  INSTALL
  Jamroot
  LICENSE_1_0.txt
  boost
  boost-build.jam
  boost.css
  boost.png
  bootstrap.bat
  bootstrap.sh
  doc
  index.htm
  index.html
  libs
  more
  people
  rst.css
  status
  tools
  wiki

Microsoft Windows

C:\v3d>tut4 c:\boost\trunk
c:\boost\trunk is a directory containing:
  .svn
  CMakeLists.txt
  INSTALL
  Jamroot
  LICENSE_1_0.txt
  boost
  boost-build.jam
  boost.css
  boost.png
  bootstrap.bat
  bootstrap.sh
  doc
  index.htm
  index.html
  libs
  more
  people
  rst.css
  status
  tools
  wiki

That completes the main portion of this tutorial. If you haven't already worked through the Class path sections of this tutorial, dig into them now. The Error reporting section may also be of interest, although it can be skipped unless you are deeply concerned about error handling issues.

Class path: Constructors, including Unicode - (tut5.cpp)

Traditional C interfaces pass paths as const char* arguments. C++ interfaces may add const std::string& overloads, but adding overloads becomes untenable if wide characters, containers, and iterator ranges need to be supported.

Passing paths as const path& arguments is far simpler, yet more flexible, because class path itself is so flexible:

Class path supports multiple character types and encodings, including Unicode, to ease internationalization.
Class path supports multiple source types, such as iterators for null-terminated sequences, iterator ranges, containers (including std::basic_string), and directory_entry objects, so functions taking paths don't need to provide several overloads.
Class path supports both native and generic pathname formats, so programs can be portable between operating systems yet use native formats where desirable.
Class path supplies a full set of iterators, observers, composition, decomposition, and query functions, making pathname manipulations easy, convenient, reliable, and portable.

Here is how the first two points work. Class path constructors, assignments, and appends have member templates for sources. For example, here are the constructors that take sources:

template <class Source>
  path(Source const& source);

template <class InputIterator>
  path(InputIterator begin, InputIterator end);
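To see the Source and InputIterator constructors at work, here is a minimal sketch that builds the same path from a C-style string, a std::string, a std::wstring, and an iterator range over a std::list<char>, then checks that all four compare equal. It assumes C++17 std::filesystem (descended from Boost.Filesystem V3) so that it builds without Boost, and all_sources_agree() is a hypothetical name. The list goes through the iterator-range constructor because std::filesystem, unlike the container support described above, doesn't accept an arbitrary container as a Source.

```cpp
#include <filesystem>
#include <list>
#include <string>

// Assumption: C++17 std::filesystem stands in for boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical check: four different sources, one resulting path.
bool all_sources_agree()
{
  const char* c_string = "smile3";
  std::string narrow_string("smile3");
  std::wstring wide_string(L"smile3");
  std::list<char> narrow_list{'s', 'm', 'i', 'l', 'e', '3'};

  fs::path a(c_string);                                // Source: null-terminated char array
  fs::path b(narrow_string);                           // Source: std::string
  fs::path c(wide_string);                             // Source: std::wstring, converted as needed
  fs::path d(narrow_list.begin(), narrow_list.end());  // InputIterator range

  return a == b && b == c && c == d;
}
```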

Let's look at a little program that shows how comfortable class path is with both narrow and wide characters in C-style strings, C++ strings, and via C++ iterators:

tut5.cpp

#include <boost/filesystem.hpp>
#include <boost/filesystem/fstream.hpp>   // fs::ofstream
#include <string>
#include <list>
namespace fs = boost::filesystem;

int main()
{
  // \u263A is "Unicode WHITE SMILING FACE = have a nice day!"
  std::string narrow_string ("smile2");
  std::wstring wide_string (L"smile2\u263A");
  std::list<char> narrow_list;
  narrow_list.push_back('s');
  narrow_list.push_back('m');
  narrow_list.push_back('i');
  narrow_list.push_back('l');
  narrow_list.push_back('e');
  narrow_list.push_back('3');
  std::list<wchar_t> wide_list;
  wide_list.push_back(L's');
  wide_list.push_back(L'm');
  wide_list.push_back(L'i');
  wide_list.push_back(L'l');
  wide_list.push_back(L'e');
  wide_list.push_back(L'3');
  wide_list.push_back(L'\u263A');

  { fs::ofstream f("smile"); }
  { fs::ofstream f(L"smile\u263A"); }
  { fs::ofstream f(narrow_string); }
  { fs::ofstream f(wide_string); }
  { fs::ofstream f(narrow_list); }
  { fs::ofstream f(wide_list); }
  narrow_list.pop_back();
  narrow_list.push_back('4');
  wide_list.pop_back();
  wide_list.pop_back();
  wide_list.push_back(L'4');
  wide_list.push_back(L'\u263A');
  { fs::ofstream f(fs::path(narrow_list.begin(), narrow_list.end())); }
  { fs::ofstream f(fs::path(wide_list.begin(), wide_list.end())); }

  return 0;
}

Testing tut5:

Ubuntu Linux

$ ./tut5
$ ls smile*
smile
smile☺
smile2
smile2☺
smile3
smile3☺
smile4
smile4☺

Microsoft Windows

>tut5
>dir /b smile*
smile
smile2
smile2☺
smile3
smile3☺
smile4
smile4☺
smile☺

Note that the exact appearance of the smiling face will depend on the font, font size, and other settings for your command line window. The above tests were run with out-of-the-box Ubuntu 9.10 and Windows 7, US Edition. If you don't get the above results, take a look at the boost-root/libs/filesystem/example/test directory with your system's GUI file browser, such as Linux Nautilus, Mac OS X Finder, or Windows Explorer. These tend to be more comfortable with international character sets than command line interpreters.

Class path takes care of whatever character type or encoding conversions are required by the particular operating system. Thus, as tut5 demonstrates, it's no problem to pass a wide character string to a Boost.Filesystem operational function even if the underlying operating system uses narrow characters, and vice versa. And the same applies to user-supplied functions that take const path& arguments.
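For instance, a user-supplied function needs only a single const path& parameter to accept narrow and wide strings alike. The hypothetical extension_of() below is a sketch assuming C++17 std::filesystem (descended from Boost.Filesystem V3) in place of boost::filesystem.

```cpp
#include <filesystem>
#include <string>

// Assumption: C++17 std::filesystem stands in for boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical helper. One parameter type serves every caller: path converts
// narrow or wide character sources to the operating system's native encoding.
std::string extension_of(const fs::path& p)
{
  return p.extension().string();   // ".txt" for "notes.txt"; empty if there is none
}
```

Calls such as extension_of("notes.txt"), extension_of(std::wstring(L"notes.txt")), and extension_of(std::string("archive.tar.gz")) all compile against that one signature, returning ".txt", ".txt", and ".gz".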

Class path also provides path syntax that is portable across operating systems, element iterators, and observer, composition, decomposition, and query functions to manipulate the elements of a path. The next section of this tutorial deals with path syntax.

Class path: Generic format vs. Native format

Class path deals with two different pathname formats - generic format and native format. For POSIX-like file systems, these formats are the same. But for users of Windows and other non-POSIX file systems, the distinction is important. Even programmers writing for POSIX-like systems need to understand the distinction if they want their code to be portable to non-POSIX systems.

The generic format is the familiar /my_directory/my_file.txt format used by POSIX-like operating systems such as the Unix variants, Linux, and Mac OS X. Windows also recognizes the generic format, and it is the basis for the familiar Internet URL format. In the generic format, the directory separator is always one or more slash characters.

The native format is the format as defined by the particular operating system. For Windows, either the slash or the backslash can be used as the directory separator character, so /my_directory\my_file.txt would work fine. Of course, if you write that in a C++ string literal, it becomes "/my_directory\\my_file.txt".

If a drive specifier or a backslash appears in a pathname on a Windows system, it is always treated as the native format.

Class path has observer functions that allow you to obtain the string representation of a path object in either the native format or the generic format. See the next section for how that plays out.

The distinction between generic format and native format is important when communicating with native C-style APIs and with users. Both tend to expect paths in the native format and may be confused by the generic format. The generic format is great, however, for writing portable programs that work regardless of operating system.
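A minimal sketch of that division of labor, assuming C++17 std::filesystem (descended from Boost.Filesystem V3) instead of boost::filesystem: the hypothetical for_os() returns the native string, and the hypothetical for_portable_output() returns the generic, slash-separated string.

```cpp
#include <filesystem>
#include <string>

// Assumption: C++17 std::filesystem stands in for boost::filesystem.
namespace fs = std::filesystem;

// Native format: hand this to operating system APIs and show it to users.
std::string for_os(const fs::path& p) { return p.string(); }

// Generic format: slash-separated everywhere, e.g. for files shared between systems.
std::string for_portable_output(const fs::path& p) { return p.generic_string(); }
```

With p built as fs::path("foo") / "bar" / "baa.txt", for_portable_output(p) is "foo/bar/baa.txt" on every system, while for_os(p) uses backslashes on Windows.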

The next section covers class path observers, composition, decomposition, query, and iteration over the elements of a path.

Class path: Iterators, observers, composition, decomposition, and query - (path_info.cpp)

The path_info.cpp program is handy for learning how class path iterators, observers, composition, decomposition, and query functions work on your system. If it hasn't already been built on your system, please build it now. Run the examples below on your system, and try some different path arguments as we go along.

path_info produces several dozen output lines every time it's invoked. We will only show the output lines we are interested in at each step.

First we'll look at iteration over the elements of a path, and then use iteration to illustrate the difference between generic and native format paths.

Ubuntu Linux

$ ./path_info /foo/bar/baa.txt
...
elements:
 /
 foo
 bar
 baa.txt

Microsoft Windows

>path_info /foo/bar/baa.txt
...
elements:
 /
 foo
 bar
 baa.txt

Thus on both POSIX and Windows based systems the path "/foo/bar/baa.txt" is seen as having four elements.

Here is the code that produced the above listing:

cout << "\nelements:\n";

for (path::iterator it = p.begin(); it != p.end(); ++it)
  cout << " " << *it << '\n';

Each element is itself a path object (path::iterator::value_type is path), so iteration treats a path as a container of filenames.
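Collecting the elements into a container makes them easy to inspect or test. The hypothetical elements() helper below is a sketch assuming C++17 std::filesystem (descended from Boost.Filesystem V3), where dereferencing a path::iterator also yields a path.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Assumption: C++17 std::filesystem stands in for boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical helper: split p into its elements; each *it is itself a path.
std::vector<std::string> elements(const fs::path& p)
{
  std::vector<std::string> result;
  for (fs::path::iterator it = p.begin(); it != p.end(); ++it)
    result.push_back(it->generic_string());
  return result;
}
```

For "/foo/bar/baa.txt" this yields "/", "foo", "bar", and "baa.txt", the same four elements path_info lists.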

Let's look at some of the output from a slightly different example:

Ubuntu Linux

$ ./path_info /foo/bar/baa.txt

composed path:
  cout << -------------: /foo/bar/baa.txt
  preferred()----------: /foo/bar/baa.txt
...
observers, native format:
  native()-------------: /foo/bar/baa.txt
  c_str()--------------: /foo/bar/baa.txt
  string()-------------: /foo/bar/baa.txt
  wstring()------------: /foo/bar/baa.txt

observers, generic format:
  generic_string()-----: /foo/bar/baa.txt
  generic_wstring()----: /foo/bar/baa.txt

Microsoft Windows

>path_info /foo/bar\baa.txt

composed path:
  cout << -------------: /foo/bar/baa.txt
  preferred()----------: \foo\bar\baa.txt
...
observers, native format:
  native()-------------: /foo/bar\baa.txt
  c_str()--------------: /foo/bar\baa.txt
  string()-------------: /foo/bar\baa.txt
  wstring()------------: /foo/bar\baa.txt

observers, generic format:
  generic_string()-----: /foo/bar/baa.txt
  generic_wstring()----: /foo/bar/baa.txt

Native format observers should be used when interacting with the operating system or with users; that's what they expect.

Generic format observers should be used when the results need to be portable and uniform regardless of the operating system.

path objects always hold pathnames in the native format, but otherwise leave them unchanged from their source. The preferred() function will convert to the preferred form, if the native format has several forms. Thus on Windows, it will convert slashes to backslashes.
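In std::filesystem (C++17, descended from Boost.Filesystem V3), this conversion is spelled make_preferred() and modifies the path in place. The hypothetical preferred_copy() below is a sketch under that assumption; it works on a copy so the caller's path is left unchanged.

```cpp
#include <filesystem>

// Assumption: C++17 std::filesystem stands in for boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical helper: return p converted to the preferred form.
// p is taken by value, so make_preferred() alters only the copy.
// On POSIX nothing changes; on Windows slashes become backslashes.
fs::path preferred_copy(fs::path p)
{
  p.make_preferred();
  return p;
}
```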

Let's move on to decomposition and query functions:

Ubuntu Linux

$ ./path_info /foo/bar/baa.txt
...
decomposition:
  root_name()----------:
  root_directory()-----: /
  root_path()----------: /
  relative_path()------: foo/bar/baa.txt
  parent_path()--------: /foo/bar
  filename()-----------: baa.txt
  stem()---------------: baa
  extension()----------: .txt

query:
  empty()--------------: false
  is_absolute()--------: true
  has_root_name()------: false
  has_root_directory()-: true
  has_root_path()------: true
  has_relative_path()--: true
  has_parent_path()----: true
  has_filename()-------: true
  has_stem()-----------: true
  has_extension()------: true

Microsoft Windows

>path_info /foo/bar/baa.txt
...
decomposition:
  root_name()----------:
  root_directory()-----: /
  root_path()----------: /
  relative_path()------: foo/bar/baa.txt
  parent_path()--------: /foo/bar
  filename()-----------: baa.txt
  stem()---------------: baa
  extension()----------: .txt

query:
  empty()--------------: false
  is_absolute()--------: false
  has_root_name()------: false
  has_root_directory()-: true
  has_root_path()------: true
  has_relative_path()--: true
  has_parent_path()----: true
  has_filename()-------: true
  has_stem()-----------: true
  has_extension()------: true

These are pretty self-evident, but do note the difference in the result of is_absolute() between Linux and Windows. Because there is no root name (i.e., drive specifier or network name), a path such as /foo/bar/baa.txt that begins with a single slash (or backslash) is not absolute on Windows; it is resolved relative to the current drive.
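Decomposition functions combine naturally with composition. As a sketch, the hypothetical backup_name() derives a sibling file name from parent_path(), stem(), and extension(); it assumes C++17 std::filesystem (descended from Boost.Filesystem V3) in place of boost::filesystem.

```cpp
#include <filesystem>
#include <string>

// Assumption: C++17 std::filesystem stands in for boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical helper: /foo/bar/baa.txt -> /foo/bar/baa.bak.txt
fs::path backup_name(const fs::path& p)
{
  std::string stem = p.stem().string();        // "baa"
  std::string ext = p.extension().string();    // ".txt" (may be empty)
  return p.parent_path() / (stem + ".bak" + ext);
}
```

A name with no directory part and no extension, such as "README", simply becomes "README.bak".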

On to composition!

Class path uses the / and /= operators to append elements. The choice of the slash is a reminder that these operations append the operating system's preferred directory separator if needed. The preferred directory separator is a slash on POSIX-like systems, and a backslash on Windows-like systems.

path_info.cpp composes a path by appending each of the command line elements to an initially empty path:

path p;  // compose a path from the command line arguments

for (; argc > 1; --argc, ++argv)
  p /= argv[1];

cout << "\ncomposed path:\n";
cout << " cout << -------------: " << p << "\n";
cout << " preferred()----------: " << p.preferred() << "\n";

Let's give this code a try:

Ubuntu Linux

$ ./path_info / foo/bar baa.txt

composed path:
  cout << -------------: /foo/bar/baa.txt
  preferred()----------: /foo/bar/baa.txt

Microsoft Windows

>path_info / foo/bar baa.txt

composed path:
  cout << -------------: /foo/bar\baa.txt
  preferred()----------: \foo\bar\baa.txt
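The same composition loop works over any sequence of strings. The hypothetical compose() below is a sketch assuming C++17 std::filesystem (descended from Boost.Filesystem V3). Note that in std::filesystem, appending an absolute element such as "/bar" on POSIX replaces the path built so far rather than adding to it, so check your library's reference before relying on that case.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Assumption: C++17 std::filesystem stands in for boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical helper: start from an empty path and append each part,
// as path_info.cpp does with its command line arguments.
fs::path compose(const std::vector<std::string>& parts)
{
  fs::path p;
  for (const std::string& part : parts)
    p /= part;   // inserts a directory separator only when one is needed
  return p;
}
```

Composing "/", "foo/bar", and "baa.txt" gives a path whose generic_string() is "/foo/bar/baa.txt" on both Linux and Windows, matching the generic form of the output above.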

Error reporting

The Boost.Filesystem file_size function has two overloads:

uintmax_t file_size(const path& p);
uintmax_t file_size(const path& p, system::error_code& ec);

The only significant difference between the two is how they report errors.

The first signature will throw exceptions to report errors. A filesystem_error exception will be thrown on an operational error. filesystem_error is derived from std::runtime_error. It has a member function to obtain the error_code reported by the source of the error. It also has member functions to obtain the path or paths that caused the error.
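A sketch of putting those members to use, assuming C++17 std::filesystem (descended from Boost.Filesystem V3), whose filesystem_error likewise derives from std::runtime_error (via std::system_error) and offers code() and path1(). The helper size_or_error() is hypothetical.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Assumption: C++17 std::filesystem stands in for boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical helper: the size of p as text, or a message built from the
// exception's members when file_size() throws.
std::string size_or_error(const fs::path& p)
{
  try
  {
    return std::to_string(fs::file_size(p));
  }
  catch (const fs::filesystem_error& ex)
  {
    // code() is the underlying error; path1() is the path that caused it.
    return "error for " + ex.path1().string() + ": " + ex.code().message();
  }
}
```

Passing a missing file, or a directory (file_size() works only on regular files), produces a message starting with "error for " instead of terminating the program.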

Motivation for the second signature: Throwing exceptions on errors was the entire error reporting story for the earliest versions of Boost.Filesystem, and indeed throwing exceptions on errors works very well for many applications. But user reports trickled in that some code became so littered with try and catch blocks as to be unreadable and unmaintainable. In some applications I/O errors aren't exceptional, and that's the use case for the second signature.

Functions with a system::error_code& argument set that argument to report operational error status, and so do not throw exceptions when I/O related errors occur. For a full explanation, see Error reporting in the reference documentation.
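The non-throwing style can be sketched as follows, assuming C++17 std::filesystem (descended from Boost.Filesystem V3), where the error_code overloads take std::error_code instead of boost::system::error_code. The hypothetical try_file_size() returns std::nullopt when the size can't be determined and leaves the reason in ec.

```cpp
#include <cstdint>
#include <filesystem>
#include <optional>
#include <system_error>

// Assumption: C++17 std::filesystem stands in for boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical helper: no try/catch needed; errors such as a missing file,
// a directory, or a permission problem are reported through ec.
std::optional<std::uintmax_t> try_file_size(const fs::path& p, std::error_code& ec)
{
  std::uintmax_t n = fs::file_size(p, ec);   // does not throw on I/O errors
  if (ec)
    return std::nullopt;
  return n;
}
```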

Distributed under the Boost Software License, Version 1.0. See www.boost.org/LICENSE_1_0.txt

Revised 20 February 2011