Embedding

By now you should know how to use Boost.Python to call your C++ code from Python. Sometimes, however, you need to do the reverse: call Python code from the C++ side. This requires you to embed the Python interpreter into your C++ program.

Currently, Boost.Python does not directly support everything you'll need when embedding. Therefore you'll need to use the Python/C API to fill in the gaps. However, Boost.Python already makes embedding a lot easier and, in a future version, it may become unnecessary to touch the Python/C API at all. So stay tuned...

Building embedded programs

To use embedding, your programs have to be linked against both Boost.Python's and Python's static link libraries.

Boost.Python's static link library comes in two variants. Both are located in Boost's /libs/python/build/bin-stage subdirectory. On Windows, the variants are called boost_python.lib (for release builds) and boost_python_debug.lib (for debugging). If you can't find the libraries, you probably haven't built Boost.Python yet. See Building and Testing on how to do this.

Python's static link library can be found in the /libs subdirectory of your Python directory. On Windows it is called pythonXY.lib where X.Y is your major Python version number.

Additionally, Python's /include subdirectory has to be added to your include path.

In a Jamfile, all the above boils down to:

    projectroot c:\projects\embedded_program ; # location of the program

    # bring in the rules for python
    SEARCH on python.jam = $(BOOST_BUILD_PATH) ;
    include python.jam ;

    exe embedded_program # name of the executable
      : #sources
         embedded_program.cpp
      : # requirements
         <find-library>boost_python <library-path>c:\boost\libs\python
      $(PYTHON_PROPERTIES)
        <library-path>$(PYTHON_LIB_PATH)
        <find-library>$(PYTHON_EMBEDDED_LIBRARY) ;

Getting started

Being able to build is nice, but there is nothing to build yet. Embedding the Python interpreter into one of your C++ programs requires these 4 steps:

1. #include <boost/python.hpp>
2. Call Py_Initialize() to start the interpreter and create the __main__ module.
3. Call other Python/C API routines to use the interpreter.
4. Call Py_Finalize() to stop the interpreter and release its resources.

(Of course, there can be other C++ code between all of these steps.)

Now that we can embed the interpreter in our programs, let's see how to put it to use...

Using the interpreter

As you probably already know, objects in Python are reference-counted. Naturally, the PyObjects of the Python/C API are also reference-counted. There is a difference however. While the reference-counting is fully automatic in Python, the Python/C API requires you to do it by hand. This is messy and especially hard to get right in the presence of C++ exceptions. Fortunately Boost.Python provides the handle and object class templates to automate the process.

Reference-counting handles and objects

There are two ways in which a function in the Python/C API can return a PyObject*: as a borrowed reference or as a new reference. Which of these a function uses is listed in that function's documentation. The two require slightly different approaches to reference-counting, but both can be 'handled' by Boost.Python.

For a function returning a borrowed reference, we have to tell the handle that the PyObject* is borrowed by wrapping it in the aptly named borrowed function. Two functions returning borrowed references are PyImport_AddModule and PyModule_GetDict. The former returns a reference to a module, creating an empty one if it has not been imported yet; the latter retrieves a module's namespace dictionary. Let's use the former to get the __main__ module and then read its namespace through the __dict__ attribute, which refers to the same dictionary that PyModule_GetDict returns:

object main_module((
    handle<>(borrowed(PyImport_AddModule("__main__")))));

object main_namespace = main_module.attr("__dict__");

For a function returning a new reference we can just create a handle out of the raw PyObject* without wrapping it in a call to borrowed. One such function that returns a new reference is PyRun_String which we'll discuss in the next section.
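The difference between the two kinds of reference can be modelled without Python at all. The following is a toy sketch in plain C++, not Boost.Python's actual implementation: FakeObject, ToyHandle, borrowed_ref and the two api_* functions are hypothetical names invented for this illustration. It shows why a borrowed pointer needs an extra increment before a handle may own it, and why a handle keeps the count balanced even when a scope is left by an exception.

```cpp
#include <cassert>
#include <stdexcept>

// Toy stand-in for PyObject: just a reference count. Hypothetical, for illustration only.
struct FakeObject {
    int refcount;
};

// Tag wrapper marking a pointer as a borrowed reference (mirrors the role of borrowed()).
struct Borrowed {
    FakeObject* ptr;
};
inline Borrowed borrowed_ref(FakeObject* p) { return Borrowed{p}; }

// Toy owning handle: always releases exactly one reference when destroyed.
class ToyHandle {
public:
    // New reference: the caller already owns one count, so simply adopt it.
    explicit ToyHandle(FakeObject* p) : ptr_(p) { assert(p != nullptr); }

    // Borrowed reference: nobody gave us a count, so take one of our own first.
    explicit ToyHandle(Borrowed b) : ptr_(b.ptr) {
        assert(b.ptr != nullptr);
        ++ptr_->refcount;
    }

    ToyHandle(const ToyHandle&) = delete;
    ToyHandle& operator=(const ToyHandle&) = delete;

    ~ToyHandle() { --ptr_->refcount; }

    FakeObject* get() const { return ptr_; }

private:
    FakeObject* ptr_;
};

// Simulates an API function returning a new reference (count incremented for the caller).
inline FakeObject* api_new_reference(FakeObject& o) {
    ++o.refcount;
    return &o;
}

// Simulates an API function returning a borrowed reference (count untouched).
inline FakeObject* api_borrowed_reference(FakeObject& o) { return &o; }

// Uses both kinds of reference in a scope that is left by an exception,
// then reports the object's reference count.
inline int refcount_after_exception(FakeObject& o) {
    try {
        ToyHandle a(api_new_reference(o));
        ToyHandle b(borrowed_ref(api_borrowed_reference(o)));
        throw std::runtime_error("simulated failure");
    } catch (const std::runtime_error&) {
        // Both handles were destroyed during stack unwinding.
    }
    return o.refcount;
}
```

In this model, forgetting borrowed_ref for a borrowed pointer would make the destructor release a count the handle never owned, which is the mistake the borrowed function guards against in real code.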

Handle is a class template, so why haven't we been using any template parameters?

handle has a single template parameter specifying the type of the managed object. This type is PyObject 99% of the time, so the parameter was defaulted to PyObject for convenience. Therefore we can use the shorthand handle<> instead of the longer, but equivalent, handle<PyObject>.

Running Python code

To run Python code from C++ there is a family of functions in the API starting with the PyRun prefix. You can find the full list of these functions here. They all work similarly so we will look at only one of them, namely:

PyObject* PyRun_String(const char *str, int start, PyObject *globals, PyObject *locals)

PyRun_String takes the code to execute as a null-terminated (C-style) string in its str parameter. The function returns a new reference to a Python object. Which object is returned depends on the start parameter.

The start parameter is the start symbol from the Python grammar to use for interpreting the code. The possible values are:

Start symbols

Py_eval_input	for interpreting isolated expressions
Py_file_input	for interpreting sequences of statements
Py_single_input	for interpreting a single statement

When using Py_eval_input, the input string must contain a single expression and its result is returned. When using Py_file_input, the string can contain an arbitrary number of statements and None is returned. Py_single_input works in the same way as Py_file_input but only accepts a single statement.

Lastly, the globals and locals parameters are Python dictionaries containing the globals and locals of the context in which to run the code. For most intents and purposes you can use the namespace dictionary of the __main__ module for both parameters.

We have already seen how to get the __main__ module's namespace, so let's run some Python code in it:

object main_module((
    handle<>(borrowed(PyImport_AddModule("__main__")))));

object main_namespace = main_module.attr("__dict__");

// open() is used instead of file(), which exists only in Python 2
handle<> ignored((PyRun_String(

    "hello = open('hello.txt', 'w')\n"
    "hello.write('Hello world!')\n"
    "hello.close()"

  , Py_file_input
  , main_namespace.ptr()
  , main_namespace.ptr())
));

Because the Python/C API doesn't know anything about objects, we used the object's ptr member function to retrieve the PyObject*.

This should create a file called 'hello.txt' in the current directory containing a phrase that is well-known in programming circles.

Note that we wrap the return value of PyRun_String in a handle even though we are not interested in it. If we didn't do this, the returned object would be kept alive unnecessarily. Unless you want to be a Dr. Frankenstein, always wrap PyObject*s in handles.

Beyond handles

It's nice that handle manages the reference counting details for us, but other than that it doesn't do much. Often we'd like to have a more useful class to manipulate Python objects. But we have already seen such a class above, and in the previous section: the aptly named object class and its derivatives. We've already seen that they can be constructed from a handle. The following examples should further illustrate this fact:

object main_module((
     handle<>(borrowed(PyImport_AddModule("__main__")))));

object main_namespace = main_module.attr("__dict__");

handle<> ignored((PyRun_String(

    "result = 5 ** 2"

    , Py_file_input
    , main_namespace.ptr()
    , main_namespace.ptr())
));

int five_squared = extract<int>(main_namespace["result"]);

Here we create a dictionary object for the __main__ module's namespace. Then we assign 5 squared to the result variable and read this variable from the dictionary. Another way to achieve the same result is to let PyRun_String return the result directly with Py_eval_input:

object result((handle<>(
    PyRun_String("5 ** 2"
        , Py_eval_input
        , main_namespace.ptr()
        , main_namespace.ptr()))
));

int five_squared = extract<int>(result);

Note that object's member function to return the wrapped PyObject* is called ptr instead of get. This makes sense if you take into account the different functions that object and handle perform.

Exception handling

If an exception occurs in the execution of some Python code, the PyRun_String function returns a null pointer. Constructing a handle out of this null pointer throws error_already_set, so basically, the Python exception is automatically translated into a C++ exception when using handle:

try
{
    object result((handle<>(PyRun_String(
        "5/0"
      , Py_eval_input
      , main_namespace.ptr()
      , main_namespace.ptr()))
    ));

    // execution will never get here:
    int five_divided_by_zero = extract<int>(result);
}
catch (error_already_set const&)
{
    // handle the exception in some way
}

The error_already_set exception class doesn't carry any information in itself. To find out more about the Python exception that occurred, you need to use the exception handling functions of the Python/C API in your catch-statement. This can be as simple as calling PyErr_Print() to print the exception's traceback to the console, or comparing the type of the exception with those of the standard exceptions:

catch (error_already_set const&)
{
    if (PyErr_ExceptionMatches(PyExc_ZeroDivisionError))
    {
        // handle ZeroDivisionError specially
    }
    else
    {
        // print all other errors to stderr
        PyErr_Print();
    }
}

(To retrieve even more information from the exception you can use some of the other exception handling functions listed here.)

If you'd rather not have handle throw a C++ exception when it is constructed, you can use the allow_null function in the same way you'd use borrowed:

handle<> result((allow_null(PyRun_String(
    "5/0"
   , Py_eval_input
   , main_namespace.ptr()
   , main_namespace.ptr()))));

if (!result)
{
    // Python exception occurred; PyErr_Print() reports it and clears the error
    PyErr_Print();
}
else
{
    // everything went okay, it's safe to use the result
}
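The two error-reporting styles can be compared side by side in a toy model. The sketch below is plain C++ and does not use Boost.Python or the Python/C API: FakeObject, ToyErrorAlreadySet, CheckedHandle, allow_null_ptr and api_divide are hypothetical names made up for this illustration. It only models the null check, assuming an API function that signals failure by returning a null pointer.

```cpp
#include <cassert>

// Toy stand-in for PyObject. Hypothetical, for illustration only.
struct FakeObject {
    int value;
};

// Toy stand-in for error_already_set: like the real class, it carries no data.
struct ToyErrorAlreadySet {};

// Tag wrapper marking a pointer as "may be null" (mirrors the role of allow_null()).
struct NullAllowed {
    FakeObject* ptr;
};
inline NullAllowed allow_null_ptr(FakeObject* p) { return NullAllowed{p}; }

// Toy non-owning handle that models only the null check, not reference counting.
class CheckedHandle {
public:
    // Default policy: a null pointer signals a pending error, so throw.
    explicit CheckedHandle(FakeObject* p) : ptr_(p) {
        if (p == nullptr) throw ToyErrorAlreadySet();
    }

    // allow_null policy: accept the null and let the caller test for it.
    explicit CheckedHandle(NullAllowed n) : ptr_(n.ptr) {}

    explicit operator bool() const { return ptr_ != nullptr; }

    // Precondition: the handle is not empty.
    int value() const {
        assert(ptr_ != nullptr);
        return ptr_->value;
    }

private:
    FakeObject* ptr_;
};

// Simulates an API call that returns null on failure (here: division by zero).
inline FakeObject* api_divide(FakeObject& out, int a, int b) {
    if (b == 0) return nullptr;
    out.value = a / b;
    return &out;
}

// Exception style: returns true if constructing the handle threw.
inline bool failed_with_exception(int a, int b) {
    FakeObject out{0};
    try {
        CheckedHandle h(api_divide(out, a, b));
        return false;
    } catch (const ToyErrorAlreadySet&) {
        return true;
    }
}

// allow_null style: returns true if the handle came back empty.
inline bool failed_with_null(int a, int b) {
    FakeObject out{0};
    CheckedHandle h(allow_null_ptr(api_divide(out, a, b)));
    return !h;
}
```

Both styles detect the same failure; the exception style keeps the success path free of checks, while the allow_null style keeps control flow local at the cost of an explicit test after every call.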