Boost.Python

February 2002 Progress Report

Documentation
Overhaul of to_python/from_python conversion mechanism
Miscellaneous

Python10 Conference Report

I spent the first week of February at the Python10 conference in Alexandria, VA. I'm including this experience report for two reasons: firstly, it documents where my time was used. Secondly, a public presence for Boost.Python and interaction between the Python and C++ communities are important to the future of Boost.Python, which in turn is important to the Kull Project.

Andy Koenig, of all people, was the keynote speaker of this year's opening plenary session. He presented his "impressions of a polyglot outsider", which studiously avoided any mention of C++ until the end of his talk, when he was asked about standardization. I was surprised to learn that the C++ community at large wanted a few more years before beginning standardization, but when ANSI accepted HP's request for a standard, the process was forced to start: it was a matter of participating or having standardization proceed without one's input. Andy managed to highlight very effectively the balance of strengths in Python, one of the most important being its support for extension via libraries. In many ways that makes Python a good analogue for C++ in the interpreted world.

There were several kind mentions of the Boost.Python library from people who found it indispensable. I was particularly happy that Karl MacMillan, Michael Droettboom, and Ichiro Fujinaga from Johns Hopkins are using it to do OCR on a vast library of music notation, since in a previous life I was an author of music notation software. These guys are also drawing on Ullrich Koethe's VIGRA library for image manipulation (Ullrich has been a major contributor to Boost.Python). They also have a system for writing the Boost.Python wrapper code in C++ comments, which allows them to keep all of the code in one place. I've asked them to send me some information on that.

The development of Swig has been gaining momentum again (the basic description at www.boost.org/libs/python/doc/comparisons.html still applies). The talk given about it by David Beazley was very well-attended, and they appear to have quite a few users. Swig's strengths (coverage of many languages) and weaknesses (incomplete C++ language support) haven't changed, although the C++ support seems to have improved considerably: they now claim to have a complete model of the C++ type system. It seems to be mostly geared at wrapping what Walter Landry calls "C-Tran": C++ code which traffics in built-in types with little use of abstraction. I'm not knocking that, either: I'm sure a lot of that code exists, so it's a valuable service. One feature Swig has which I'd like to steal is the ability to unwrap a single Python argument into multiple C++ arguments, for example, by converting a Python string into a pointer and length. When his talk was over, David approached me about a possible joint workshop on language binding, which sounds like a fun idea to me.
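The argument-unwrapping idea can be illustrated without Swig or Python at all. What follows is only a minimal C++ sketch under stated assumptions: `std::string` stands in for a Python string, and `count_spaces`, `unwrap_string`, and `demo_unwrap` are hypothetical names invented for the example, not part of Swig or Boost.Python.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical C-style function: counts spaces in a buffer that is passed
// as a pointer plus a length.
std::size_t count_spaces(char const* data, std::size_t length)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < length; ++i)
        if (data[i] == ' ')
            ++n;
    return n;
}

// Hypothetical adapter: turns f(pointer, length) into a callable that takes
// a single string and supplies both arguments from it.
template <class R>
auto unwrap_string(R (*f)(char const*, std::size_t))
{
    return [f](std::string const& s) { return f(s.data(), s.size()); };
}

// Usage: one argument on the caller's side, two on the callee's side.
void demo_unwrap()
{
    auto wrapped = unwrap_string(&count_spaces);
    assert(wrapped("a b c") == 2u);
}
```

The adapter requires C++14 for the deduced return type. A real binding layer would have to perform the same split on the actual Python string object.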

I spent some considerable time talking with Steven Knight, the leader of the Scons build tool effort. We had a lot to share with one another, and I gained a much better appreciation for many of the Scons design decisions. Scons seems to be concentrating on being the ultimate build system substrate, and Steve seemed to think that we were on the right track with our high-level design. We both hope that the Boost.Build V2 high-level architecture can eventually be ported to run on top of Scons.

They also have a highly refined and successful development procedure which I'd like to emulate for Boost.Build V2. Among many other things, their source-control system ensures that when you check in a new test, it is automatically run against the currently checked-in state of the code and is expected to fail, a relatively obvious good idea which I'd never heard of before.

Guido van Rossum's "State of the Python Union" address was full of questions for the community about what should be done next, but the one point Guido seemed to stress was that core language stability and continuing library development would be a good idea (sound familiar?). I mentioned the Boost model as a counterpoint to the idea of something like CPAN (the massive Perl library archives), and it seemed to generate some significant interest. I've offered to work with anyone from the Python community who wants to set up something like Boost.

There was some discussion of "string interpolation" (variable substitution in strings), and Guido mentioned that he had some thoughts about the strengths/weaknesses of Python's formatting interface. It might be useful for those working on formatting for Boost to contact him and find out what he has to say.

Ka-Ping Yee demoed a Mailman discussion thread weaver. This tool weaves the various messages in a discussion thread into a single document so you can follow the entire conversation. Since we're looking very seriously at moving Boost to Mailman, this could be a really useful thing for us to have. If we do this, we'll move the yahoogroups discussions into the Mailman archive so old discussions can be easily accessed in the same fashion.

And, just because it's cool, though perhaps not relevant: http://homepages.ulb.ac.be/~arigo/psyco/ is a promising effort to accelerate the execution of Python code to speeds approaching those of compiled languages. It reminded me a lot of Todd Veldhuizen's research into moving parts of C++ template compilation to runtime, only coming from the opposite end of things.

Boost.Python v2 Progress

Here's what actually got accomplished.

Documentation

My first priority upon returning from Python10 was to get some documentation in place. After wasting an unfortunate amount of time looking at automatic documentation tools which don't quite work, I settled down to use Bill Kempf's HTML templates designed to be a Boost standard. While the templates are working well, using them is highly labor-intensive.

I decided to begin with the high-level reference material, as opposed to tutorial, narrative, or nitty-gritty details of the framework. It seemed more important to have a precise description of the way the commonly-used components work than to have examples in HTML (since we already have some test modules), and since the low-level details are much less-frequently needed by users it made sense for me to simply respond to support requests for the time being.

After completing approximately 60% of the high-level docs (currently checked in to libs/python/doc/v2), I found myself ready to start documenting the mechanisms for creating to-/from-python converters. This caused a dilemma: I had realized during the previous week that a much simpler, more-efficient, and easier-to-use implementation was possible, but I hadn't planned on implementing it right away, since what was already in place worked adequately. I had also received my first query on the C++-sig about how to write such a converter.

Given the labor-intensive nature of documentation writing, I decided it would be a bad idea to document the conversion mechanism if I was just going to rewrite it. Often the best impetus for simplifying a design is the realization that understandably documenting its current state would be too difficult, and this was no exception.

Overhaul of `to_python`/`from_python` conversion mechanism

There were two basic realizations involved here:

to_python conversion could be a one-step process, once an appropriate conversion function is found. This allows elimination of the separate indirect convertibility check.
There are basically two categories of from_python conversions: those which use lvalues stored within or held by the Python object (essentially extractions, like what happens when an instance of a C++ class exposed with class_ is used as the target of a wrapped member function), and those in which a new rvalue gets created, as when a Python float is converted to a C++ complex<double> or a Python tuple is converted to a C++ std::vector<>. From the client side, there are two corresponding categories of conversion: those which demand an lvalue conversion and those which can accept an lvalue or an rvalue conversion.
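
The distinction between the two from_python categories can be shown with a small stand-alone C++ sketch. It does not use the Python C API or Boost.Python: `fake_py_object`, `Counter`, `counter_lvalue`, and `complex_rvalue` are hypothetical names, and the struct is only a stand-in for a Python object that either wraps a C++ instance or holds a float.

```cpp
#include <cassert>
#include <complex>

// A C++ class as it might be exposed to Python.
struct Counter
{
    int value;
    void increment() { ++value; }
};

// Hypothetical stand-in for a Python object.
struct fake_py_object
{
    bool holds_counter;
    Counter counter;     // meaningful only when holds_counter is true
    double float_value;  // meaningful only when holds_counter is false
};

// lvalue conversion: hand back the Counter that already lives inside the
// object, or a null pointer when there is none to extract.
Counter* counter_lvalue(fake_py_object& obj)
{
    return obj.holds_counter ? &obj.counter : nullptr;
}

// rvalue conversion: build a brand-new complex<double> from a float.
// Returns false when the object does not hold a float.
bool complex_rvalue(fake_py_object const& obj, std::complex<double>& out)
{
    if (obj.holds_counter)
        return false;
    out = std::complex<double>(obj.float_value, 0.0);
    return true;
}

void demo_conversions()
{
    fake_py_object wrapped = { true, Counter{0}, 0.0 };
    counter_lvalue(wrapped)->increment();      // mutates the held object
    assert(wrapped.counter.value == 1);

    fake_py_object number = { false, Counter{0}, 2.5 };
    std::complex<double> z;
    assert(complex_rvalue(number, z) && z.real() == 2.5);
    assert(counter_lvalue(number) == nullptr); // nothing to extract
}
```

A member function call needs the first kind of conversion, because it must act on the object Python is holding. A by-value numeric argument is satisfied by the second kind.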

The latter realization allowed the following collapse, which considerably simplified things:

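Target Type	Eligible Converters
`T`	`T` rvalue or lvalue
`T const`	`T` rvalue or lvalue
`T volatile`	`T` rvalue or lvalue
`T const volatile`	`T` rvalue or lvalue
`T const&`	`T` rvalue or lvalue
`T const*`	`T` lvalue
`T volatile*`	`T` lvalue
`T const volatile*`	`T` lvalue
`T&`	`T` lvalue
`T volatile&`	`T` lvalue
`T const volatile&`	`T` lvalue
`T* const&`	`T` lvalue
`T const* const&`	`T` lvalue
`T volatile* const&`	`T` lvalue
`T const volatile* const&`	`T` lvalue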
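
The mapping in the table can be written down as a compile-time classification. The following is a sketch in present-day C++ (C++14), not the mechanism Boost.Python uses: `conversion`, `eligible_conversion`, and `X` are hypothetical names. Types outside the table, such as rvalue references (which postdate this report), are not meaningfully classified.

```cpp
#include <type_traits>

enum class conversion { rvalue_or_lvalue, lvalue_only };

// Classifies a target type according to the table above.
template <class Target>
constexpr conversion eligible_conversion()
{
    using referent = typename std::remove_reference<Target>::type;

    // Pointers, whether by value or by reference to a const pointer:
    // T const*, T volatile*, T* const&, T const* const&, ...
    if (std::is_pointer<typename std::remove_cv<referent>::type>::value)
        return conversion::lvalue_only;

    // By value, with any cv-qualification: T, T const, T volatile, ...
    if (!std::is_reference<Target>::value)
        return conversion::rvalue_or_lvalue;

    // References: only T const& can bind to a newly created rvalue.
    return (std::is_const<referent>::value && !std::is_volatile<referent>::value)
               ? conversion::rvalue_or_lvalue
               : conversion::lvalue_only;
}

struct X {};

static_assert(eligible_conversion<X>() == conversion::rvalue_or_lvalue, "T");
static_assert(eligible_conversion<X const&>() == conversion::rvalue_or_lvalue, "T const&");
static_assert(eligible_conversion<X&>() == conversion::lvalue_only, "T&");
static_assert(eligible_conversion<X const volatile&>() == conversion::lvalue_only, "T const volatile&");
static_assert(eligible_conversion<X const*>() == conversion::lvalue_only, "T const*");
static_assert(eligible_conversion<X* const&>() == conversion::lvalue_only, "T* const&");
```

The `static_assert` lines check one representative row from each group, so a mistake in the classification shows up as a compile error.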

This job included the following additional enhancements:

Elimination of virtual functions, which cause object code bloat
Registration of a single converter function for all lvalue conversions, two for all rvalue conversions
Killed lots of unneeded code
Increased opacity of registry interface
Eliminated all need for decorated runtime type identifiers
Updated test modules to reflect new interface
Eliminated the need for users to worry about converter lifetime issues

Additional Builtin Conversion Enhancements

Support for complex<float>, complex<double>, and complex<long double> conversions
Support for bool conversions
NULL pointers representable by None in Python
Support for conversion of Python classic classes to numeric types
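
One plausible reading of the registration items in the first list (a single function per lvalue conversion, two per rvalue conversion, and no virtual functions) is a registry of plain function pointers keyed by target type. The sketch below is written under that assumption and is not Boost.Python's actual registry interface: `fake_py_object`, `registration`, `registry`, and the converter functions are all hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <new>
#include <typeindex>
#include <typeinfo>

// Hypothetical stand-in for a Python object that may hold a number.
struct fake_py_object
{
    bool is_number;
    double number;
};

// One function suffices for an lvalue conversion: return the address of the
// C++ object inside the source, or null if there is none.
using lvalue_function = void* (*)(fake_py_object&);

// An rvalue conversion uses two: a cheap check, then construction of a new
// value in caller-supplied storage.
using convertible_function = bool (*)(fake_py_object const&);
using construct_function = void* (*)(fake_py_object const&, void* storage);

struct registration
{
    lvalue_function lvalue = nullptr;
    convertible_function convertible = nullptr;
    construct_function construct = nullptr;
};

// Registry keyed by target type: plain function pointers, no virtual calls.
std::map<std::type_index, registration>& registry()
{
    static std::map<std::type_index, registration> entries;
    return entries;
}

bool number_convertible(fake_py_object const& source) { return source.is_number; }

void* construct_double(fake_py_object const& source, void* storage)
{
    return new (storage) double(source.number);
}

void register_double_converter()
{
    registration& entry = registry()[std::type_index(typeid(double))];
    entry.convertible = &number_convertible;
    entry.construct = &construct_double;
}

void demo_registry()
{
    register_double_converter();
    registration const& entry = registry().at(std::type_index(typeid(double)));

    fake_py_object source = { true, 3.5 };
    alignas(double) unsigned char storage[sizeof(double)];
    assert(entry.lvalue == nullptr);   // no lvalue converter registered
    assert(entry.convertible(source));
    double* result = static_cast<double*>(entry.construct(source, storage));
    assert(*result == 3.5);
}
```

Because each entry is a plain struct of function pointers, adding a converter for a new type adds one map entry rather than a new class with a virtual table.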

Miscellaneous

These don't fit easily under a large heading:

Support CallPolicies for class member functions
from_python_data.hpp: revamped type alignment metaprogram so that it's fast enough for KCC
classfwd.hpp header forward-declares class_<T>
indirect_traits.hpp: added is_pointer_to_reference and fixed bugs
Reduced recompilation dependencies
msvc_typeinfo works around broken MS/Intel typeid() implementation
Many fixes and improvements to the type_traits library in order to work around compiler bugs and suppress warnings
Eliminated the need for explicit acquisition of converter registrations
Expanded constructor support to 6 arguments
Implemented generalized pointer lifetime support
Updated code generation for returning.hpp
Tracked down and fixed cycle GC bugs
Added comprehensive unit tests for destroy_reference, pointer_type_id, select_from_python, complex<T>, bool, and classic class instance conversions
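
The type alignment metaprogram mentioned in the list presumably exists so that a converted value can be constructed in raw storage of the right size and alignment. That is an assumption, since the report does not say what the metaprogram is used for. In present-day C++ the same requirement can be stated directly with `alignas`, as in this minimal sketch; `raw_storage` and `demo_storage` are hypothetical names and this is not the code in from_python_data.hpp.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <new>

// Hypothetical helper: raw bytes with the size and alignment of T, in which
// a value of type T can be constructed later.
template <class T>
struct raw_storage
{
    alignas(T) unsigned char bytes[sizeof(T)];
    void* address() { return bytes; }
};

static_assert(alignof(raw_storage<std::complex<double>>) >= alignof(std::complex<double>),
              "storage must be at least as aligned as the value it will hold");

void demo_storage()
{
    raw_storage<std::complex<double>> storage;

    // Construct the value in place; complex<double> is trivially
    // destructible, so no explicit destructor call is needed afterwards.
    auto* z = new (storage.address()) std::complex<double>(1.0, 2.0);
    assert(z->imag() == 2.0);
    assert(reinterpret_cast<std::uintptr_t>(storage.address())
           % alignof(std::complex<double>) == 0);
}
```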

Revised 13 November, 2002