Operations on dataset

As mentioned earlier, much of the value of the Unit Test Framework datasets comes from the operations provided for combining them.

For that purpose, three operators are provided:

joins with operator+
zips with operator^ on datasets
and grids or Cartesian products with operator*

	Tip
	All these operators are associative, which allows them to be chained without parentheses. However, the C++ operator precedence rules still apply.
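
The effect of operator precedence can be checked without the framework. The sketch below uses a hypothetical stand-in type `Expr` (not a Boost.Test type) whose `+`, `^` and `*` operators only record how the compiler grouped their operands. Since `*` binds tighter than `+`, and `+` binds tighter than `^`, an unparenthesized mix is grouped accordingly.

```cpp
#include <string>

// Hypothetical stand-in for a dataset: it only records how an expression
// was grouped by the compiler. It is not part of Boost.Test.
struct Expr {
    std::string text;
};

Expr operator+(Expr const& a, Expr const& b) { return {"(" + a.text + " + " + b.text + ")"}; }
Expr operator^(Expr const& a, Expr const& b) { return {"(" + a.text + " ^ " + b.text + ")"}; }
Expr operator*(Expr const& a, Expr const& b) { return {"(" + a.text + " * " + b.text + ")"}; }

// Returns the grouping chosen for an expression written without parentheses.
std::string mixed_grouping() {
    Expr a{"a"}, b{"b"}, c{"c"}, d{"d"};
    return (a ^ b + c * d).text;   // "(a ^ (b + (c * d)))"
}
```

Read in dataset terms, `dsa ^ dsb + dsc * dsd` would zip `dsa` with the join of `dsb` and the grid `dsc * dsd`. Explicit parentheses are needed whenever another grouping is intended.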

Joins

A join, denoted +, is an operation on two datasets dsa and dsb of the same arity and compatible types. It results in the concatenation of dsa and dsb, in the left-to-right order of the operands of +:

dsa = (a_1, a_2, ... a_i)
dsb = (b_1, b_2, ... b_j)
dsa + dsb = (a_1, a_2, ... a_i, b_1, b_2, ... b_j)

The following properties hold:

the resulting dataset is of the same arity as the operand datasets,
the size of the returned dataset is the sum of the sizes of the joined datasets,
the operation is associative, and it is possible to combine more than two datasets in one expression. The following joins are equivalent for any datasets dsa, dsb and dsc:
```
( dsa + dsb ) + dsc
== dsa + ( dsb + dsc )
== dsa + dsb + dsc
```
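
These properties can be modelled outside the framework. The following is a minimal sketch that treats a dataset of arity 1 as a `std::vector<int>`; the function `join` is a hypothetical helper written for illustration and says nothing about how the Unit Test Framework implements the operation.

```cpp
#include <vector>

// Plain C++ model of a join on datasets of arity 1:
// all samples of `left`, followed by all samples of `right`.
// Illustration only; this is not the Boost.Test implementation.
std::vector<int> join(std::vector<int> left, std::vector<int> const& right) {
    left.insert(left.end(), right.begin(), right.end());
    return left;
}
```

With `{1, 2}` and `{8, 9, 10}` as operands, the model yields the five samples `1, 2, 8, 9, 10`, and joining a third dataset gives the same sequence whichever pair is joined first.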

	Warning
	In the expression `dsa + dsb`, `dsa` and/or `dsb` can be of infinite size. The resulting dataset will have an infinite size as well. If `dsa` is infinite, the content of `dsb` will never be reached.

Example: Example of join on datasets

Code

#define BOOST_TEST_MODULE dataset_example62
#include <boost/test/included/unit_test.hpp>
#include <boost/test/data/test_case.hpp>
#include <boost/test/data/monomorphic.hpp>
#include <iostream>

namespace data = boost::unit_test::data;

int samples1[] = {1, 2};
int samples2[] = {8, 9, 10};

BOOST_DATA_TEST_CASE(
      test1,
      data::make(samples1) + samples2,
      var)
{
  std::cout << var << std::endl;
}

Output
> dataset_example62
Running 5 test cases...
1
2
8
9
10

*** No errors detected

Zips

A zip, denoted ^, is an operation on two datasets dsa and dsb of the same arity and the same size. It results in a dataset in which the k-th sample of dsa is paired with the k-th sample of dsb. Within each resulting sample, the values appear in the left-to-right order of the operands of ^.

dsa = (a_1, a_2, ... a_i)
dsb = (b_1, b_2, ... b_i)
dsa ^ dsb = ( (a_1, b_1), (a_2, b_2) ... (a_i, b_i) )

The following properties hold:

the arity of the resulting dataset is the sum of the arities of the operand datasets,
the size of the resulting dataset is equal to the size of the operand datasets, which are expected to have the same size (the handling of mismatched sizes is described below),
the operation is associative, and it is possible to combine more than two datasets in one expression,
```
( dsa ^ dsb ) ^ dsc
== dsa ^ ( dsb ^ dsc )
== dsa ^ dsb ^ dsc
```

Special handling applies if dsa and dsb have different sizes. The rule is as follows:

if both zipped datasets have the same size, this is the size of the resulting dataset (this size may be infinite),
otherwise, if one of the datasets is of size 1 (a singleton) or of infinite size, the resulting size is governed by the other dataset,
otherwise, an exception is thrown at runtime.
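
The size rule above can be written down as a small function. This is a sketch under stated assumptions: `ModelSize` and `zip_size` are hypothetical names introduced here, `std::logic_error` stands in for whatever exception the framework actually raises, and the left operand is examined first because the rule does not say which operand wins when one is a singleton and the other is infinite.

```cpp
#include <cstddef>
#include <stdexcept>

// Illustrative model of a dataset size; not the size type used by Boost.Test.
struct ModelSize {
    bool infinite;        // true for a dataset of infinite size
    std::size_t count;    // number of samples; ignored when infinite is true
};

bool same_size(ModelSize a, ModelSize b) {
    if (a.infinite || b.infinite) {
        return a.infinite && b.infinite;
    }
    return a.count == b.count;
}

bool singleton_or_infinite(ModelSize s) {
    return s.infinite || s.count == 1;
}

// Size of `left ^ right` according to the rule described in the text.
ModelSize zip_size(ModelSize left, ModelSize right) {
    if (same_size(left, right)) {
        return left;                       // same size, possibly infinite
    }
    // Assumption: the left operand is checked before the right one.
    if (singleton_or_infinite(left)) {
        return right;                      // size governed by the right dataset
    }
    if (singleton_or_infinite(right)) {
        return left;                       // size governed by the left dataset
    }
    // Assumption: the exception type is chosen for this sketch only.
    throw std::logic_error("cannot zip datasets of different sizes");
}
```

In this model, zipping a singleton or an infinite dataset with a dataset of 3 samples gives a size of 3, while zipping datasets of 3 and 4 samples throws.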

	Caution
	If the zip operation is not supported by your compiler, the macro `BOOST_TEST_NO_ZIP_COMPOSITION_AVAILABLE` is automatically defined by the Unit Test Framework.

Example: Example of zip on datasets

Code

#define BOOST_TEST_MODULE dataset_example61
#include <boost/test/included/unit_test.hpp>
#include <boost/test/data/test_case.hpp>
#include <boost/test/data/monomorphic.hpp>
#include <iostream>

namespace data = boost::unit_test::data;

int samples1[] = {1,2};
char const* samples2[] = {"qwerty", "asdfg"};

BOOST_DATA_TEST_CASE(
      test1,
      data::make(samples1)^samples2,
      integer_values,
      string_value)
{
  std::cout << integer_values << ", " << string_value << std::endl;
}

Output
> dataset_example61
Running 2 test cases...
1, qwerty
2, asdfg

*** No errors detected

Grid (Cartesian products)

A grid, denoted *, is an operation on any two datasets dsa and dsb. It results in a dataset in which each sample of dsa is paired with each sample of dsb exactly once. Within each resulting sample, the values appear in the left-to-right order of the operands of *. The samples of the rightmost dataset are iterated first.

dsa = (a_1, a_2, ... a_i)
dsb = (b_1, b_2, ... b_j)
dsa * dsb = ((a_1, b_1), (a_1, b_2) ... (a_1, b_j), (a_2, b_1), ... (a_2, b_j) ... (a_i, b_1), ... (a_i, b_j))

The grid hence is similar to the mathematical notion of Cartesian product ^[3].

The following properties hold:

the arity of the resulting dataset is the sum of the arities of the operand datasets,
the size of the resulting dataset is the product of the sizes of the datasets,
the operation is associative, and it is possible to combine more than two datasets in one expression,
as for zip, the datasets need not have the same sample types.
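
The iteration order and the size property can be illustrated with two nested loops. The sketch below models two datasets of arity 1 as vectors with different sample types; `grid` is a hypothetical helper for illustration and is not how the Unit Test Framework builds the product.

```cpp
#include <utility>
#include <vector>

// Plain C++ model of a grid on two datasets of arity 1.
// Each sample of `left` is paired with each sample of `right` exactly once.
// Illustration only; this is not the Boost.Test implementation.
std::vector<std::pair<int, char>> grid(std::vector<int> const& left,
                                       std::vector<char> const& right) {
    std::vector<std::pair<int, char>> samples;
    samples.reserve(left.size() * right.size());
    for (int a : left) {          // leftmost dataset: outer loop
        for (char b : right) {    // rightmost dataset: inner loop, iterated first
            samples.emplace_back(a, b);
        }
    }
    return samples;
}
```

For `{1, 2}` and `{'x', 'y', 'z'}` the model produces 2 * 3 = 6 samples, starting with `(1, x), (1, y), (1, z)` before moving on to `(2, x)`.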

	Caution
	If the grid operation is not supported by your compiler, the macro `BOOST_TEST_NO_GRID_COMPOSITION_AVAILABLE` is automatically defined by the Unit Test Framework.

In the following example, the random number generator is part of the second operand of the grid. It is evaluated 6 times: the second xrange (second dimension), to which it is zipped, draws two samples, and this is repeated for each of the 3 values of the first xrange (first dimension). Note that the state of the random engine is not copied between two successive evaluations of the first dimension.

Example: Example of Cartesian product

Code

#define BOOST_TEST_MODULE dataset_example64
#include <boost/test/included/unit_test.hpp>
#include <boost/test/data/test_case.hpp>
#include <boost/test/data/monomorphic.hpp>
#include <iostream>
#include <random>

namespace bdata = boost::unit_test::data;


BOOST_DATA_TEST_CASE(
  test1,
  bdata::xrange(2) * bdata::xrange(3),
  xr1, xr2)
{
  std::cout << "test 1: " << xr1 << ", " << xr2 << std::endl;
  BOOST_TEST((xr1 <= 2 && xr2 <= 3));
}

BOOST_DATA_TEST_CASE(
  test2,
  bdata::xrange(3)
  *
  ( bdata::random(
      bdata::distribution=std::uniform_real_distribution<float>(1, 2))
    ^ bdata::xrange(2)
  ),
  xr, random_sample, index)
{
  std::cout << "test 2: "
    << xr << " / "
    << random_sample << ", " << index
    << std::endl;
  BOOST_TEST(random_sample < 1.7); // 30% chance of failure
}

Output

> dataset_example64
Running 12 test cases...
test 1: 0, 0
test 1: 0, 1
test 1: 0, 2
test 1: 1, 0
test 1: 1, 1
test 1: 1, 2
test 2: 0 / 1.00001, 0
test 2: 0 / 1.13154, 1
test 2: 1 / 1.75561, 0
test.cpp(40): error: in "test2/_2": check random_sample < 1.7 has failed [1.75560534 >= 1.7]
Failure occurred in a following context:
    xr = 1; random_sample = 1.75560534; index = 0;
test 2: 1 / 1.45865, 1
test 2: 2 / 1.53277, 0
test 2: 2 / 1.21896, 1

*** 1 failure is detected in the test module "dataset_example64"

^[3] if the sequence is viewed as a set
