Example of STL Numeric Algorithm

STL has many libraries and one of them is the numeric library which contains some algorithms to do numerical computation.  We present an example of how to compute the average using the “accumulate” numeric algorithm. Suppose you have a bunch of data and you want to calculate the traditional mean (or average) of the data. The data could be in a vector, list, map, hash table, array, you name it. You could write something like this:

double mean( const std::vector<double>& vec)

 

{

// calculate the sum using another function; this is good practice

// and divide by the size of the vector

// vec.size() != 0

return sum(vec) / vec.size();

}

double sum( const std::vector<double>& vec)

{

// use range-for (C++11 style code)

double result = 0.0;

for (auto i : vec)

result += i;

return result;

}

For many C++ programmers, writing the loop is more natural than calling the algorithm. The breadth of STL algorithms means that many tasks you might naturally code as loops could also be written using algorithms. Let’s take a look at another approach using the STL “accumulate” algorithm of the numeric library.

Prototype

The algorithm ‘accumulate’ is a generalization of summation. The accumulate function initializes an accumulator acc with an initial value init and then computes the sum or modifies it with some other binary operation. The former uses ‘operator+’  (acc = acc + *i) to sum up the elements, the latter uses binary function (acc = binary_op(acc,*i)) to modify it for every iterator i in the range [first, last). You must always provide an initial value init. One reason for this is so there can be well-defined value when [first,last) is an empty range.

— std::accumulate( iterator begin, iterator end, value_type init)

— std::accumulate( iterator begin, iterator end, value_type init, binaryfunction f)

As all other STL algorithms, “accumulate” is a template function that takes range as argument to specify part or all of a container. In this way the algorithms can be used to work with many different types of containers that support the STL-like interface (iterator). Below the list of headers files needed for our example:

// C++ 11 includes

 

#include <tuple>                       // std::tuple

// Stl includes

#include <vector>                      // stl container

#include <numeric>                     // std::accumulate

// boost include

#include <boost/assign/std/vector.hpp> // for ‘operator+=()’

The first include in the list is the new C++11 include for tuple. Since C++11, a tuple-like interface has been added. Tuple extend the concept of pairs to an arbitrary number of elements (at least ten elements). Next 2 include files are for STL. First, the vector to store our data for testing, second the numerical algorithm “accumulate” located in the <numeric> include. Finally the last one is for the ‘Boost Assign Library’. In our case we just want to load up a container with some small amount of data. The STL containers do not make such a task particularly easy, ‘Boost Assign’ provides set of tools for initializing STL container (vector). Now we define the binary function to compute statistics (number of elements, average, variance).

// define a typedef for clarity

 

typedef std::tuple<std::size_t/*N*/,double/*X*/,double/*X^2*/> statsVar; 

// compute statistics

statsVar computeStats( statsVar aVar, double aXi)

{

using namespace std;

// Use get<>() utility to retrieve component of a tuple

++get<0>(aVar);        // count the number of entry

get<1>(aVar)+=aXi;     // add value

get<2>(aVar)+=aXi*aXi; // square

// return the statistics compilation

return aVar;

}

The first argument of this function takes a tuple which hold all the information (accumulator) to compute the statistics and the second is the value on which we want to compute stats (iterator). Body of the function, use the getter utility from C++11 to retrieve the tuple component. According to the ‘typedef’ above, the first argument of the tuple is the element counter, which is incremented, the next argument is the iterator value which is added (to compute the average) and finally the last one is square added and return the modify tuple.  To test out stats function, we write a simple use case.

// test our statistics algorithm

 

void testStatistics()

{

using namespace std;

using namespace boost::assign; // bring ‘operator+=()’ into scope

// create a vector of data to compute statistics

vector<double> w_dataStats;

w_dataStats.reserve(10); // memory allocation (good practice)

// initialize the vector (Boost Assign Library)

w_dataStats+=1.,2.,3.,4.,5.,6.,7.,8.,9.,10.; // list-like initialization

// compute the statistics

statsVar res = accumulate( w_dataStats.begin(), w_dataStats.end(), // range

make_tuple(0u,0.,0.)/*initialization*/,computeStats/*function*/);

// print the statistics that we have computed

cout << “Number of elements is: ” << get<0>(res) << endl;

cout << “Average computed is:   ” << get<1>(res)/get<0>(res) << endl;

}

Let’s give some explanation about those last lines of code. We first initialize a vector with some values, by first calling reserve() which is a good practice to avoid resizing the vector, fill the vector by using  the boost operator ‘+=’ (list-like initialization) from ‘Boost Assign Library’. Compute the statistics by calling ‘accumulate’ with the following parameters: first 2 arguments are for the beginning and end of the range, third argument as the initial value (‘make_tuple’) and last argument the binary function ‘computeStats’. Done! Finally print some statistics by using ‘get’ utility to retrieve tuple component.

One of the first things that struck me is how small the amount of code is and without for-loop, which should be avoided whenever you can. Everything is managed by the STL numeric algorithm and it works for any container that supports STL-like interface (iterator). Calling an algorithm is usually preferable to any hand-written loop. Why? (Scott Meyers’s article “STL Algorithms vs. Hand-Written Loops, Dr. Dobbs’s October 2001”)

There are three reasons:

  • Efficiency: Algorithms are often more efficient than the loops programmers produce;
  • Correctness: Writing loops is more subject to errors than calling algorithms;
  • Maintainability: Algorithm calls often yield code that is clearer and more straightforward than the corresponding explicit loops;

This example it’s a measure of the degree of abstraction that STL is able to provide to the process of getting data from a container to the algorithm that such a complicated task can be condensed to such an apparently small amount of code.

0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *