Lessons Learned from a Scientific Code Refactoring
Scientific code refactoring using Object-Oriented Numeric
I have been working on the refactoring of one of our numerical libraries using Object-Oriented Numeric methodology. I learned a lot about developing architecture to support a flexible programming environment. I will discuss some of the lessons I learned over the years working on this project.
“Object-Oriented Programming” and “Object-Oriented Numerics” are overlapping but not identical terms. OON encourages computer implementation of mathematical abstraction.
Scientific code has several properties which set it apart from code written for standard, non-scientific applications, it is written to investigate complex physics phenomena.
Lesson #1 Don’t try to be over generic
Do not try to be too generic at the starting phase of the project. By doing this, you lose the essence of object-oriented programming. Each data type is responsible for doing something.
I was focusing on concepts globally but forgot that the basis of Object-Oriented is to create data types that model the problem. I was working backward (end towards the beginning).
The first phase of the development should start with identifying and grouping local entities i.e., making useful abstractions of the problem at hand. In the context of numeric, the logical entities are often readily available in the form of mathematical abstractions.
Use Case (gather requirements)
I’ve been working on many software projects and one of the mistakes that everyone makes is to bypass requirements phase. This is probably the most important and will have a huge impact on the design of your software. “Who is going to use it, and how they will be using it” that’s the question that you should ask to yourself.
How do you express use case? UML diagram or developing a GUI (Graphical User Interface) forces you to think about services or functionalities to be supported by your application.
Lesson #2 “Good things come in small package”
The keyword is to name things!!! Remember you don’t write code for yourself, but for other people who will look at it, or use it they need to understand at glance.
At the beginning don’t be shy to add types, as you go along you will find out that some do not make sense (throw them in the garbage), others you will need to be merged, and others need to be refactored. In any case, the key goal is to extract the base types of the environment, in math programming its easy because abstractions are built-in (math equations, physics algorithm and so on). I found out by doing this my code is more manageable and easier to make changes, reduce dependencies across class (methods), avoid passing arguments to methods that are not required.
Lesson #3 “Easy to use correctly, hard to use incorrectly” S. Meyers (More Effective C++)
Think-Out-Of-The-Box!!!
A major error we make when we re-factor, we encapsulate (wrapping) functionalities in class without any concept behind that type. Do I gain flexibility? Not at all, since I have exactly or almost same thing, except now its wrapped in some class.
Data type is close to the problem we want to model. For example, in numerical simulation we deal with arrays of numbers, but behind those arrays of numbers there is a concept of a field. Numerical analysis involves the study of problems by computing their numerical values (transformation of the physical equations into algebraic systems of equations).
Concept of a scalar field over the grid that model numerical values. When a scalar field is passed as an argument of a method or function, I can query its values, but also the grid info, such as number of points of the grid, and so on. Passing an array doesn’t tell me much, except it’s a container of values.
Below is a function that calculates the time step value based on the CFL (Courant-Friedrich-Levy) criteria. We pass 2 scalar field as argument. Retrieve the values of the scalar field, but we also need the grid spacing and the number of grid nodes.
/** Provide support for those 3 version of the scalarField
* no choice since we maintain these until we complete
* the merge. Might still need it after merge completed.
*/
template<typename scalarField,
typename std::enable_if<std::is_pointer<scalarField>::value>::type*>
auto TimeStepCondition( const scalarField aU1, const scalarField aU2)
{
if constexpr ( !std::is_same_v<scalarField, jb::scalarField*> &&
!std::is_same_v<scalarField, Dsn::scalarField*> &&
!std::is_same_v<scalarField, Sfx::FieldLattice*>)
{
return std::make_optional(std::nullopt);
}
auto dt = std::numeric_limits<float64>::max();
// start j=2, j=1 is tied node (computational domain)
for( auto j = 2; j <= aU1->grid().getNoPoints(); j++)
{
const auto V = aU2->values()(j) / aU1->values()(j); // velocity
const auto c = std::sqrt(cGravity<float64>*aU1->values()(j)); // unit width
auto dtc = aU1->grid().Delta(1) / (::fabs(V) + c); // time step at i
dt = (std::min)(dtc, dt); // time step
}
//return dt;
return std::make_optional(dt);
}
Leave a Reply
Want to join the discussion?Feel free to contribute!