Design choices for writing efficient numerical solvers in c++: Type punning

Question

I'm writing a numerical fluid solver in C++ as a hobby project. I will try to explain what I want to accomplish in a simplified manner.

The solver has multiple flow variables (density, velocity, pressure, etc.) stored for each cell in a grid. I would like a convenient way to access the variables and do computations on them (typically with operator overloading). They are currently stored in a double* array covering N cells, where the flow variables belonging to the same cell are stored consecutively like this: density0, u0, v0, w0, pressure0, density1, u1, v1, w1, pressure1 ... density_N-1, u_N-1, v_N-1, w_N-1, pressure_N-1

Keep in mind that I would like to keep everything general; in this specific case there are 5 flow variables, but there might also be a different number.

What I would ideally like is to have a way to reinterpret my flow variables as a single cell variable without having to copy the memory. In this case the variable in a cell could for instance be a struct like this:

    struct FlowVar{
        double density, u, v, w, p;
    };

I know that there is something called "type-punning" which would allow you to reinterpret memory as a different type. This little example illustrates how the flow variable in cell 10 could be accessed this way:

    double* raw_data = new double[100]; 

    for (int i{0};i<100;i++) raw_data[i] = i;

    FlowVar* flow_var_10 = (FlowVar*)&raw_data[9];

Even though I got the expected values when running this (9, 10, 11, 12, 13), this is apparently undefined behaviour in C++: https://adriann.github.io/undefined_behavior.html

I have heard about something called std::bit_cast, but my impression is that it can't be used for my kind of purpose. However, please inform me if I'm wrong here.
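For context: std::bit_cast returns a new object by value and requires source and destination to have the same size, so it copies rather than providing a view into existing memory. The usual well-defined alternative for a raw buffer is std::memcpy, which also copies. Below is a minimal sketch of that copy-based route. The helper names load_cell and store_cell are hypothetical, and a layout of five packed doubles per cell is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Same struct as in the question.
struct FlowVar {
    double density, u, v, w, p;
};

// memcpy is only valid for trivially copyable types, and the struct must be
// exactly five packed doubles for the copy to line up with the raw array.
static_assert(std::is_trivially_copyable<FlowVar>::value,
              "FlowVar must be trivially copyable");
static_assert(sizeof(FlowVar) == 5 * sizeof(double),
              "FlowVar must not contain padding");

// Hypothetical helper: copy the variables of one cell out of the raw array.
inline FlowVar load_cell(const double* raw, std::size_t cell) {
    assert(raw != nullptr);
    FlowVar fv;
    std::memcpy(&fv, raw + 5 * cell, sizeof(FlowVar));
    return fv;
}

// Hypothetical helper: write a (possibly modified) copy back into the raw array.
inline void store_cell(double* raw, std::size_t cell, const FlowVar& fv) {
    assert(raw != nullptr);
    std::memcpy(raw + 5 * cell, &fv, sizeof(FlowVar));
}
```

Compilers commonly turn a fixed-size memcpy like this into plain loads and stores, but the struct is still a copy: changes to it only reach the raw array through store_cell.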

So at this point I had no well-defined way to accomplish what I wanted. The next possible solution I checked out was to use the linear algebra library Eigen. I would then use an Eigen::Vector<double, 5> to represent a flow variable. Using Eigen is also convenient in its own right, since it has lots of useful linear algebra functionality. However, I am not really sure whether Eigen is slower or faster than homemade matrix/vector classes for small sizes, so it might be a bad decision (see "Is Eigen slow at multiplying small matrices?").

Eigen has a functionality called Map which allows mapping raw data to vector or matrix types without copying. I'm not sure how this is achieved in a defined and safe way, but I guess it is beyond the level of the average C++ enthusiast.

To map the raw data to a flow variable I could now do something like this:

    using Vec5 = Eigen::Vector<double,5>;
    using FlowVar = Eigen::Map<Vec5>;
    
    double* raw_data = new double[100];

    for (int i{0};i<100;i++) raw_data[i] = i;

    FlowVar flow_var = FlowVar(raw_data + 9);

Now flow_var refers to part of the memory of raw_data, in effect accomplishing the same thing as the type punning above.

However, I fear that this solution might be inefficient, as I'm using small vectors, have many grid points, and will need to create Maps often. The size of an Eigen::Map (at least on my computer) is 16 bytes, which is more than that of a reference or pointer.

I would like some opinions on what design decision would likely be the best here. Where I stand now I have four options:

1: Use the undefined type punning - which seems to work fine for doubles in my case...

2: Use the Eigen::Map solution

3: Simply copy the data to a struct or Eigen::Vector when wanting or needing to view the raw_data as a FlowVar

4: Simply drop the entire FlowVar type and only access the raw_data directly

I would be grateful for some opinions here. Should I pick one of my four options, or are there other possibilities that I'm not aware of?

Why don't you just create an array of FlowVar directly? You could simply fill it like FlowVar data[64]; size_t n = 0; for(auto& d : data) { d.density = n++; d.u = n++; d. [...] }
– Aconcagua, Commented Jun 10, 2023 at 10:55
You could as well provide a constructor for your FlowVar type.
– Aconcagua, Commented Jun 10, 2023 at 10:58
@Aconcagua I actually started with this kind of design. I created something like: template<size_t N_VARS> struct FlowVars{ double variables [N_VARS]; }; to be able to solve different kinds of equations. However, I found this solution inconvenient for various reasons. I found it easier to design the class structure if the FlowVar type of the raw_data doesn't have to be specified at compile time.
– ander, Commented Jun 10, 2023 at 11:20
How about a "view": struct FlowVarView{ double* data; double& density() { return data[0]; } /* const version and similar for u, v, w, p */ };?
– Jarod42, Commented Jun 10, 2023 at 11:29
Flexible design and uniquely identifiable variable names contradict each other somehow... If you want to have more variables (dimensions?), how would you want to get additional names for these?
– Aconcagua, Commented Jun 10, 2023 at 21:18
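
The "view" idea from Jarod42's comment can be fleshed out without any type punning, because the view only ever accesses the data as double. Below is a minimal sketch, assuming the number of variables per cell is only known at run time. The stored n_vars, operator[], operator+= and the cell_view helper are illustrative additions, not part of the original comment.

```cpp
#include <cassert>
#include <cstddef>

// Non-owning view of one cell's variables inside the raw array.
// The named accessors assume the density, u, v, w, p ordering from the question.
class FlowVarView {
public:
    FlowVarView(double* data, std::size_t n_vars) : data_(data), n_vars_(n_vars) {}

    // Generic indexed access, usable for any number of variables.
    double& operator[](std::size_t i) const {
        assert(i < n_vars_);
        return data_[i];
    }
    std::size_t size() const { return n_vars_; }

    // Named access for the five-variable case.
    double& density() const { return (*this)[0]; }
    double& u() const { return (*this)[1]; }
    double& v() const { return (*this)[2]; }
    double& w() const { return (*this)[3]; }
    double& p() const { return (*this)[4]; }

    // Element-wise addition, writing straight into the underlying array.
    FlowVarView& operator+=(const FlowVarView& other) {
        assert(other.n_vars_ == n_vars_);
        for (std::size_t i = 0; i < n_vars_; ++i) data_[i] += other.data_[i];
        return *this;
    }

private:
    double* data_;
    std::size_t n_vars_;
};

// Hypothetical helper: view of cell `cell` in an array with `n_vars` values per cell.
inline FlowVarView cell_view(double* raw, std::size_t n_vars, std::size_t cell) {
    return FlowVarView(raw + n_vars * cell, n_vars);
}
```

Such a view is meant to be a short-lived local object, much like an Eigen::Map. Writes through it modify the raw array directly, so no copy back is needed.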

Homer512 · Accepted Answer · 2023-06-10 16:02:17Z

To continue some aspects from the comments in a larger text:

At least when I evaluate for instance sizeof(Eigen::Matrix<Eigen::Vector<double,5>>) I get 16, but it would indeed make more sense that it has the same size as a reference.

Pretty sure you meant to write Eigen::Map<Eigen::Vector<...>>. But yes, that type functionally only needs 8 bytes for a single pointer. I have yet to read enough of the code to understand where the second member comes from. If you change the Map to something that needs a second member to store the runtime size, e.g. Map<Vector<double, Eigen::Dynamic>>, its size becomes 24. But whatever: if you just use Maps as local variables or the occasional class member, it doesn't really matter.

Not sure if I could create such a construct with eigen directly as it would be a vector of matrices with dimensions N_CELLS x N_EQUATIONS x 3

That is indeed a limitation of Eigen. If you want to look for alternatives that can handle multidimensional data, the keyword is "tensor". There is an unsupported tensor extension for Eigen. Other libraries might include PyTorch's C++ frontend but I cannot vouch for the quality of either library. Personally I just flatten the outer dimensions into extra columns or rows, as appropriate, but that's not really clean.

I added a member function in my solution data class that can return the flow variable in cell i as an eigen map

You mean something like this?

    using Vector5d = Eigen::Vector<double, 5>;
    struct FlowVar{
        double density, u, v, w, p;

        // Map all five members, starting at the first one (density).
        Eigen::Map<Vector5d> as_vector() noexcept
        { return Eigen::Map<Vector5d>(&density); }
    };

I'm not sure this doesn't violate strict aliasing rules. However, I'm not a language lawyer and could be wrong about this. I suggest this alternative that definitely doesn't have that issue:

    struct FlowVar{
        Vector5d as_vector;

        double& density() noexcept
        { return as_vector.x(); }

        double& u() noexcept
        { return as_vector.y(); }

        double& v() noexcept
        { return as_vector.z(); }

        double& w() noexcept
        { return as_vector.w(); }

        double& p() noexcept
        { return as_vector[4]; }
    };

Similar to the accessors used by std::complex. With an optimizing compiler this should have zero overhead.
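
The same accessor pattern also works without Eigen. Below is a minimal sketch that assumes std::array<double, 5> as the storage in place of Vector5d. The member name values and the helper total are hypothetical. The static_assert checks the layout property that lets an array of these structs sit on top of the same packed layout as the raw array in the question.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Eigen-free analogue of the struct above: storage is a plain array,
// and the names are provided through trivial inline accessors.
struct FlowVar {
    std::array<double, 5> values;

    double& density() noexcept { return values[0]; }
    double& u() noexcept { return values[1]; }
    double& v() noexcept { return values[2]; }
    double& w() noexcept { return values[3]; }
    double& p() noexcept { return values[4]; }
};

// Holds on common implementations; the assert fails at compile time otherwise.
static_assert(sizeof(FlowVar) == 5 * sizeof(double),
              "FlowVar is expected to be five packed doubles");

// Hypothetical helper: sum of variable `var` over all cells.
inline double total(const FlowVar* cells, std::size_t n_cells, std::size_t var) {
    assert(var < 5);
    double sum = 0.0;
    for (std::size_t i = 0; i < n_cells; ++i) sum += cells[i].values[var];
    return sum;
}
```

Unlike a view, this fixes the number of variables at compile time, which is the restriction the asker mentioned wanting to avoid.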

11 Comments

chtz Over a year ago

Re: "I have yet to read enough of the code to understand where the second member comes from." There are two members for the dimension, which are empty classes if the size is fixed, but the objects still need 1 byte each, so with padding this requires 8 (pointer) + 2 (sizes) + 6 (padding) = 16 bytes

Aconcagua Over a year ago

@chtz If the two sizes take just two bytes, sizes would be limited to at most 255. Wouldn't it be more meaningful to have four-byte-wide integers instead?

chtz Over a year ago

@Aconcagua If the size is known at compile time, it should actually need no space at all. If it is unknown, then 8 bytes are used (actually the size of ptrdiff_t so it depends on what system you are compiling).

Aconcagua Over a year ago

@chtz That's all clear – solely: 2 bytes only for the sizes and 6 for padding doesn't appear meaningful to me, if we need that space anyway then we'd rather select types that are not limited to such small ranges while at the same time wasting bytes for padding, wouldn't we?

chtz Over a year ago

@Aconcagua Eigen::Map is not intended for long-time storage, but should usually only be a temporary. And with proper inlining the unused two bytes will be optimized away anyways. Otherwise, it is even worse, if one size is dynamic and the other is fixed: 8 byte for the pointer, 8 byte for the dynamic size, 1 byte for the fixed size (never accessed) and 7 bytes padding.
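
The size arithmetic chtz describes can be reproduced with stand-in types. This is only a sketch of the general C++ layout rule, not Eigen's actual implementation; FixedRows, FixedCols, PaddedMap and CompactMap are hypothetical names.

```cpp
#include <cstddef>

// Stand-ins for compile-time sizes: empty types that carry no data.
struct FixedRows {};
struct FixedCols {};

// As members, empty types still occupy at least one byte each, and the
// struct is then padded up to a multiple of the pointer's alignment.
struct PaddedMap {
    double* data;
    FixedRows rows;
    FixedCols cols;
};

// As base classes, empty types can share the address of the object
// (empty base optimization), so they typically add no size at all.
struct CompactMap : FixedRows, FixedCols {
    double* data;
};

static_assert(sizeof(FixedRows) >= 1, "even an empty type has non-zero size");
static_assert(sizeof(PaddedMap) > sizeof(double*),
              "the two empty members make the struct larger than one pointer");

// Number of bytes PaddedMap spends on the empty members plus padding.
inline std::size_t padding_overhead() {
    return sizeof(PaddedMap) - sizeof(double*);
}
```

On a typical 64-bit platform, sizeof(PaddedMap) is 16 (8 for the pointer, 2 for the empty members, 6 for padding) and sizeof(CompactMap) is 8. The exact numbers depend on the compiler and ABI.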
