I'm trying to calculate the covariance of a data frame in Rust. The ndarray_stats crate defines such a function for arrays, and I can produce an array from a DataFrame using to_ndarray. The compiler is happy if I use the example in the documentation (a), but if I try to use it on an Array2 produced from a DataFrame, this doesn't work:

use ndarray::arr2;
use ndarray_stats::CorrelationExt;
use polars::prelude::*;

fn cov(df: &DataFrame) -> Vec<f64> {
    // Both of these are Array2<f64>s
    let mat = df.to_ndarray::<Float64Type>().unwrap();
    let a = arr2(&[[1., 3., 5.], [2., 4., 6.]]);

    let x = a.cov(1.).unwrap();
    let y = mat.cov(1.).unwrap();
}
   |
22 |     let y = mat.cov(1.).unwrap();
   |                 ^^^ method not found in `ndarray::ArrayBase<ndarray::data_repr::OwnedRepr<f64>, ndarray::dimension::dim::Dim<[usize; 2]>>`

Why does the compiler allow the definition of x but not y? How can I fix the code such that y can be assigned?

1 Answer

It is a dependency version mismatch. As of version 0.14.7, polars-core depends on ndarray 0.13.x, whereas ndarray-stats 0.5 requires ndarray 0.15. Because your project also uses the latest version of ndarray directly, the 2D array type of x is compatible with the extension trait CorrelationExt provided by ndarray-stats, but the type of y is not.

Regardless of the nature of a type in a library, once multiple semver-incompatible versions of a library are included, their types will typically not be interchangeable. In other words, even though these Array2<_> may appear to be the same type, they are treated as different types by the compiler.
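To make this concrete, here is a minimal sketch that needs no external crates. The two modules are hypothetical stand-ins for two semver-incompatible releases of ndarray, and CovShape is a made-up trait playing the role of CorrelationExt; none of these names come from the real libraries. The trait is implemented for only one of the two identically defined structs, so the method is not found on the other one.

```rust
use std::any::TypeId;

// Hypothetical stand-in for the ndarray release that polars-core pulls in.
mod ndarray_0_13 {
    #[derive(Debug)]
    pub struct Array2 {
        pub rows: usize,
        pub cols: usize,
    }
}

// Hypothetical stand-in for the ndarray release the project itself depends on.
mod ndarray_0_15 {
    #[derive(Debug)]
    pub struct Array2 {
        pub rows: usize,
        pub cols: usize,
    }
}

// Made-up extension trait, implemented only for the newer type, the way an
// extension crate is compiled against exactly one release of its dependency.
trait CovShape {
    // Shape of the covariance matrix when each row is one variable.
    fn cov_shape(&self) -> (usize, usize);
}

impl CovShape for ndarray_0_15::Array2 {
    fn cov_shape(&self) -> (usize, usize) {
        (self.rows, self.rows)
    }
}

// True only if A and B are the very same type according to the compiler.
fn same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}

fn main() {
    let a = ndarray_0_15::Array2 { rows: 2, cols: 3 };
    let mat = ndarray_0_13::Array2 { rows: 2, cols: 3 };

    // Compiles: the trait is implemented for this exact type.
    println!("{:?} with {} columns -> {:?}", a, a.cols, a.cov_shape());

    // Does not compile if uncommented (error E0599, method not found):
    // the struct has the same name and fields, but it is a different type.
    // mat.cov_shape();
    println!("{:?} with {} columns has no cov_shape", mat, mat.cols);

    // Same name, same definition, still two distinct types.
    println!(
        "same type: {}",
        same_type::<ndarray_0_13::Array2, ndarray_0_15::Array2>()
    );
}
```

With real crates the situation is the same, except that both versions are called ndarray::Array2, which is why the error message is so confusing.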

To find crates that are included in more than one version, inspect the output of cargo tree -d, which lists only the duplicated dependencies, each with an inverted tree showing the crates that depend on it. Duplicates do not necessarily pose a problem, but problems arise when the project uses the API of more than one version directly.

The lowest common denominator at the time of writing is to downgrade ndarray to 0.13 and ndarray-stats to 0.3, which also has the method cov. It may also be worth looking into contributing to the polars project in order to update ndarray there.

3 Comments

Thank you for this comprehensive answer. Is there any way to know when a dependency clash is happening, and to avoid it?
Would it be an option for the dependent (polars) to re-export their version of ndarray, so that consumers can always rely on that version instead of declaring their own import?
I don't think that would help, because at no point do I need to import ndarray here. I'm obtaining an ndarray from DataFrame.to_ndarray() and "passing it into" CorrelationExt.cov. If there is a version mismatch, that can't ever work.
