I have a DataFrame with a string column that looks like this:
let df = df!(
    "names" => &["None", "0", "1", "15", "1|2", "5 ??", "293 ", "XX"])?;
I want to keep only the rows that hold a single integer (multiple digits are fine) greater than 0. Leading and trailing spaces should be stripped first, unless parse() already does that for me. (None of the numbers will be large; nothing is over 3000.) In the example above, the rows that pass the filter are at indexes 2, 3, and 6.
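To pin the rule down, here is a minimal plain-Rust sketch of the per-row check I have in mind, with no Polars involved. The helper names is_positive_int and kept_indexes are made up for illustration, and u16 is an assumption based on the values staying under 3000. Note that str::parse does not skip whitespace, so the sketch trims first.

```rust
/// Hypothetical helper: true if `s`, ignoring surrounding whitespace,
/// is a single integer greater than 0.
fn is_positive_int(s: &str) -> bool {
    // str::parse rejects surrounding whitespace, so trim before parsing.
    // u16 is an assumption: the values are said to stay under 3000.
    match s.trim().parse::<u16>() {
        Ok(n) => n > 0,
        Err(_) => false,
    }
}

/// Hypothetical helper: indexes of the entries that pass the check.
fn kept_indexes(names: &[&str]) -> Vec<usize> {
    let mut kept = Vec::new();
    for (i, name) in names.iter().enumerate() {
        if is_positive_int(name) {
            kept.push(i);
        }
    }
    kept
}

fn main() {
    let names = ["None", "0", "1", "15", "1|2", "5 ??", "293 ", "XX"];
    // Prints the indexes that survive the filter.
    println!("{:?}", kept_indexes(&names));
}
```

What I'm looking for is the equivalent of this check expressed as a Polars filter on the "names" column.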
I've found this other answer, but it doesn't quite have what I need. The filtering page of the Polars user guide only shows very simple cases. Perhaps I haven't found the right page of the docs?
This successfully removes all the "0"s, but I don't want to have to exclude things one by one:
let filtered = df
    .lazy()
    .filter(col("names").neq(lit("0")))
    .collect()?;
println!("filtered: {}", filtered);
Thanks in advance!
Update: It looks like I want to create a new column by casting the string to an integer column, presumably also using CastOptions::NonStrict. But I can't figure out how to do that… When I try to use the CastOptions enum, the compiler complains that it's private?? Also, I'm getting the error "no method named cast_with_options found for enum Expr in the current scope" on the call following col("names"), but that's exactly what the docs do with the plain cast(), so I'm really confused now. The below DOES NOT WORK yet.
use polars::chunked_array::cast::CastOptions; // Error
// ...
let out = df
    .clone()
    .lazy()
    .select([col("names")
        .cast_with_options(DataType::UInt16, CastOptions::NonStrict) // Error
        .alias("int_names")])
    .collect()?;
println!("post-cast: {}", out);
Update 2: Here are the full errors:
1 error[E0603]: struct `CastOptions` is private
 --> rust/orphaned_splits.rs:5:34
  |
5 | use polars::chunked_array::cast::CastOptions;
  |                                  ^^^^^^^^^^^ private struct
  |
note: the struct `CastOptions` is defined here
 --> /Users/nick/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.38.3/src/chunked_array/cast.rs:3:5
  |
3 | use arrow::compute::cast::CastOptions;
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: import `CastOptions` directly
  |
5 | use polars_arrow::compute::cast::CastOptions;
  |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 error[E0599]: no method named `cast_with_options` found for enum `Expr` in the current scope
   --> rust/orphaned_splits.rs:151:14
    |
150 |           .select([col("names")
    |  __________________-
151 | |             .cast_with_options(DataType::UInt16, CastOptions::NonStrict)
    | |_____________-^^^^^^^^^^^^^^^^^
    |
help: there is a method `over_with_options` with a similar name
    |
151 |             .over_with_options(DataType::UInt16, CastOptions::NonStrict)
    |              ~~~~~~~~~~~~~~~~~
Update 3: Here's the dependency:
polars = { version = "0.38.3", features = ["parquet", "lazy", "dtype-categorical", "dtype-i16"] }
The errors above are from running cargo check.