I'm trying to replicate one of the Polars Python examples in Rust but seem to have hit a wall. In the Python docs there is an example which creates a new column with the lengths of the strings from another column. So for example, column B will contain the lengths of all the strings in column A.

The example code looks like this:

import polars as pl

df = pl.DataFrame({"shakespeare": "All that glitters is not gold".split(" ")})

df = df.with_column(pl.col("shakespeare").str.lengths().alias("letter_count")) 

As you can see, it uses the str namespace to access the lengths() function, but the same approach does not work in the Rust version:

use polars::prelude::*;

// This fails to compile with the following error:
// no method named `lengths` found for struct `StringNameSpace` in the current scope
fn print_length_strings_in_column() {
    let df = generate_df().expect("error");
    let new_df = df
        .lazy()
        .with_column(col("vendor_id").str().lengths().alias("vendor_id_length"))
        .collect();
}

Cargo.toml:

[dependencies]
polars = {version = "0.22.8", features = ["strings", "lazy"]}

I checked the docs and it seems like the Rust version of Polars does not implement the lengths() function. There is the str_lengths function in the Utf8NameSpace but it's not entirely clear to me how to use this.

I feel like I'm missing something very simple here, but I don't see it. How would I go about tackling this issue?

Thanks!

  • It was a similar question going for the same result, but it was worded poorly. I have also figured some things out since then, so I thought it would be clearer to remove the old one and start over with a better-worded question. Commented Jul 5, 2022 at 22:07
  • ah, gotcha, was just curious Commented Jul 5, 2022 at 23:28

1 Answer

You have to use the apply function and downcast the Series to a Utf8 ChunkedArray, which has a str_lengths() method: https://docs.rs/polars/0.22.8/polars/chunked_array/struct.ChunkedArray.html

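use polars::prelude::*;

let s = Series::new("vendor_id", &["Ant", "no", "how", "Ant", "mans"]);
let df = DataFrame::new(vec![s]).unwrap();
let res = df
    .lazy()
    .with_column(
        col("vendor_id")
            .apply(
                // Downcast the Series to Utf8 and take the length of each string.
                |srs| Ok(srs.utf8()?.str_lengths().into_series()),
                // str_lengths() returns an unsigned 32-bit array, so declare UInt32 as the output type.
                GetOutput::from_type(DataType::UInt32),
            )
            .alias("vendor_id_length"),
    )
    .collect();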
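
If you want to sanity-check the values in the new column, here is a minimal sketch in plain Rust with no Polars involved. The helper names byte_lengths and char_lengths are made up for illustration. It assumes, as far as I can tell, that str_lengths() reports the length in bytes rather than in characters, so the two helpers only disagree on non-ASCII input.

```rust
/// Length of each string in bytes, which is what `str::len` returns.
/// Hypothetical helper, not part of Polars.
fn byte_lengths(values: &[&str]) -> Vec<u32> {
    values.iter().map(|s| s.len() as u32).collect()
}

/// Length of each string in Unicode scalar values (`char`s).
/// Hypothetical helper, not part of Polars.
fn char_lengths(values: &[&str]) -> Vec<u32> {
    values.iter().map(|s| s.chars().count() as u32).collect()
}
```

For the data above, byte_lengths(&["Ant", "no", "how", "Ant", "mans"]) gives [3, 2, 3, 3, 4], which is what I would expect to see in vendor_id_length. For pure ASCII strings both helpers agree, so the distinction only matters if the column can contain accented or other multi-byte characters.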

This is exactly what I was looking for! I had found a convoluted way of doing this by converting the LazyFrame to a DataFrame and then using the columns function and whatnot. Thanks so much, the code is a lot cleaner now!
