I have a Polars DataFrame:

import polars as pl

pl.Config(fmt_table_cell_list_len=8, fmt_str_lengths=80)
    
df = pl.DataFrame({'test_names':[['Mallesham','','Bhavik','Jagarini','Jose','Fernando'],
                                 ['','','','ABC','','XYZ']]})

I would like to count the elements in each list in test_names, ignoring the empty strings.

df.with_columns(pl.col('test_names').list.len().alias('tot_names'))
┌─────────────────────────────────────────────────────────────┬───────────┐
│ test_names                                                  ┆ tot_names │
│ ---                                                         ┆ ---       │
│ list[str]                                                   ┆ u32       │
╞═════════════════════════════════════════════════════════════╪═══════════╡
│ ["Mallesham", "", "Bhavik", "Jagarini", "Jose", "Fernando"] ┆ 6         │
│ ["", "", "", "ABC", "", "XYZ"]                              ┆ 6         │
└─────────────────────────────────────────────────────────────┴───────────┘

This includes the empty strings in the count, but I would like to filter them out and get:

┌─────────────────────────────────────────────────────────────┬───────────┐
│ test_names                                                  ┆ tot_names │
│ ---                                                         ┆ ---       │
│ list[str]                                                   ┆ u32       │
╞═════════════════════════════════════════════════════════════╪═══════════╡
│ ["Mallesham", "", "Bhavik", "Jagarini", "Jose", "Fernando"] ┆ 5         │
│ ["", "", "", "ABC", "", "XYZ"]                              ┆ 2         │
└─────────────────────────────────────────────────────────────┴───────────┘

2 Answers

You can use list.eval to run any Polars expression on a list's elements. Inside a list.eval expression, pl.element() refers to the elements of the list, and you can apply further expressions to it.

Here we simply use a filter expression to prune the values we don't need. Chaining .list.len() onto the result then gives the count asked for.

df = pl.DataFrame({
    "test_names":[
        ["Mallesham","","Bhavik","Jagarini","Jose","Fernando"],
        ["","","","ABC","","XYZ"]
    ]
})

df.with_columns(
    pl.col("test_names").list.eval(pl.element().filter(pl.element() != ""))
)
shape: (2, 1)
┌─────────────────────────────────────┐
│ test_names                          │
│ ---                                 │
│ list[str]                           │
╞═════════════════════════════════════╡
│ ["Mallesham", "Bhavik", ... "Fer... │
│ ["ABC", "XYZ"]                      │
└─────────────────────────────────────┘

Good question. Basically, we want to apply a filter within each list.

We do this with list.eval, which runs an expression inside the list on each row, with pl.element() acting as a proxy for the elements of that row's list.

(
    df
    .with_columns(
        pl.col('test_names')
        # keep only the non-empty strings in each list
        .list.eval(pl.element().filter(pl.element().str.len_chars() > 0))
        # then count the remaining elements
        .list.len()
        .alias('tot_names')
    )
)
