How to filter empty strings from a list column in a Polars Dataframe?

Question

I have a Polars DataFrame:

import polars as pl

pl.Config(fmt_table_cell_list_len=8, fmt_str_lengths=80)
    
df = pl.DataFrame({'test_names':[['Mallesham','','Bhavik','Jagarini','Jose','Fernando'],
                                 ['','','','ABC','','XYZ']]})

I would like to count the elements in each list in test_names, ignoring the empty strings. A plain list.len() counts everything:

df.with_columns(pl.col('test_names').list.len().alias('tot_names'))

┌─────────────────────────────────────────────────────────────┬───────────┐
│ test_names                                                  ┆ tot_names │
│ ---                                                         ┆ ---       │
│ list[str]                                                   ┆ u32       │
╞═════════════════════════════════════════════════════════════╪═══════════╡
│ ["Mallesham", "", "Bhavik", "Jagarini", "Jose", "Fernando"] ┆ 6         │
│ ["", "", "", "ABC", "", "XYZ"]                              ┆ 6         │
└─────────────────────────────────────────────────────────────┴───────────┘

This includes the empty strings in the count, but I would like to filter them out and get:

┌─────────────────────────────────────────────────────────────┬───────────┐
│ test_names                                                  ┆ tot_names │
│ ---                                                         ┆ ---       │
│ list[str]                                                   ┆ u32       │
╞═════════════════════════════════════════════════════════════╪═══════════╡
│ ["Mallesham", "", "Bhavik", "Jagarini", "Jose", "Fernando"] ┆ 5         │
│ ["", "", "", "ABC", "", "XYZ"]                              ┆ 2         │
└─────────────────────────────────────────────────────────────┴───────────┘

jqurious · Accepted Answer · 2024-09-24 02:50:49Z

You can use list.eval to run any Polars expression on a list's elements. Inside a list.eval expression, use pl.element() to refer to the list's elements and then apply an expression to them.

Next we simply use a filter expression to prune the values we don't need.

df = pl.DataFrame({
    "test_names":[
        ["Mallesham","","Bhavik","Jagarini","Jose","Fernando"],
        ["","","","ABC","","XYZ"]
    ]
})

df.with_columns(
    pl.col("test_names").list.eval(pl.element().filter(pl.element() != ""))
)

shape: (2, 1)
┌─────────────────────────────────────┐
│ test_names                          │
│ ---                                 │
│ list[str]                           │
╞═════════════════════════════════════╡
│ ["Mallesham", "Bhavik", ... "Fer... │
│ ["ABC", "XYZ"]                      │
└─────────────────────────────────────┘

jqurious · Accepted Answer · 2024-10-15 15:28:11Z

Good question - basically we want to apply a filter within each list.

We do this with list.eval, which lets us run operations on the Series inside each row, using pl.element() as a proxy for that row's Series.

(
    df
    .with_columns(
        pl.col('test_names')
        # keep only the strings with at least one character
        .list.eval(pl.element().filter(pl.element().str.len_chars() > 0))
        # count what is left in each list
        .list.len()
        .alias('tot_names')
    )
)

answered Oct 28, 2022 at 8:59 by braaannigan · edited Oct 15, 2024 at 15:28 by jqurious
