12

Given the following DataFrame, is there a way to select only the columns whose names start with a given prefix? I know I could use a list comprehension such as [pl.col(column) for column in df.columns if column.startswith("prefix_")], but I'm wondering whether it can be done as part of a single expression.

import polars as pl

df = pl.DataFrame(
    {"prefix_a": [1, 2, 3], "prefix_b": [1, 2, 3], "some_column": [3, 2, 1]}
)
# <column_name_starts_with> is a placeholder for the method I'm looking for
df.select(pl.all().<column_name_starts_with>("prefix_"))

Would this be possible to do lazily?

3 Answers

11

Starting from Polars 0.18.1, you can use selectors (polars.selectors.starts_with), which provide a more intuitive way to select columns from DataFrame or LazyFrame objects based on their name, dtype, or other properties.

>>> import polars as pl
>>> import polars.selectors as cs
>>> 
>>> df = pl.DataFrame(
...     {"prefix_a": [1, 2, 3], "prefix_b": [1, 2, 3], "some_column": [3, 2, 1]} 
... )
>>> df
shape: (3, 3)
┌──────────┬──────────┬─────────────┐
│ prefix_a ┆ prefix_b ┆ some_column │
│ ---      ┆ ---      ┆ ---         │
│ i64      ┆ i64      ┆ i64         │
╞══════════╪══════════╪═════════════╡
│ 1        ┆ 1        ┆ 3           │
│ 2        ┆ 2        ┆ 2           │
│ 3        ┆ 3        ┆ 1           │
└──────────┴──────────┴─────────────┘
>>> # For a LazyFrame: print(df.lazy().select(cs.starts_with("prefix_")).collect())
>>> print(df.select(cs.starts_with("prefix_")))  # for a DataFrame
shape: (3, 2)
┌──────────┬──────────┐
│ prefix_a ┆ prefix_b │
│ ---      ┆ ---      │
│ i64      ┆ i64      │
╞══════════╪══════════╡
│ 1        ┆ 1        │
│ 2        ┆ 2        │
│ 3        ┆ 3        │
└──────────┴──────────┘

10

From the documentation for polars.col, the expression can take one of the following arguments:

  • a single column by name

  • all columns by using a wildcard “*”

  • column by regular expression if the regex starts with ^ and ends with $

So in this case, we can use a regex to select columns by their prefix, and this also works in lazy mode.

(
    df
    .lazy()
    .select(pl.col('^prefix_.*$'))
    .collect()
)
shape: (3, 2)
┌──────────┬──────────┐
│ prefix_a ┆ prefix_b │
│ ---      ┆ ---      │
│ i64      ┆ i64      │
╞══════════╪══════════╡
│ 1        ┆ 1        │
│ 2        ┆ 2        │
│ 3        ┆ 3        │
└──────────┴──────────┘

Note: polars.exclude accepts regular expressions as well:

(
    df
    .lazy()
    .select(pl.exclude('^prefix_.*$'))
    .collect()
)
shape: (3, 1)
┌─────────────┐
│ some_column │
│ ---         │
│ i64         │
╞═════════════╡
│ 3           │
│ 2           │
│ 1           │
└─────────────┘

2

You can also use polars.selectors.matches with the pattern ^prefix_.

>>> import polars as pl
>>> import polars.selectors as cs
>>> 
>>> df = pl.DataFrame(
...     {"prefix_a": [1, 2, 3], "prefix_b": [1, 2, 3], "some_column": [3, 2, 1]}
... )
>>> 
>>> df
shape: (3, 3)
┌──────────┬──────────┬─────────────┐
│ prefix_a ┆ prefix_b ┆ some_column │
│ ---      ┆ ---      ┆ ---         │
│ i64      ┆ i64      ┆ i64         │
╞══════════╪══════════╪═════════════╡
│ 1        ┆ 1        ┆ 3           │
│ 2        ┆ 2        ┆ 2           │
│ 3        ┆ 3        ┆ 1           │
└──────────┴──────────┴─────────────┘
>>> 
>>> df.lazy().select(cs.matches("^prefix_")).collect()
shape: (3, 2)
┌──────────┬──────────┐
│ prefix_a ┆ prefix_b │
│ ---      ┆ ---      │
│ i64      ┆ i64      │
╞══════════╪══════════╡
│ 1        ┆ 1        │
│ 2        ┆ 2        │
│ 3        ┆ 3        │
└──────────┴──────────┘
