In pandas we have the pandas.DataFrame.select_dtypes method that selects certain columns depending on the dtype. Is there a similar way to do such a thing in Polars?
3 Answers
One can pass data type(s) to pl.col:
import polars as pl
df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "name": ["John", "Jane", "Jake"],
        "else": [10.0, 20.0, 30.0],
    }
)
print(df.select(pl.col(pl.String, pl.Int64)))
Output:
shape: (3, 2)
┌─────┬──────┐
│ id  ┆ name │
│ --- ┆ ---  │
│ i64 ┆ str  │
╞═════╪══════╡
│ 1   ┆ John │
│ 2   ┆ Jane │
│ 3   ┆ Jake │
└─────┴──────┘
3 Comments
Ludecan
Adding to the discussion, you can use df.select(pl.col(pl.NUMERIC_DTYPES)) to select all numeric columns. I'm now looking for a way to select non-numeric columns.
raui100
@Ludecan you can use all and exclude to select all non-numeric columns from a DataFrame df: df.select(pl.all().exclude(pl.NUMERIC_DTYPES))
Samuel Allain
As of today (1.9.0), pl.NUMERIC_DTYPES is deprecated. An alternative is import polars.selectors as cs; df.select(cs.numeric())
Starting from Polars 0.18.1, you can use the polars.selectors.by_dtype selector to select all columns matching the given dtypes.
>>> import polars as pl
>>> import polars.selectors as cs
>>>
>>> df = pl.DataFrame(
...     {
...         "id": [1, 2, 3],
...         "name": ["John", "Jane", "Jake"],
...         "else": [10.0, 20.0, 30.0],
...     }
... )
>>>
>>> print(df.select(cs.by_dtype(pl.String, pl.Int64)))
shape: (3, 2)
┌─────┬──────┐
│ id  ┆ name │
│ --- ┆ ---  │
│ i64 ┆ str  │
╞═════╪══════╡
│ 1   ┆ John │
│ 2   ┆ Jane │
│ 3   ┆ Jake │
└─────┴──────┘
To select all non-numeric type columns:
>>> import polars as pl
>>> import polars.selectors as cs
>>>
>>> df = pl.DataFrame(
...     {
...         "id": [1, 2, 3],
...         "name": ["John", "Jane", "Jake"],
...         "else": [10.0, 20.0, 30.0],
...     }
... )
>>>
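>>> # cs.numeric() replaces pl.NUMERIC_DTYPES, which is deprecated in recent versions.
>>> # On older versions: print(df.select(~cs.by_dtype(pl.NUMERIC_DTYPES)))
>>> print(df.select(~cs.numeric()))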
shape: (3, 1)
┌──────┐
│ name │
│ ---  │
│ str  │
╞══════╡
│ John │
│ Jane │
│ Jake │
└──────┘
2 Comments
Björn
Would it not be enough nowadays (0.19.12) to just do df.select(pl.col(pl.NUMERIC_DTYPES))?
Samuel Allain
As of today (1.9.0), pl.NUMERIC_DTYPES is deprecated. An alternative is import polars.selectors as cs; df.select(cs.numeric())
When working with groups of datatypes, like numeric dtypes, you can use the polars.selectors datatype groups directly.
Groups include: categorical, date, datetime, float, integer, numeric, string, temporal and time.
import polars as pl
import polars.selectors as cs

# strip leading and trailing whitespace from all string columns
df = df.with_columns(cs.string().str.strip_chars())
# cast all numeric columns to Float32
df = df.with_columns(cs.numeric().cast(pl.Float32))