Selectors#
Selectors allow for more intuitive selection of columns from DataFrame
or LazyFrame objects based on their name, dtype or other properties.
They unify and build on the related functionality that is available through
the col() expression and can also broadcast expressions over the selected
columns.
Importing#
Selectors are available as functions imported from polars.selectors.
Typical/recommended usage is to import the module as cs and employ selectors from there.
import polars.selectors as cs
import polars as pl
df = pl.DataFrame(
    {
        "w": ["xx", "yy", "xx", "yy", "xx"],
        "x": [1, 2, 1, 4, -2],
        "y": [3.0, 4.5, 1.0, 2.5, -2.0],
        "z": ["a", "b", "a", "b", "b"],
    },
)
df.group_by(cs.string()).agg(cs.numeric().sum())
Set operations#
Selectors support the following set operations:
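| Operation | Expression |
|---|---|
| UNION | A \| B |
| INTERSECTION | A & B |
| DIFFERENCE | A - B |
| SYMMETRIC DIFFERENCE | A ^ B |
| COMPLEMENT | ~A |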
Note that both individual selector results and selector set operations will always return matching columns in the same order as the underlying frame schema.
Examples#
import polars.selectors as cs
import polars as pl
# set up an empty dataframe with plenty of columns of various dtypes
df = pl.DataFrame(
schema={
"abc": pl.UInt16,
"bbb": pl.UInt32,
"cde": pl.Float64,
"def": pl.Float32,
"eee": pl.Boolean,
"fgg": pl.Boolean,
"ghi": pl.Time,
"JJK": pl.Date,
"Lmn": pl.Duration,
"opp": pl.Datetime("ms"),
"qqR": pl.String,
},
)
# Select the UNION of temporal, strings and columns that start with "e"
assert df.select(cs.temporal() | cs.string() | cs.starts_with("e")).schema == {
"eee": pl.Boolean,
"ghi": pl.Time,
"JJK": pl.Date,
"Lmn": pl.Duration,
"opp": pl.Datetime("ms"),
"qqR": pl.String,
}
# Select the INTERSECTION of temporal and column names that match "opp" OR "JJK"
assert df.select(cs.temporal() & cs.matches("opp|JJK")).schema == {
"JJK": pl.Date,
"opp": pl.Datetime("ms"),
}
# Select the DIFFERENCE of temporal columns and columns that contain the name "opp" OR "JJK"
assert df.select(cs.temporal() - cs.matches("opp|JJK")).schema == {
"ghi": pl.Time,
"Lmn": pl.Duration,
}
# Select the SYMMETRIC DIFFERENCE of numeric columns and columns that contain an "e"
assert df.select(cs.contains("e") ^ cs.numeric()).schema == {
"abc": pl.UInt16,
"bbb": pl.UInt32,
"eee": pl.Boolean,
}
# Select the COMPLEMENT of all columns of dtypes Duration and Time
assert df.select(~cs.by_dtype([pl.Duration, pl.Time])).schema == {
"abc": pl.UInt16,
"bbb": pl.UInt32,
"cde": pl.Float64,
"def": pl.Float32,
"eee": pl.Boolean,
"fgg": pl.Boolean,
"JJK": pl.Date,
"opp": pl.Datetime("ms"),
"qqR": pl.String,
}
Note
If you don't want to use the set operations on the selectors, you can materialize them as expressions
by calling as_expr. This ensures the operations OR, AND, etc. are dispatched to the underlying
expressions instead.
Functions#
Available selector functions:
| Function | Description |
|---|---|
| Selector | Base column selector expression/proxy. |
| all | Select all columns. |
| alpha | Select all columns with alphabetic names (eg: only letters). |
| alphanumeric | Select all columns with alphanumeric names (eg: only letters and the digits 0-9). |
| array | Select all array columns. |
| binary | Select all binary columns. |
| boolean | Select all boolean columns. |
| by_dtype | Select all columns matching the given dtypes. |
| by_index | Select all columns matching the given indices (or range objects). |
| by_name | Select all columns matching the given names. |
| categorical | Select all categorical columns. |
| contains | Select columns whose names contain the given literal substring(s). |
| date | Select all date columns. |
| datetime | Select all datetime columns, optionally filtering by time unit/zone. |
| decimal | Select all decimal columns. |
| digit | Select all columns having names consisting only of digits. |
| duration | Select all duration columns, optionally filtering by time unit. |
| ends_with | Select columns that end with the given substring(s). |
| enum | Select all enum columns. |
| exclude | Select all columns except those matching the given columns, datatypes, or selectors. |
| expand_selector | Expand selector to column names, with respect to a specific frame or target schema. |
| first | Select the first column in the current scope. |
| float | Select all float columns. |
| integer | Select all integer columns. |
| is_selector | Indicate whether the given object/expression is a selector. |
| last | Select the last column in the current scope. |
| list | Select all list columns. |
| matches | Select all columns that match the given regex pattern. |
| nested | Select all nested columns. |
| numeric | Select all numeric columns. |
| signed_integer | Select all signed integer columns. |
| starts_with | Select columns that start with the given substring(s). |
| string | Select all String (and, optionally, Categorical) string columns. |
| struct | Select all struct columns. |
| temporal | Select all temporal columns. |
| time | Select all time columns. |
| unsigned_integer | Select all unsigned integer columns. |
- class polars.selectors.Selector[source]#
Base column selector expression/proxy.
as_expr(): Materialize the selector as a normal expression.
exclude(columns, *more_columns): Exclude columns from a multi-column expression.
- as_expr() Expr[source]#
Materialize the selector as a normal expression.
This ensures that the operators |, &, ~ and - are applied on the data and not on the selector sets.
Examples
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "colx": ["aa", "bb", "cc"],
...         "coly": [True, False, True],
...         "colz": [1, 2, 3],
...     }
... )
Inverting the boolean selector will choose the non-boolean columns:
>>> df.select(~cs.boolean())
shape: (3, 2)
┌──────┬──────┐
│ colx ┆ colz │
│ ---  ┆ ---  │
│ str  ┆ i64  │
╞══════╪══════╡
│ aa   ┆ 1    │
│ bb   ┆ 2    │
│ cc   ┆ 3    │
└──────┴──────┘
To invert the values in the selected boolean columns, we need to materialize the selector as a standard expression instead:
>>> df.select(~cs.boolean().as_expr())
shape: (3, 1)
┌───────┐
│ coly  │
│ ---   │
│ bool  │
╞═══════╡
│ false │
│ true  │
│ false │
└───────┘
- exclude(
- columns: str | PolarsDataType | Collection[str] | Collection[PolarsDataType],
- *more_columns: str | PolarsDataType,
Exclude columns from a multi-column expression.
Only works after a wildcard or regex column selection, and you cannot provide both string column names and dtypes (you may prefer to use selectors instead).
- Parameters:
- columns
The name or datatype of the column(s) to exclude. Accepts regular expression input. Regular expressions should start with ^ and end with $.
- *more_columns
Additional names or datatypes of columns to exclude, specified as positional arguments.
- polars.selectors.all() Selector[source]#
Select all columns.
See also
Examples
>>> from datetime import date
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "dt": [date(1999, 12, 31), date(2024, 1, 1)],
...         "value": [1_234_500, 5_000_555],
...     },
...     schema_overrides={"value": pl.Int32},
... )
Select all columns, casting them to string:
>>> df.select(cs.all().cast(pl.String))
shape: (2, 2)
┌────────────┬─────────┐
│ dt         ┆ value   │
│ ---        ┆ ---     │
│ str        ┆ str     │
╞════════════╪═════════╡
│ 1999-12-31 ┆ 1234500 │
│ 2024-01-01 ┆ 5000555 │
└────────────┴─────────┘
Select all columns except for those matching the given dtypes:
>>> df.select(cs.all() - cs.numeric())
shape: (2, 1)
┌────────────┐
│ dt         │
│ ---        │
│ date       │
╞════════════╡
│ 1999-12-31 │
│ 2024-01-01 │
└────────────┘
- polars.selectors.alpha( ) Selector[source]#
Select all columns with alphabetic names (eg: only letters).
- Parameters:
- ascii_only
Indicate whether to consider only ASCII alphabetic characters, or the full Unicode range of valid letters (accented, ideographic, etc).
- ignore_spaces
Indicate whether to ignore the presence of spaces in column names; if so, only the other (non-space) characters are considered.
Notes
Matching column names cannot contain any non-alphabetic characters. Note that the definition of "alphabetic" consists of all valid Unicode alphabetic characters (\p{Alphabetic}) by default; this can be changed by setting ascii_only=True.
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "no1": [100, 200, 300],
...         "café": ["espresso", "latte", "mocha"],
...         "t or f": [True, False, None],
...         "hmm": ["aaa", "bbb", "ccc"],
...         "都市": ["東京", "大阪", "京都"],
...     }
... )
Select columns with alphabetic names; note that accented characters and kanji are recognised as alphabetic here:
>>> df.select(cs.alpha())
shape: (3, 3)
┌──────────┬─────┬──────┐
│ café     ┆ hmm ┆ 都市 │
│ ---      ┆ --- ┆ ---  │
│ str      ┆ str ┆ str  │
╞══════════╪═════╪══════╡
│ espresso ┆ aaa ┆ 東京 │
│ latte    ┆ bbb ┆ 大阪 │
│ mocha    ┆ ccc ┆ 京都 │
└──────────┴─────┴──────┘
Constrain the definition of "alphabetic" to ASCII characters only:
>>> df.select(cs.alpha(ascii_only=True))
shape: (3, 1)
┌─────┐
│ hmm │
│ --- │
│ str │
╞═════╡
│ aaa │
│ bbb │
│ ccc │
└─────┘
>>> df.select(cs.alpha(ascii_only=True, ignore_spaces=True))
shape: (3, 2)
┌────────┬─────┐
│ t or f ┆ hmm │
│ ---    ┆ --- │
│ bool   ┆ str │
╞════════╪═════╡
│ true   ┆ aaa │
│ false  ┆ bbb │
│ null   ┆ ccc │
└────────┴─────┘
Select all columns except for those with alphabetic names:
>>> df.select(~cs.alpha())
shape: (3, 2)
┌─────┬────────┐
│ no1 ┆ t or f │
│ --- ┆ ---    │
│ i64 ┆ bool   │
╞═════╪════════╡
│ 100 ┆ true   │
│ 200 ┆ false  │
│ 300 ┆ null   │
└─────┴────────┘
>>> df.select(~cs.alpha(ignore_spaces=True))
shape: (3, 1)
┌─────┐
│ no1 │
│ --- │
│ i64 │
╞═════╡
│ 100 │
│ 200 │
│ 300 │
└─────┘
- polars.selectors.alphanumeric( ) Selector[source]#
Select all columns with alphanumeric names (eg: only letters and the digits 0-9).
- Parameters:
- ascii_only
Indicate whether to consider only ASCII alphabetic characters, or the full Unicode range of valid letters (accented, ideographic, etc).
- ignore_spaces
Indicate whether to ignore the presence of spaces in column names; if so, only the other (non-space) characters are considered.
Notes
Matching column names can contain only alphabetic and digit characters. Note that the definition of "alphabetic" consists of all valid Unicode alphabetic characters (\p{Alphabetic}) and digit characters (\d) by default; this can be changed by setting ascii_only=True.
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "1st_col": [100, 200, 300],
...         "flagged": [True, False, True],
...         "00prefix": ["01:aa", "02:bb", "03:cc"],
...         "last col": ["x", "y", "z"],
...     }
... )
Select columns with alphanumeric names:
>>> df.select(cs.alphanumeric())
shape: (3, 2)
┌─────────┬──────────┐
│ flagged ┆ 00prefix │
│ ---     ┆ ---      │
│ bool    ┆ str      │
╞═════════╪══════════╡
│ true    ┆ 01:aa    │
│ false   ┆ 02:bb    │
│ true    ┆ 03:cc    │
└─────────┴──────────┘
>>> df.select(cs.alphanumeric(ignore_spaces=True))
shape: (3, 3)
┌─────────┬──────────┬──────────┐
│ flagged ┆ 00prefix ┆ last col │
│ ---     ┆ ---      ┆ ---      │
│ bool    ┆ str      ┆ str      │
╞═════════╪══════════╪══════════╡
│ true    ┆ 01:aa    ┆ x        │
│ false   ┆ 02:bb    ┆ y        │
│ true    ┆ 03:cc    ┆ z        │
└─────────┴──────────┴──────────┘
Select all columns except for those with alphanumeric names:
>>> df.select(~cs.alphanumeric())
shape: (3, 2)
┌─────────┬──────────┐
│ 1st_col ┆ last col │
│ ---     ┆ ---      │
│ i64     ┆ str      │
╞═════════╪══════════╡
│ 100     ┆ x        │
│ 200     ┆ y        │
│ 300     ┆ z        │
└─────────┴──────────┘
>>> df.select(~cs.alphanumeric(ignore_spaces=True))
shape: (3, 1)
┌─────────┐
│ 1st_col │
│ ---     │
│ i64     │
╞═════════╡
│ 100     │
│ 200     │
│ 300     │
└─────────┘
- polars.selectors.array( ) Selector[source]#
Select all array columns.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
See also
Examples
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": [["xx", "yy"], ["x", "y"]],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...     },
...     schema_overrides={"foo": pl.Array(pl.String, 2)},
... )
Select all array columns:
>>> df.select(cs.array())
shape: (2, 1)
┌───────────────┐
│ foo           │
│ ---           │
│ array[str, 2] │
╞═══════════════╡
│ ["xx", "yy"]  │
│ ["x", "y"]    │
└───────────────┘
Select all columns except for those that are array:
>>> df.select(~cs.array())
shape: (2, 2)
┌─────┬─────┐
│ bar ┆ baz │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 123 ┆ 2.0 │
│ 456 ┆ 5.5 │
└─────┴─────┘
Select all array columns with a certain matching inner type:
>>> df.select(cs.array(cs.string()))
shape: (2, 1)
┌───────────────┐
│ foo           │
│ ---           │
│ array[str, 2] │
╞═══════════════╡
│ ["xx", "yy"]  │
│ ["x", "y"]    │
└───────────────┘
>>> df.select(cs.array(cs.integer()))
shape: (0, 0)
┌┐
╞╡
└┘
>>> df.select(cs.array(width=2))
shape: (2, 1)
┌───────────────┐
│ foo           │
│ ---           │
│ array[str, 2] │
╞═══════════════╡
│ ["xx", "yy"]  │
│ ["x", "y"]    │
└───────────────┘
>>> df.select(cs.array(width=3))
shape: (0, 0)
┌┐
╞╡
└┘
- polars.selectors.binary() Selector[source]#
Select all binary columns.
See also
Examples
>>> import polars.selectors as cs
>>> df = pl.DataFrame({"a": [b"hello"], "b": ["world"], "c": [b"!"], "d": [":)"]})
>>> df
shape: (1, 4)
┌──────────┬───────┬────────┬─────┐
│ a        ┆ b     ┆ c      ┆ d   │
│ ---      ┆ ---   ┆ ---    ┆ --- │
│ binary   ┆ str   ┆ binary ┆ str │
╞══════════╪═══════╪════════╪═════╡
│ b"hello" ┆ world ┆ b"!"   ┆ :)  │
└──────────┴───────┴────────┴─────┘
Select binary columns and export as a dict:
>>> df.select(cs.binary()).to_dict(as_series=False)
{'a': [b'hello'], 'c': [b'!']}
Select all columns except for those that are binary:
>>> df.select(~cs.binary()).to_dict(as_series=False)
{'b': ['world'], 'd': [':)']}
- polars.selectors.boolean() Selector[source]#
Select all boolean columns.
See also
by_dtype: Select all columns matching the given dtype(s).
Examples
>>> import polars.selectors as cs
>>> df = pl.DataFrame({"n": range(1, 5)}).with_columns(n_even=pl.col("n") % 2 == 0)
>>> df
shape: (4, 2)
┌─────┬────────┐
│ n   ┆ n_even │
│ --- ┆ ---    │
│ i64 ┆ bool   │
╞═════╪════════╡
│ 1   ┆ false  │
│ 2   ┆ true   │
│ 3   ┆ false  │
│ 4   ┆ true   │
└─────┴────────┘
Select and invert boolean columns:
>>> df.with_columns(is_odd=cs.boolean().not_())
shape: (4, 3)
┌─────┬────────┬────────┐
│ n   ┆ n_even ┆ is_odd │
│ --- ┆ ---    ┆ ---    │
│ i64 ┆ bool   ┆ bool   │
╞═════╪════════╪════════╡
│ 1   ┆ false  ┆ true   │
│ 2   ┆ true   ┆ false  │
│ 3   ┆ false  ┆ true   │
│ 4   ┆ true   ┆ false  │
└─────┴────────┴────────┘
Select all columns except for those that are boolean:
>>> df.select(~cs.boolean())
shape: (4, 1)
┌─────┐
│ n   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
│ 4   │
└─────┘
- polars.selectors.by_dtype(
- *dtypes: PolarsDataType | PythonDataType | Iterable[PolarsDataType] | Iterable[PythonDataType],
Select all columns matching the given dtypes.
See also
Examples
>>> from datetime import date
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "dt": [date(1999, 12, 31), date(2024, 1, 1), date(2010, 7, 5)],
...         "value": [1_234_500, 5_000_555, -4_500_000],
...         "other": ["foo", "bar", "foo"],
...     }
... )
Select all columns with date or string dtypes:
>>> df.select(cs.by_dtype(pl.Date, pl.String))
shape: (3, 2)
┌────────────┬───────┐
│ dt         ┆ other │
│ ---        ┆ ---   │
│ date       ┆ str   │
╞════════════╪═══════╡
│ 1999-12-31 ┆ foo   │
│ 2024-01-01 ┆ bar   │
│ 2010-07-05 ┆ foo   │
└────────────┴───────┘
Select all columns that are not of date or string dtype:
>>> df.select(~cs.by_dtype(pl.Date, pl.String))
shape: (3, 1)
┌──────────┐
│ value    │
│ ---      │
│ i64      │
╞══════════╡
│ 1234500  │
│ 5000555  │
│ -4500000 │
└──────────┘
Group by string columns and sum the numeric columns:
>>> df.group_by(cs.string()).agg(cs.numeric().sum()).sort(by="other")
shape: (2, 2)
┌───────┬──────────┐
│ other ┆ value    │
│ ---   ┆ ---      │
│ str   ┆ i64      │
╞═══════╪══════════╡
│ bar   ┆ 5000555  │
│ foo   ┆ -3265500 │
└───────┴──────────┘
- polars.selectors.by_index( ) Selector[source]#
Select all columns matching the given indices (or range objects).
- Parameters:
- *indices
One or more column indices (or range objects). Negative indexing is supported.
- require_all
By default, all specified indices must be valid; if any index is out of bounds, an error is raised. If set to False, out-of-bounds indices are ignored.
See also
Notes
Matching columns are returned in the order in which their indexes appear in the selector, not the underlying schema order.
Examples
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "key": ["abc"],
...         **{f"c{i:02}": [0.5 * i] for i in range(100)},
...     },
... )
>>> print(df)
shape: (1, 101)
┌─────┬─────┬─────┬─────┬───┬──────┬──────┬──────┬──────┐
│ key ┆ c00 ┆ c01 ┆ c02 ┆ … ┆ c96  ┆ c97  ┆ c98  ┆ c99  │
│ --- ┆ --- ┆ --- ┆ --- ┆   ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ str ┆ f64 ┆ f64 ┆ f64 ┆   ┆ f64  ┆ f64  ┆ f64  ┆ f64  │
╞═════╪═════╪═════╪═════╪═══╪══════╪══════╪══════╪══════╡
│ abc ┆ 0.0 ┆ 0.5 ┆ 1.0 ┆ … ┆ 48.0 ┆ 48.5 ┆ 49.0 ┆ 49.5 │
└─────┴─────┴─────┴─────┴───┴──────┴──────┴──────┴──────┘
Select columns by index ("key" column and the two first/last columns):
>>> df.select(cs.by_index(0, 1, 2, -2, -1))
shape: (1, 5)
┌─────┬─────┬─────┬──────┬──────┐
│ key ┆ c00 ┆ c01 ┆ c98  ┆ c99  │
│ --- ┆ --- ┆ --- ┆ ---  ┆ ---  │
│ str ┆ f64 ┆ f64 ┆ f64  ┆ f64  │
╞═════╪═════╪═════╪══════╪══════╡
│ abc ┆ 0.0 ┆ 0.5 ┆ 49.0 ┆ 49.5 │
└─────┴─────┴─────┴──────┴──────┘
Select the "key" column and use a range object to select various columns. Note that you can freely mix and match integer indices and range objects:
>>> df.select(cs.by_index(0, range(1, 101, 20)))
shape: (1, 6)
┌─────┬─────┬──────┬──────┬──────┬──────┐
│ key ┆ c00 ┆ c20  ┆ c40  ┆ c60  ┆ c80  │
│ --- ┆ --- ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ str ┆ f64 ┆ f64  ┆ f64  ┆ f64  ┆ f64  │
╞═════╪═════╪══════╪══════╪══════╪══════╡
│ abc ┆ 0.0 ┆ 10.0 ┆ 20.0 ┆ 30.0 ┆ 40.0 │
└─────┴─────┴──────┴──────┴──────┴──────┘
>>> df.select(cs.by_index(0, range(101, 0, -25), require_all=False))
shape: (1, 5)
┌─────┬──────┬──────┬──────┬─────┐
│ key ┆ c75  ┆ c50  ┆ c25  ┆ c00 │
│ --- ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ str ┆ f64  ┆ f64  ┆ f64  ┆ f64 │
╞═════╪══════╪══════╪══════╪═════╡
│ abc ┆ 37.5 ┆ 25.0 ┆ 12.5 ┆ 0.0 │
└─────┴──────┴──────┴──────┴─────┘
Select all columns except for those at odd indices (the even-numbered c00, c02, … columns):
>>> df.select(~cs.by_index(range(1, 100, 2)))
shape: (1, 51)
┌─────┬─────┬─────┬─────┬───┬──────┬──────┬──────┬──────┐
│ key ┆ c01 ┆ c03 ┆ c05 ┆ … ┆ c93  ┆ c95  ┆ c97  ┆ c99  │
│ --- ┆ --- ┆ --- ┆ --- ┆   ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ str ┆ f64 ┆ f64 ┆ f64 ┆   ┆ f64  ┆ f64  ┆ f64  ┆ f64  │
╞═════╪═════╪═════╪═════╪═══╪══════╪══════╪══════╪══════╡
│ abc ┆ 0.5 ┆ 1.5 ┆ 2.5 ┆ … ┆ 46.5 ┆ 47.5 ┆ 48.5 ┆ 49.5 │
└─────┴─────┴─────┴─────┴───┴──────┴──────┴──────┴──────┘
- polars.selectors.by_name(
- *names: str | Collection[str],
- require_all: bool = True,
Select all columns matching the given names.
Added in version 0.20.27: The require_all parameter was added.
- Parameters:
- *names
One or more names of columns to select.
- require_all
Whether to match all names (the default) or any of the names.
See also
Notes
Matching columns are returned in the order in which they are declared in the selector, not the underlying schema order.
Examples
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [False, True],
...     }
... )
Select columns by name:
>>> df.select(cs.by_name("foo", "bar"))
shape: (2, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ x   ┆ 123 │
│ y   ┆ 456 │
└─────┴─────┘
Match any of the given columns by name:
>>> df.select(cs.by_name("baz", "moose", "foo", "bear", require_all=False))
shape: (2, 2)
┌─────┬─────┐
│ baz ┆ foo │
│ --- ┆ --- │
│ f64 ┆ str │
╞═════╪═════╡
│ 2.0 ┆ x   │
│ 5.5 ┆ y   │
└─────┴─────┘
Match all columns except for those given:
>>> df.select(~cs.by_name("foo", "bar"))
shape: (2, 2)
┌─────┬───────┐
│ baz ┆ zap   │
│ --- ┆ ---   │
│ f64 ┆ bool  │
╞═════╪═══════╡
│ 2.0 ┆ false │
│ 5.5 ┆ true  │
└─────┴───────┘
- polars.selectors.categorical() Selector[source]#
Select all categorical columns.
See also
Examples
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["xx", "yy"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...     },
...     schema_overrides={"foo": pl.Categorical},
... )
Select all categorical columns:
>>> df.select(cs.categorical())
shape: (2, 1)
┌─────┐
│ foo │
│ --- │
│ cat │
╞═════╡
│ xx  │
│ yy  │
└─────┘
Select all columns except for those that are categorical:
>>> df.select(~cs.categorical())
shape: (2, 2)
┌─────┬─────┐
│ bar ┆ baz │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 123 ┆ 2.0 │
│ 456 ┆ 5.5 │
└─────┴─────┘
- polars.selectors.contains(*substring: str) Selector[source]#
Select columns whose names contain the given literal substring(s).
- Parameters:
- substring
Substring(s) that matching column names should contain.
See also
matches: Select all columns that match the given regex pattern.
ends_with: Select columns that end with the given substring(s).
starts_with: Select columns that start with the given substring(s).
Examples
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [False, True],
...     }
... )
Select columns that contain the substring "ba":
>>> df.select(cs.contains("ba"))
shape: (2, 2)
┌─────┬─────┐
│ bar ┆ baz │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 123 ┆ 2.0 │
│ 456 ┆ 5.5 │
└─────┴─────┘
Select columns that contain the substring "ba" or the letter "z":
>>> df.select(cs.contains("ba", "z"))
shape: (2, 3)
┌─────┬─────┬───────┐
│ bar ┆ baz ┆ zap   │
│ --- ┆ --- ┆ ---   │
│ i64 ┆ f64 ┆ bool  │
╞═════╪═════╪═══════╡
│ 123 ┆ 2.0 ┆ false │
│ 456 ┆ 5.5 ┆ true  │
└─────┴─────┴───────┘
Select all columns except for those that contain the substring "ba":
>>> df.select(~cs.contains("ba"))
shape: (2, 2)
┌─────┬───────┐
│ foo ┆ zap   │
│ --- ┆ ---   │
│ str ┆ bool  │
╞═════╪═══════╡
│ x   ┆ false │
│ y   ┆ true  │
└─────┴───────┘
- polars.selectors.date() Selector[source]#
Select all date columns.
See also
Examples
>>> from datetime import date, datetime, time
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "dtm": [datetime(2001, 5, 7, 10, 25), datetime(2031, 12, 31, 0, 30)],
...         "dt": [date(1999, 12, 31), date(2024, 8, 9)],
...         "tm": [time(0, 0, 0), time(23, 59, 59)],
...     },
... )
Select all date columns:
>>> df.select(cs.date())
shape: (2, 1)
┌────────────┐
│ dt         │
│ ---        │
│ date       │
╞════════════╡
│ 1999-12-31 │
│ 2024-08-09 │
└────────────┘
Select all columns except for those that are dates:
>>> df.select(~cs.date())
shape: (2, 2)
┌─────────────────────┬──────────┐
│ dtm                 ┆ tm       │
│ ---                 ┆ ---      │
│ datetime[μs]        ┆ time     │
╞═════════════════════╪══════════╡
│ 2001-05-07 10:25:00 ┆ 00:00:00 │
│ 2031-12-31 00:30:00 ┆ 23:59:59 │
└─────────────────────┴──────────┘
- polars.selectors.datetime(
- time_unit: TimeUnit | Collection[TimeUnit] | None = None,
- time_zone: str | pytimezone | Collection[str | pytimezone | None] | None = ('*', None),
Select all datetime columns, optionally filtering by time unit/zone.
- Parameters:
- time_unit
One (or more) of the allowed timeunit precision strings, "ms", "us", and "ns". Omit to select columns with any valid timeunit.
- time_zone
One or more timezone strings, as defined in zoneinfo (to see valid options run import zoneinfo; zoneinfo.available_timezones() for a full list).
Set None to select Datetime columns that do not have a timezone.
Set "*" to select Datetime columns that have any timezone.
See also
Examples
>>> from datetime import datetime, date, timezone
>>> import polars.selectors as cs
>>> from zoneinfo import ZoneInfo
>>> tokyo_tz = ZoneInfo("Asia/Tokyo")
>>> utc_tz = timezone.utc
>>> df = pl.DataFrame(
...     {
...         "tstamp_tokyo": [
...             datetime(1999, 7, 21, 5, 20, 16, 987654, tzinfo=tokyo_tz),
...             datetime(2000, 5, 16, 6, 21, 21, 123465, tzinfo=tokyo_tz),
...         ],
...         "tstamp_utc": [
...             datetime(2023, 4, 10, 12, 14, 16, 999000, tzinfo=utc_tz),
...             datetime(2025, 8, 25, 14, 18, 22, 666000, tzinfo=utc_tz),
...         ],
...         "tstamp": [
...             datetime(2000, 11, 20, 18, 12, 16, 600000),
...             datetime(2020, 10, 30, 10, 20, 25, 123000),
...         ],
...         "dt": [date(1999, 12, 31), date(2010, 7, 5)],
...     },
...     schema_overrides={
...         "tstamp_tokyo": pl.Datetime("ns", "Asia/Tokyo"),
...         "tstamp_utc": pl.Datetime("us", "UTC"),
...     },
... )
Select all datetime columns:
>>> df.select(cs.datetime())
shape: (2, 3)
┌────────────────────────────────┬─────────────────────────────┬─────────────────────────┐
│ tstamp_tokyo                   ┆ tstamp_utc                  ┆ tstamp                  │
│ ---                            ┆ ---                         ┆ ---                     │
│ datetime[ns, Asia/Tokyo]       ┆ datetime[μs, UTC]           ┆ datetime[μs]            │
╞════════════════════════════════╪═════════════════════════════╪═════════════════════════╡
│ 1999-07-21 05:20:16.987654 JST ┆ 2023-04-10 12:14:16.999 UTC ┆ 2000-11-20 18:12:16.600 │
│ 2000-05-16 06:21:21.123465 JST ┆ 2025-08-25 14:18:22.666 UTC ┆ 2020-10-30 10:20:25.123 │
└────────────────────────────────┴─────────────────────────────┴─────────────────────────┘
Select all datetime columns that have "us" precision:
>>> df.select(cs.datetime("us"))
shape: (2, 2)
┌─────────────────────────────┬─────────────────────────┐
│ tstamp_utc                  ┆ tstamp                  │
│ ---                         ┆ ---                     │
│ datetime[μs, UTC]           ┆ datetime[μs]            │
╞═════════════════════════════╪═════════════════════════╡
│ 2023-04-10 12:14:16.999 UTC ┆ 2000-11-20 18:12:16.600 │
│ 2025-08-25 14:18:22.666 UTC ┆ 2020-10-30 10:20:25.123 │
└─────────────────────────────┴─────────────────────────┘
Select all datetime columns that have any timezone:
>>> df.select(cs.datetime(time_zone="*"))
shape: (2, 2)
┌────────────────────────────────┬─────────────────────────────┐
│ tstamp_tokyo                   ┆ tstamp_utc                  │
│ ---                            ┆ ---                         │
│ datetime[ns, Asia/Tokyo]       ┆ datetime[μs, UTC]           │
╞════════════════════════════════╪═════════════════════════════╡
│ 1999-07-21 05:20:16.987654 JST ┆ 2023-04-10 12:14:16.999 UTC │
│ 2000-05-16 06:21:21.123465 JST ┆ 2025-08-25 14:18:22.666 UTC │
└────────────────────────────────┴─────────────────────────────┘
Select all datetime columns that have a specific timezone:
>>> df.select(cs.datetime(time_zone="UTC"))
shape: (2, 1)
┌─────────────────────────────┐
│ tstamp_utc                  │
│ ---                         │
│ datetime[μs, UTC]           │
╞═════════════════════════════╡
│ 2023-04-10 12:14:16.999 UTC │
│ 2025-08-25 14:18:22.666 UTC │
└─────────────────────────────┘
Select all datetime columns that have NO timezone:
>>> df.select(cs.datetime(time_zone=None))
shape: (2, 1)
┌─────────────────────────┐
│ tstamp                  │
│ ---                     │
│ datetime[μs]            │
╞═════════════════════════╡
│ 2000-11-20 18:12:16.600 │
│ 2020-10-30 10:20:25.123 │
└─────────────────────────┘
Select all columns except for datetime columns:
>>> df.select(~cs.datetime())
shape: (2, 1)
┌────────────┐
│ dt         │
│ ---        │
│ date       │
╞════════════╡
│ 1999-12-31 │
│ 2010-07-05 │
└────────────┘
- polars.selectors.decimal() Selector[source]#
Select all decimal columns.
See also
Examples
>>> from decimal import Decimal as D
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [D(123), D(456)],
...         "baz": [D("2.0005"), D("-50.5555")],
...     },
...     schema_overrides={"baz": pl.Decimal(scale=5, precision=10)},
... )
Select all decimal columns:
>>> df.select(cs.decimal())
shape: (2, 2)
┌───────────────┬───────────────┐
│ bar           ┆ baz           │
│ ---           ┆ ---           │
│ decimal[38,0] ┆ decimal[10,5] │
╞═══════════════╪═══════════════╡
│ 123           ┆ 2.00050       │
│ 456           ┆ -50.55550     │
└───────────────┴───────────────┘
Select all columns except the decimal ones:
>>> df.select(~cs.decimal())
shape: (2, 1)
┌─────┐
│ foo │
│ --- │
│ str │
╞═════╡
│ x   │
│ y   │
└─────┘
- polars.selectors.digit(ascii_only: bool = False) Selector[source]#
Select all columns having names consisting only of digits.
Notes
Matching column names cannot contain any non-digit characters. Note that the definition of "digit" consists of all valid Unicode digit characters (\d) by default; this can be changed by setting ascii_only=True.
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "key": ["aaa", "bbb", "aaa", "bbb", "bbb"],
...         "year": [2001, 2001, 2025, 2025, 2001],
...         "value": [-25, 100, 75, -15, -5],
...     }
... ).pivot(
...     values="value",
...     index="key",
...     on="year",
...     aggregate_function="sum",
... )
>>> print(df)
shape: (2, 3)
┌─────┬──────┬──────┐
│ key ┆ 2001 ┆ 2025 │
│ --- ┆ ---  ┆ ---  │
│ str ┆ i64  ┆ i64  │
╞═════╪══════╪══════╡
│ aaa ┆ -25  ┆ 75   │
│ bbb ┆ 95   ┆ -15  │
└─────┴──────┴──────┘
Select columns with digit names:
>>> df.select(cs.digit())
shape: (2, 2)
┌──────┬──────┐
│ 2001 ┆ 2025 │
│ ---  ┆ ---  │
│ i64  ┆ i64  │
╞══════╪══════╡
│ -25  ┆ 75   │
│ 95   ┆ -15  │
└──────┴──────┘
Select all columns except for those with digit names:
>>> df.select(~cs.digit())
shape: (2, 1)
┌─────┐
│ key │
│ --- │
│ str │
╞═════╡
│ aaa │
│ bbb │
└─────┘
Demonstrate use of the ascii_only flag (by default all valid Unicode digits are considered, but this can be constrained to ASCII 0-9):
>>> df = pl.DataFrame({"१९९९": [1999], "२०७७": [2077], "3000": [3000]})
>>> df.select(cs.digit())
shape: (1, 3)
┌──────┬──────┬──────┐
│ १९९९ ┆ २०७७ ┆ 3000 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  │
╞══════╪══════╪══════╡
│ 1999 ┆ 2077 ┆ 3000 │
└──────┴──────┴──────┘
>>> df.select(cs.digit(ascii_only=True))
shape: (1, 1)
┌──────┐
│ 3000 │
│ ---  │
│ i64  │
╞══════╡
│ 3000 │
└──────┘
- polars.selectors.duration(time_unit: TimeUnit | Collection[TimeUnit] | None = None) Selector[source]#
Select all duration columns, optionally filtering by time unit.
- Parameters:
- time_unit
One (or more) of the allowed timeunit precision strings, "ms", "us", and "ns". Omit to select columns with any valid timeunit.
See also
Examples
>>> from datetime import date, timedelta
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "dt": [date(2022, 1, 31), date(2025, 7, 5)],
...         "td1": [
...             timedelta(days=1, milliseconds=123456),
...             timedelta(days=1, hours=23, microseconds=987000),
...         ],
...         "td2": [
...             timedelta(days=7, microseconds=456789),
...             timedelta(days=14, minutes=999, seconds=59),
...         ],
...         "td3": [
...             timedelta(weeks=4, days=-10, microseconds=999999),
...             timedelta(weeks=3, milliseconds=123456, microseconds=1),
...         ],
...     },
...     schema_overrides={
...         "td1": pl.Duration("ms"),
...         "td2": pl.Duration("us"),
...         "td3": pl.Duration("ns"),
...     },
... )
Select all duration columns:
>>> df.select(cs.duration())
shape: (2, 3)
┌────────────────┬─────────────────┬────────────────────┐
│ td1            ┆ td2             ┆ td3                │
│ ---            ┆ ---             ┆ ---                │
│ duration[ms]   ┆ duration[μs]    ┆ duration[ns]       │
╞════════════════╪═════════════════╪════════════════════╡
│ 1d 2m 3s 456ms ┆ 7d 456789µs     ┆ 18d 999999µs       │
│ 1d 23h 987ms   ┆ 14d 16h 39m 59s ┆ 21d 2m 3s 456001µs │
└────────────────┴─────────────────┴────────────────────┘
Select all duration columns that have "ms" precision:
>>> df.select(cs.duration("ms"))
shape: (2, 1)
┌────────────────┐
│ td1            │
│ ---            │
│ duration[ms]   │
╞════════════════╡
│ 1d 2m 3s 456ms │
│ 1d 23h 987ms   │
└────────────────┘
Select all duration columns that have "ms" OR "ns" precision:
>>> df.select(cs.duration(["ms", "ns"]))
shape: (2, 2)
┌────────────────┬────────────────────┐
│ td1            ┆ td3                │
│ ---            ┆ ---                │
│ duration[ms]   ┆ duration[ns]       │
╞════════════════╪════════════════════╡
│ 1d 2m 3s 456ms ┆ 18d 999999µs       │
│ 1d 23h 987ms   ┆ 21d 2m 3s 456001µs │
└────────────────┴────────────────────┘
Select all columns except for duration columns:
>>> df.select(~cs.duration())
shape: (2, 1)
┌────────────┐
│ dt         │
│ ---        │
│ date       │
╞════════════╡
│ 2022-01-31 │
│ 2025-07-05 │
└────────────┘
- polars.selectors.ends_with(*suffix: str) Selector[source]#
Select columns that end with the given substring(s).
- Parameters:
- suffix
Substring(s) that matching column names should end with.
See also
contains: Select columns that contain the given literal substring(s).
matches: Select all columns that match the given regex pattern.
starts_with: Select columns that start with the given substring(s).
Examples
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [False, True],
...     }
... )
Select columns that end with the substring "z":
>>> df.select(cs.ends_with("z"))
shape: (2, 1)
┌─────┐
│ baz │
│ --- │
│ f64 │
╞═════╡
│ 2.0 │
│ 5.5 │
└─────┘
Select columns that end with either the letter "z" or "r":
>>> df.select(cs.ends_with("z", "r"))
shape: (2, 2)
┌─────┬─────┐
│ bar ┆ baz │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 123 ┆ 2.0 │
│ 456 ┆ 5.5 │
└─────┴─────┘
Select all columns except for those that end with the substring "z":
>>> df.select(~cs.ends_with("z"))
shape: (2, 3)
┌─────┬─────┬───────┐
│ foo ┆ bar ┆ zap   │
│ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ bool  │
╞═════╪═════╪═══════╡
│ x   ┆ 123 ┆ false │
│ y   ┆ 456 ┆ true  │
└─────┴─────┴───────┘
- polars.selectors.enum() -> Selector
Select all enum columns.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
See also
by_dtype: Select all columns matching the given dtype(s).
categorical: Select all categorical columns.
string: Select all string columns (optionally including categoricals).
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["xx", "yy"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...     },
...     schema_overrides={"foo": pl.Enum(["xx", "yy"])},
... )
Select all enum columns:
>>> df.select(cs.enum())
shape: (2, 1)
┌──────┐
│ foo  │
│ ---  │
│ enum │
╞══════╡
│ xx   │
│ yy   │
└──────┘
Select all columns except for those that are enum:
>>> df.select(~cs.enum())
shape: (2, 2)
┌─────┬─────┐
│ bar ┆ baz │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 123 ┆ 2.0 │
│ 456 ┆ 5.5 │
└─────┴─────┘
- polars.selectors.exclude(
- columns: str | PolarsDataType | Selector | Expr | Collection[str | PolarsDataType | Selector | Expr],
- *more_columns: str | PolarsDataType | Selector | Expr,
- ) -> Selector
Select all columns except those matching the given columns, datatypes, or selectors.
- Parameters:
- columns
One or more columns (col or name), datatypes, or selectors representing the columns to exclude.
- *more_columns
Additional columns, datatypes, or selectors to exclude, specified as positional arguments.
Notes
If excluding a single selector it is simpler to write as ~selector instead.
Examples
Exclude by column name(s):
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "aa": [1, 2, 3],
...         "ba": ["a", "b", None],
...         "cc": [None, 2.5, 1.5],
...     }
... )
>>> df.select(cs.exclude("ba", "xx"))
shape: (3, 2)
┌─────┬──────┐
│ aa  ┆ cc   │
│ --- ┆ ---  │
│ i64 ┆ f64  │
╞═════╪══════╡
│ 1   ┆ null │
│ 2   ┆ 2.5  │
│ 3   ┆ 1.5  │
└─────┴──────┘
Exclude using a column name, a selector, and a dtype:
>>> df.select(cs.exclude("aa", cs.string(), pl.UInt32))
shape: (3, 1)
┌──────┐
│ cc   │
│ ---  │
│ f64  │
╞══════╡
│ null │
│ 2.5  │
│ 1.5  │
└──────┘
- polars.selectors.expand_selector(
- target: DataFrame | LazyFrame | Mapping[str, PolarsDataType],
- selector: Selector | Expr,
- *,
- strict: bool = True,
- ) -> tuple[str, ...]
Expand selector to column names, with respect to a specific frame or target schema.
Added in version 0.20.30: The strict parameter was added.
- Parameters:
- target
A Polars DataFrame, LazyFrame or Schema.
- selector
An arbitrary polars selector (or compound selector).
- strict
Setting False additionally allows for a broader range of column selection expressions (such as bare columns or use of .exclude()) to be expanded, not just the dedicated selectors.
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "colx": ["a", "b", "c"],
...         "coly": [123, 456, 789],
...         "colz": [2.0, 5.5, 8.0],
...     }
... )
Expand selector with respect to an existing DataFrame:
>>> cs.expand_selector(df, cs.numeric())
('coly', 'colz')
>>> cs.expand_selector(df, cs.first() | cs.last())
('colx', 'colz')
This also works with LazyFrame:
>>> cs.expand_selector(df.lazy(), ~(cs.first() | cs.last()))
('coly',)
Expand selector with respect to a standalone Schema dict:
>>> schema = {
...     "id": pl.Int64,
...     "desc": pl.String,
...     "count": pl.UInt32,
...     "value": pl.Float64,
... }
>>> cs.expand_selector(schema, cs.string() | cs.float())
('desc', 'value')
Allow for non-strict selection expressions (such as those including use of an .exclude() constraint) to be expanded:
>>> cs.expand_selector(schema, cs.numeric().exclude("id"), strict=False)
('count', 'value')
- polars.selectors.first(*, strict: bool = True) -> Selector
Select the first column in the current scope.
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [0, 1],
...     }
... )
Select the first column:
>>> df.select(cs.first())
shape: (2, 1)
┌─────┐
│ foo │
│ --- │
│ str │
╞═════╡
│ x   │
│ y   │
└─────┘
Select everything except for the first column:
>>> df.select(~cs.first())
shape: (2, 3)
┌─────┬─────┬─────┐
│ bar ┆ baz ┆ zap │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ i64 │
╞═════╪═════╪═════╡
│ 123 ┆ 2.0 ┆ 0   │
│ 456 ┆ 5.5 ┆ 1   │
└─────┴─────┴─────┘
- polars.selectors.float() -> Selector
Select all float columns.
See also
integer: Select all integer columns.
numeric: Select all numeric columns.
signed_integer: Select all signed integer columns.
unsigned_integer: Select all unsigned integer columns.
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [0.0, 1.0],
...     },
...     schema_overrides={"baz": pl.Float32, "zap": pl.Float64},
... )
Select all float columns:
>>> df.select(cs.float())
shape: (2, 2)
┌─────┬─────┐
│ baz ┆ zap │
│ --- ┆ --- │
│ f32 ┆ f64 │
╞═════╪═════╡
│ 2.0 ┆ 0.0 │
│ 5.5 ┆ 1.0 │
└─────┴─────┘
Select all columns except for those that are float:
>>> df.select(~cs.float())
shape: (2, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ x   ┆ 123 │
│ y   ┆ 456 │
└─────┴─────┘
- polars.selectors.integer() -> Selector
Select all integer columns.
See also
by_dtype: Select columns by dtype.
float: Select all float columns.
numeric: Select all numeric columns.
signed_integer: Select all signed integer columns.
unsigned_integer: Select all unsigned integer columns.
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [0, 1],
...     }
... )
Select all integer columns:
>>> df.select(cs.integer())
shape: (2, 2)
┌─────┬─────┐
│ bar ┆ zap │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 123 ┆ 0   │
│ 456 ┆ 1   │
└─────┴─────┘
Select all columns except for those that are integer:
>>> df.select(~cs.integer())
shape: (2, 2)
┌─────┬─────┐
│ foo ┆ baz │
│ --- ┆ --- │
│ str ┆ f64 │
╞═════╪═════╡
│ x   ┆ 2.0 │
│ y   ┆ 5.5 │
└─────┴─────┘
- polars.selectors.is_selector(obj: Any) -> bool
Indicate whether the given object/expression is a selector.
Examples
>>> import polars as pl
>>> from polars.selectors import is_selector
>>> import polars.selectors as cs
>>> is_selector(pl.col("colx"))
False
>>> is_selector(cs.first() | cs.last())
True
- polars.selectors.last(*, strict: bool = True) -> Selector
Select the last column in the current scope.
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [0, 1],
...     }
... )
Select the last column:
>>> df.select(cs.last())
shape: (2, 1)
┌─────┐
│ zap │
│ --- │
│ i64 │
╞═════╡
│ 0   │
│ 1   │
└─────┘
Select everything except for the last column:
>>> df.select(~cs.last())
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ baz │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞═════╪═════╪═════╡
│ x   ┆ 123 ┆ 2.0 │
│ y   ┆ 456 ┆ 5.5 │
└─────┴─────┴─────┘
- polars.selectors.list(inner: None | Selector = None) -> Selector
Select all list columns.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
See also
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": [["xx", "yy"], ["x"]],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...     },
... )
Select all list columns:
>>> df.select(cs.list())
shape: (2, 1)
┌──────────────┐
│ foo          │
│ ---          │
│ list[str]    │
╞══════════════╡
│ ["xx", "yy"] │
│ ["x"]        │
└──────────────┘
Select all columns except for those that are list:
>>> df.select(~cs.list())
shape: (2, 2)
┌─────┬─────┐
│ bar ┆ baz │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 123 ┆ 2.0 │
│ 456 ┆ 5.5 │
└─────┴─────┘
Select all list columns with a certain matching inner type:
>>> df.select(cs.list(cs.string()))
shape: (2, 1)
┌──────────────┐
│ foo          │
│ ---          │
│ list[str]    │
╞══════════════╡
│ ["xx", "yy"] │
│ ["x"]        │
└──────────────┘
>>> df.select(cs.list(cs.integer()))
shape: (0, 0)
┌┐
╞╡
└┘
- polars.selectors.matches(pattern: str) -> Selector
Select all columns that match the given regex pattern.
- Parameters:
- pattern
A valid regular expression pattern, compatible with the regex crate.
See also
contains: Select all columns that contain the given substring.
ends_with: Select all columns that end with the given substring(s).
starts_with: Select all columns that start with the given substring(s).
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [0, 1],
...     }
... )
Match column names containing an "a", preceded by a character that is not "z":
>>> df.select(cs.matches("[^z]a"))
shape: (2, 2)
┌─────┬─────┐
│ bar ┆ baz │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 123 ┆ 2.0 │
│ 456 ┆ 5.5 │
└─────┴─────┘
Do not match column names that contain "R" or end in "z" (case-insensitively):
>>> df.select(~cs.matches(r"(?i)R|z$"))
shape: (2, 2)
┌─────┬─────┐
│ foo ┆ zap │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ x   ┆ 0   │
│ y   ┆ 1   │
└─────┴─────┘
- polars.selectors.nested() -> Selector
Select all nested columns.
A nested column is a list, array or struct.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
See also
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": [{"a": "xx", "b": "z"}, {"a": "x", "b": "y"}],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "wow": [[1, 2], [3]],
...     },
... )
Select all nested columns:
>>> df.select(cs.nested())
shape: (2, 2)
┌────────────┬───────────┐
│ foo        ┆ wow       │
│ ---        ┆ ---       │
│ struct[2]  ┆ list[i64] │
╞════════════╪═══════════╡
│ {"xx","z"} ┆ [1, 2]    │
│ {"x","y"}  ┆ [3]       │
└────────────┴───────────┘
Select all columns except for those that are nested:
>>> df.select(~cs.nested())
shape: (2, 2)
┌─────┬─────┐
│ bar ┆ baz │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 123 ┆ 2.0 │
│ 456 ┆ 5.5 │
└─────┴─────┘
- polars.selectors.numeric() -> Selector
Select all numeric columns.
See also
by_dtype: Select columns by dtype.
float: Select all float columns.
integer: Select all integer columns.
signed_integer: Select all signed integer columns.
unsigned_integer: Select all unsigned integer columns.
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [0, 0],
...     },
...     schema_overrides={"bar": pl.Int16, "baz": pl.Float32, "zap": pl.UInt8},
... )
Match all numeric columns:
>>> df.select(cs.numeric())
shape: (2, 3)
┌─────┬─────┬─────┐
│ bar ┆ baz ┆ zap │
│ --- ┆ --- ┆ --- │
│ i16 ┆ f32 ┆ u8  │
╞═════╪═════╪═════╡
│ 123 ┆ 2.0 ┆ 0   │
│ 456 ┆ 5.5 ┆ 0   │
└─────┴─────┴─────┘
Match all columns except for those that are numeric:
>>> df.select(~cs.numeric())
shape: (2, 1)
┌─────┐
│ foo │
│ --- │
│ str │
╞═════╡
│ x   │
│ y   │
└─────┘
- polars.selectors.signed_integer() -> Selector
Select all signed integer columns.
See also
by_dtype: Select columns by dtype.
float: Select all float columns.
integer: Select all integer columns.
numeric: Select all numeric columns.
unsigned_integer: Select all unsigned integer columns.
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": [-123, -456],
...         "bar": [3456, 6789],
...         "baz": [7654, 4321],
...         "zap": ["ab", "cd"],
...     },
...     schema_overrides={"bar": pl.UInt32, "baz": pl.UInt64},
... )
Select all signed integer columns:
>>> df.select(cs.signed_integer())
shape: (2, 1)
┌──────┐
│ foo  │
│ ---  │
│ i64  │
╞══════╡
│ -123 │
│ -456 │
└──────┘
Select all columns except for those that are signed integers:
>>> df.select(~cs.signed_integer())
shape: (2, 3)
┌──────┬──────┬─────┐
│ bar  ┆ baz  ┆ zap │
│ ---  ┆ ---  ┆ --- │
│ u32  ┆ u64  ┆ str │
╞══════╪══════╪═════╡
│ 3456 ┆ 7654 ┆ ab  │
│ 6789 ┆ 4321 ┆ cd  │
└──────┴──────┴─────┘
Select all integer columns (both signed and unsigned):
>>> df.select(cs.integer())
shape: (2, 3)
┌──────┬──────┬──────┐
│ foo  ┆ bar  ┆ baz  │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ u32  ┆ u64  │
╞══════╪══════╪══════╡
│ -123 ┆ 3456 ┆ 7654 │
│ -456 ┆ 6789 ┆ 4321 │
└──────┴──────┴──────┘
- polars.selectors.starts_with(*prefix: str) -> Selector
Select columns that start with the given substring(s).
- Parameters:
- prefix
Substring(s) that matching column names should start with.
See also
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": [1.0, 2.0],
...         "bar": [3.0, 4.0],
...         "baz": [5, 6],
...         "zap": [7, 8],
...     }
... )
Match columns starting with a "b":
>>> df.select(cs.starts_with("b"))
shape: (2, 2)
┌─────┬─────┐
│ bar ┆ baz │
│ --- ┆ --- │
│ f64 ┆ i64 │
╞═════╪═════╡
│ 3.0 ┆ 5   │
│ 4.0 ┆ 6   │
└─────┴─────┘
Match columns starting with either the letter "b" or "z":
>>> df.select(cs.starts_with("b", "z"))
shape: (2, 3)
┌─────┬─────┬─────┐
│ bar ┆ baz ┆ zap │
│ --- ┆ --- ┆ --- │
│ f64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 3.0 ┆ 5   ┆ 7   │
│ 4.0 ┆ 6   ┆ 8   │
└─────┴─────┴─────┘
Match all columns except for those starting with "b":
>>> df.select(~cs.starts_with("b"))
shape: (2, 2)
┌─────┬─────┐
│ foo ┆ zap │
│ --- ┆ --- │
│ f64 ┆ i64 │
╞═════╪═════╡
│ 1.0 ┆ 7   │
│ 2.0 ┆ 8   │
└─────┴─────┘
- polars.selectors.string(*, include_categorical: bool = False) -> Selector
Select all String (and, optionally, Categorical) string columns.
See also
binary: Select all binary columns.
by_dtype: Select all columns matching the given dtype(s).
categorical: Select all categorical columns.
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "w": ["xx", "yy", "xx", "yy", "xx"],
...         "x": [1, 2, 1, 4, -2],
...         "y": [3.0, 4.5, 1.0, 2.5, -2.0],
...         "z": ["a", "b", "a", "b", "b"],
...     },
... ).with_columns(
...     z=pl.col("z").cast(pl.Categorical("lexical")),
... )
Group by all string columns, sum the numeric columns, then sort by the string cols:
>>> df.group_by(cs.string()).agg(cs.numeric().sum()).sort(by=cs.string())
shape: (2, 3)
┌─────┬─────┬─────┐
│ w   ┆ x   ┆ y   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞═════╪═════╪═════╡
│ xx  ┆ 0   ┆ 2.0 │
│ yy  ┆ 6   ┆ 7.0 │
└─────┴─────┴─────┘
Group by all string and categorical columns:
>>> df.group_by(cs.string(include_categorical=True)).agg(cs.numeric().sum()).sort(
...     by=cs.string(include_categorical=True)
... )
shape: (3, 4)
┌─────┬─────┬─────┬──────┐
│ w   ┆ z   ┆ x   ┆ y    │
│ --- ┆ --- ┆ --- ┆ ---  │
│ str ┆ cat ┆ i64 ┆ f64  │
╞═════╪═════╪═════╪══════╡
│ xx  ┆ a   ┆ 2   ┆ 4.0  │
│ xx  ┆ b   ┆ -2  ┆ -2.0 │
│ yy  ┆ b   ┆ 6   ┆ 7.0  │
└─────┴─────┴─────┴──────┘
- polars.selectors.struct() -> Selector
Select all struct columns.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
See also
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": [{"a": "xx", "b": "z"}, {"a": "x", "b": "y"}],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...     },
... )
Select all struct columns:
>>> df.select(cs.struct())
shape: (2, 1)
┌────────────┐
│ foo        │
│ ---        │
│ struct[2]  │
╞════════════╡
│ {"xx","z"} │
│ {"x","y"}  │
└────────────┘
Select all columns except for those that are struct:
>>> df.select(~cs.struct())
shape: (2, 2)
┌─────┬─────┐
│ bar ┆ baz │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 123 ┆ 2.0 │
│ 456 ┆ 5.5 │
└─────┴─────┘
- polars.selectors.temporal() -> Selector
Select all temporal columns.
See also
Examples
>>> from datetime import date, time
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "dt": [date(2021, 1, 1), date(2021, 1, 2)],
...         "tm": [time(12, 0, 0), time(20, 30, 45)],
...         "value": [1.2345, 2.3456],
...     }
... )
Match all temporal columns:
>>> df.select(cs.temporal())
shape: (2, 2)
┌────────────┬──────────┐
│ dt         ┆ tm       │
│ ---        ┆ ---      │
│ date       ┆ time     │
╞════════════╪══════════╡
│ 2021-01-01 ┆ 12:00:00 │
│ 2021-01-02 ┆ 20:30:45 │
└────────────┴──────────┘
Match all temporal columns except for time columns:
>>> df.select(cs.temporal() - cs.time())
shape: (2, 1)
┌────────────┐
│ dt         │
│ ---        │
│ date       │
╞════════════╡
│ 2021-01-01 │
│ 2021-01-02 │
└────────────┘
Match all columns except for temporal columns:
>>> df.select(~cs.temporal())
shape: (2, 1)
┌────────┐
│ value  │
│ ---    │
│ f64    │
╞════════╡
│ 1.2345 │
│ 2.3456 │
└────────┘
- polars.selectors.time() -> Selector
Select all time columns.
See also
Examples
>>> from datetime import date, datetime, time
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "dtm": [datetime(2001, 5, 7, 10, 25), datetime(2031, 12, 31, 0, 30)],
...         "dt": [date(1999, 12, 31), date(2024, 8, 9)],
...         "tm": [time(0, 0, 0), time(23, 59, 59)],
...     },
... )
Select all time columns:
>>> df.select(cs.time())
shape: (2, 1)
┌──────────┐
│ tm       │
│ ---      │
│ time     │
╞══════════╡
│ 00:00:00 │
│ 23:59:59 │
└──────────┘
Select all columns except for those that are times:
>>> df.select(~cs.time())
shape: (2, 2)
┌─────────────────────┬────────────┐
│ dtm                 ┆ dt         │
│ ---                 ┆ ---        │
│ datetime[μs]        ┆ date       │
╞═════════════════════╪════════════╡
│ 2001-05-07 10:25:00 ┆ 1999-12-31 │
│ 2031-12-31 00:30:00 ┆ 2024-08-09 │
└─────────────────────┴────────────┘
- polars.selectors.unsigned_integer() -> Selector
Select all unsigned integer columns.
See also
by_dtype: Select columns by dtype.
float: Select all float columns.
integer: Select all integer columns.
numeric: Select all numeric columns.
signed_integer: Select all signed integer columns.
Examples
>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": [-123, -456],
...         "bar": [3456, 6789],
...         "baz": [7654, 4321],
...         "zap": ["ab", "cd"],
...     },
...     schema_overrides={"bar": pl.UInt32, "baz": pl.UInt64},
... )
Select all unsigned integer columns:
>>> df.select(cs.unsigned_integer())
shape: (2, 2)
┌──────┬──────┐
│ bar  ┆ baz  │
│ ---  ┆ ---  │
│ u32  ┆ u64  │
╞══════╪══════╡
│ 3456 ┆ 7654 │
│ 6789 ┆ 4321 │
└──────┴──────┘
Select all columns except for those that are unsigned integers:
>>> df.select(~cs.unsigned_integer())
shape: (2, 2)
┌──────┬─────┐
│ foo  ┆ zap │
│ ---  ┆ --- │
│ i64  ┆ str │
╞══════╪═════╡
│ -123 ┆ ab  │
│ -456 ┆ cd  │
└──────┴─────┘
Select all integer columns (both signed and unsigned):
>>> df.select(cs.integer())
shape: (2, 3)
┌──────┬──────┬──────┐
│ foo  ┆ bar  ┆ baz  │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ u32  ┆ u64  │
╞══════╪══════╪══════╡
│ -123 ┆ 3456 ┆ 7654 │
│ -456 ┆ 6789 ┆ 4321 │
└──────┴──────┴──────┘