I cannot find how to reorder columns in a polars dataframe in the polars DataFrame docs.
5 Answers
Using the select method is the recommended way to reorder columns in polars.
Example:
Input:
df
┌─────┬───────┬─────┐
│Col1 ┆ Col2 ┆Col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════╪═══════╪═════╡
│ a ┆ x ┆ p │
├╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ b ┆ y ┆ q │
└─────┴───────┴─────┘
Output:
df.select(['Col3', 'Col2', 'Col1'])
or
df.select([pl.col('Col3'), pl.col('Col2'), pl.col('Col1')])
┌─────┬───────┬─────┐
│Col3 ┆ Col2 ┆Col1 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════╪═══════╪═════╡
│ p ┆ x ┆ a │
├╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ q ┆ y ┆ b │
└─────┴───────┴─────┘
Note:
While df[['Col3', 'Col2', 'Col1']] gives the same result (as of version 0.14), it is recommended that you use the select method instead. The recommendation reads:
We strongly recommend selecting data with expressions for almost all use cases. Square bracket indexing is perhaps useful when doing exploratory data analysis in a terminal or notebook when you just want a quick look at a subset of data.
For all other use cases we recommend using expressions because:
- expressions can be parallelized
- the expression approach can be used in lazy and eager mode while the indexing approach can only be used in eager mode
- in lazy mode the query optimizer can optimize expressions
1 Comment
df.select('Col3', 'Col2', 'Col1') (without the square brackets) also works
That seems like a special case of projection to me.
df = pl.DataFrame({
"c": [1, 2],
"a": ["a", "b"],
"b": [True, False]
})
df.select(sorted(df.columns))
shape: (2, 3)
┌─────┬───────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ str ┆ bool ┆ i64 │
╞═════╪═══════╪═════╡
│ a ┆ true ┆ 1 │
│ b ┆ false ┆ 2 │
└─────┴───────┴─────┘
4 Comments
pipe for that.
df.select(["a", "b"]), df.select(pl.col(["a", "b"])), and df.select(pl.col("a"), pl.col("b"))? Is there a preferred syntax?
I wrote a plugin for this (on GitHub), please try it out:
pip install polars-permute
Polars Permute Plugin
A Polars plugin for easily reordering DataFrame columns.
Supports column permutations like prepending, appending, shifting, and swapping.
Installation
pip install polars-permute[polars]
On older CPUs run:
pip install polars-permute[polars-lts-cpu]
Features
- Supports both string column names and Polars expressions
- Handles single or multiple columns
- Maintains relative ordering of moved columns
- Chain operations together
- Gracefully handles edge cases (non-existent columns, empty inputs)
Usage
The plugin adds a permute namespace to Polars DataFrames with methods for column reordering:
import polars as pl
import polars_permute
# Create a sample DataFrame
df = pl.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "c": [7, 8, 9],
    "d": [10, 11, 12],
})
# Move column 'd' to the start
df.permute.prepend("d")
# Move multiple columns to the end
df.permute.append(["a", "b"])
# Move columns to a specific position
df.permute.at(["b", "c"], index=0)
# Shift columns left/right
df.permute.shift("a", "b", steps=1, direction="right")
# Swap two columns
df.permute.swap("a", "d")
I tried to keep the naming distinct from the names of existing row-wise operations to minimise confusion.
I find combining pipe, OrderedDict.fromkeys and pl.selectors to be the most versatile way of moving columns around without having to break the method chain. It lets you move columns using selectors while avoiding the DuplicateError that polars raises when the same column is selected twice.
INPUT:
┌───┬─────┬───┬─────┐
│ b | z_a | a | z_b │
│---|-----|---|-----│
│ 4 | 2 | 1 | 3 │
OUTPUT:
from collections import OrderedDict
df.pipe(lambda df: df.select(
list(OrderedDict.fromkeys(
["a"]+
df.select(pl.selectors.starts_with("z_")).columns+
df.columns # Ensures all other columns are included at the end.
))))
┌───┬─────┬─────┬───┐
│ a | z_a | z_b | b │
│---|-----|-----|---│
│ 1 | 2 | 3 | 4 │
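The deduplication step can also be written with a plain dict, which keeps insertion order in Python 3.7+. The sketch below works on column names only, so it runs without polars; reorder_names is a hypothetical helper introduced here for illustration, not part of any library.

```python
def reorder_names(columns, first, prefix):
    """Return names with `first` leading, then names starting with `prefix`, then the rest."""
    prefixed = [name for name in columns if name.startswith(prefix)]
    # dict.fromkeys keeps the first occurrence of each name and drops later duplicates.
    return list(dict.fromkeys([*first, *prefixed, *columns]))


columns = ["b", "z_a", "a", "z_b"]
new_order = reorder_names(columns, first=["a"], prefix="z_")
print(new_order)  # ['a', 'z_a', 'z_b', 'b']
```

The resulting list can be passed straight to select, for example df.pipe(lambda d: d.select(reorder_names(d.columns, ["a"], "z_"))), which keeps the method chain intact.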
1 Comment
dict preserves key order. There's no longer any reason to use OrderedDict; it's only around for compatibility. (If you're using Polars, you're definitely using Python 3.7+: github.com/pola-rs/polars/blob/main/py-polars/…) docs.python.org/3/whatsnew/3.7.html
Turns out it is the same as pandas:
df = df[['PRODUCT', 'PROGRAM', 'MFG_AREA', 'VERSION', 'RELEASE_DATE', 'FLOW_SUMMARY', 'TESTSUITE', 'MODULE', 'BASECLASS', 'SUBCLASS', 'Empty', 'Color', 'BINNING', 'BYPASS', 'Status', 'Legend']]