30

I cannot find how to reorder columns in a polars dataframe in the polars DataFrame docs.

5 Answers

38

Using the select method is the recommended way to reorder columns in polars.

Example:

Input:

df
┌─────┬───────┬─────┐
│Col1 ┆ Col2  ┆Col3 │
│ --- ┆ ---   ┆ --- │
│ str ┆ str   ┆ str │
╞═════╪═══════╪═════╡
│ a   ┆ x     ┆ p   │
├╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ b   ┆ y     ┆ q   │
└─────┴───────┴─────┘

Output:

df.select(['Col3', 'Col2', 'Col1'])
or
df.select([pl.col('Col3'), pl.col('Col2'), pl.col('Col1')])

┌─────┬───────┬─────┐
│Col3 ┆ Col2  ┆Col1 │
│ --- ┆ ---   ┆ --- │
│ str ┆ str   ┆ str │
╞═════╪═══════╪═════╡
│ p   ┆ x     ┆ a   │
├╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ q   ┆ y     ┆ b   │
└─────┴───────┴─────┘

Note: While df[['Col3', 'Col2', 'Col1']] gives the same result (as of version 0.14), the Polars documentation recommends using the select method instead:

We strongly recommend selecting data with expressions for almost all use cases. Square bracket indexing is perhaps useful when doing exploratory data analysis in a terminal or notebook when you just want a quick look at a subset of data.

For all other use cases we recommend using expressions because:

  1. expressions can be parallelized
  2. the expression approach can be used in lazy and eager mode while the indexing approach can only be used in eager mode
  3. in lazy mode the query optimizer can optimize expressions

1 Comment

df.select('Col3', 'Col2', 'Col1') (without the square brackets) also works
20

That seems like a special case of projection to me.

df = pl.DataFrame({
    "c": [1, 2],
    "a": ["a", "b"],
    "b": [True, False]
})

df.select(sorted(df.columns))
shape: (2, 3)
┌─────┬───────┬─────┐
│ a   ┆ b     ┆ c   │
│ --- ┆ ---   ┆ --- │
│ str ┆ bool  ┆ i64 │
╞═════╪═══════╪═════╡
│ a   ┆ true  ┆ 1   │
│ b   ┆ false ┆ 2   │
└─────┴───────┴─────┘

4 Comments

Is there a way to do this without referencing the data frame explicitly? I.e. something that would work with method chaining?
You can use pipe for that.
What's the difference between df.select(["a", "b"]), df.select(pl.col(["a", "b"])), and df.select(pl.col("a"), pl.col("b"))? Is there a preferred syntax?
While the first two give the same result, the third accepts arbitrary expressions, for example: df.select(pl.col("a"), pl.col("b") + 1). See the documentation.
3

I wrote a plugin for this (on GitHub here), please try it out:

pip install polars-permute

Polars Permute Plugin

A Polars plugin for easily reordering DataFrame columns.

Supports column permutations like prepending, appending, shifting, and swapping.

Installation

pip install polars-permute[polars]

On older CPUs run:

pip install polars-permute[polars-lts-cpu]

Features

  • Supports both string column names and Polars expressions
  • Handles single or multiple columns
  • Maintains relative ordering of moved columns
  • Chain operations together
  • Gracefully handles edge cases (non-existent columns, empty inputs)

Usage

The plugin adds a permute namespace to Polars DataFrames with methods for column reordering:

import polars as pl
import polars_permute

# Create a sample DataFrame
df = pl.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "c": [7, 8, 9],
    "d": [10, 11, 12],
})

# Move column 'd' to the start
df.permute.prepend("d")

# Move multiple columns to the end
df.permute.append(["a", "b"])

# Move columns to a specific position
df.permute.at(["b", "c"], index=0)

# Shift columns left/right
df.permute.shift("a", "b", steps=1, direction="right")

# Swap two columns
df.permute.swap("a", "d")

I tried to keep the naming distinct from the names of existing row-wise operations to minimise confusion.

Comments

0

I find combining pipe, OrderedDict.fromkeys, and pl.selectors to be the most versatile way of moving columns around without breaking the method chain. It lets you position columns using selectors while avoiding the DuplicateError that polars raises when the same column is selected twice.

INPUT:

┌───┬─────┬───┬─────┐
│ b | z_a | a | z_b │
│---|-----|---|-----│
│ 4 | 2   | 1 | 3   │

OUTPUT:

from collections import OrderedDict
import polars as pl

df.pipe(lambda df: df.select(
    list(OrderedDict.fromkeys(
        ["a"]
        + df.select(pl.selectors.starts_with("z_")).columns
        + df.columns  # Ensures all other columns are included at the end.
    ))
))

┌───┬─────┬─────┬───┐
│ a | z_a | z_b | b │
│---|-----|-----|---│
│ 1 | 2   | 3   | 4 │

1 Comment

Since Python 3.7, a regular dict preserves key order. There's no longer any reason to use OrderedDict; it's only around for compatibility. (If you're using Polars, you're definitely using Python 3.7+: github.com/pola-rs/polars/blob/main/py-polars/…) docs.python.org/3/whatsnew/3.7.html
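
A minimal sketch of that suggestion, using a plain dict to de-duplicate while keeping first-seen order. The helper name move_to_front is hypothetical, and the column list mirrors the INPUT table in the answer above; it works on plain lists of names, so it needs nothing beyond the standard library.

```python
def move_to_front(columns, first):
    """Return `columns` with the names in `first` moved to the front.

    dict.fromkeys keeps the first occurrence of each key in insertion
    order, so names repeated later in the list are dropped.
    """
    return list(dict.fromkeys([*first, *columns]))


columns = ["b", "z_a", "a", "z_b"]
z_cols = [c for c in columns if c.startswith("z_")]

new_order = move_to_front(columns, ["a", *z_cols])
print(new_order)  # ['a', 'z_a', 'z_b', 'b']
```

The resulting list can then be passed to select, for example df.pipe(lambda d: d.select(move_to_front(d.columns, ["a"]))).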
-4

Turns out it is the same as pandas:

df = df[['PRODUCT', 'PROGRAM', 'MFG_AREA', 'VERSION', 'RELEASE_DATE', 'FLOW_SUMMARY', 'TESTSUITE', 'MODULE', 'BASECLASS', 'SUBCLASS', 'Empty', 'Color', 'BINNING', 'BYPASS', 'Status', 'Legend']]

1 Comment

As mentioned above, this method is disfavored as compared to using .select()
