3

I need to drop the first column in a polars DataFrame.

I tried:

result = df.select([col for idx, col in enumerate(df.columns) if idx != 0])

But that looks long and clumsy for such a simple task.

I also tried:

df.select(pl.exclude(pl.nth(0)))

but that raised an error:

# TypeError: invalid input for `exclude`

# Expected one or more `str` or `DataType`; found <Expr ['cs.nth(1, require_all=true)'] at 0x21A964B6350> instead.
  • You could create minimal working code so we could use it for tests. Commented Sep 16 at 19:01
  • How about df.select(pl.exclude(df.columns[1]))? Commented Sep 16 at 19:05
  • I'm not sure why @jqurious removed the link to the website I took the first solution from? Commented Sep 17 at 3:58
  • The code examples are of questionable/poor quality. I clicked on a "substring" page that contains "AI hallucinated" method names that do not exist in Polars. Others contain extremely unidiomatic code. The site is full of ads and asks you to subscribe/pay to remove ads. It doesn't seem like something we should be directing people towards. Commented Sep 17 at 7:37
  • @jqurious I completely agree. Thank you. Commented Sep 17 at 10:27

3 Answers

5

You can use selector set notation.

from polars import selectors as cs

df.select(cs.all() - cs.by_index(0))

or just

df.select(cs.exclude(cs.by_index(0)))

or even more succinctly,

df.select(~cs.by_index(0))

4 Comments

Thanks. I ended up with df.select(~cs.by_index(0)), using the set operation for selectors.
I didn't even think about it that way (obviously, since it's the most concise). Nice find.
You can put it in your answer if you want.
@robertspierre After reading this answer (and checking the documentation for selectors) I was testing the same idea, df.select(~cs.by_index(0)) :) I like it.
3

Python is zero-indexed so when you say “first” do you mean index == 0 or index == 1? In any case you can use a selector to drop a column:

import polars as pl
import polars.selectors as cs

df = pl.DataFrame(
    {
        "a": [1,2,3],
        "b": [4,5,6],
        "c": [7,8,9],
    }
)

df.drop(cs.by_index(0))  # drops 'a'; 1 would drop 'b'
shape: (3, 2)
┌─────┬─────┐
│ b   ┆ c   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 4   ┆ 7   │
│ 5   ┆ 8   │
│ 6   ┆ 9   │
└─────┴─────┘

3 Comments

I think "the first column" is quite unambiguous? I don't think there can be a zeroth column. I mean, 0 and 1 are cardinal numbers; "first" is an ordinal.
You said first but wrote idx != 1 and pl.nth(1), so it wasn't clear what you wanted. The first column is index 0.
Oh, you are right, sorry.
2

It seems exclude() needs the column's name, and you can use df.columns[index] to get it.

index = 1

result = df.select(pl.exclude( df.columns[index] ))

result = df.drop( df.columns[index] )

This also seems to work:

result = df.drop( pl.nth(index) )

Minimal working code for tests:

It lets you run e.g. python main.py 2 to remove the third column.

import polars as pl
import sys

# Creating a new Polars DataFrame
technologies = {
    'Courses': ["Spark", "Pandas", "Hadoop", "Python", "Pandas", "Spark"],
    'Fees': [22000, 26000, 25000, 20000, 26000, 22000],
    'Duration': ['30days', '60days', '50days', '40days', '60days', '30days'],
    'Discount': [1000, 2000, 1500, 1200, 2000, 1000]
}

df = pl.DataFrame(technologies)
print(df)

if len(sys.argv) > 1:
    index = int(sys.argv[1])
else:
    index = 1

print('index:', index)

name = df.columns[index]
print('name:', name)

# --- version 1 ---

print('--- version 1 ---')
result = df.select([col for idx, col in enumerate(df.columns) if idx != index])
print(result)

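# Left commented out: this raises the TypeError from the question,
# because pl.exclude() expects names or dtypes, not an expression.
#result = df.select(pl.exclude(pl.nth(1)))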

# --- version 2 ---

print('--- version 2 ---')
#result = df.select(pl.exclude(df.columns[index]))
name = df.columns[index]
result = df.select(pl.exclude(name))
print(result)

# --- version 3 ---

print('--- version 3 ---')
#result = df.drop(df.columns[index])
name = df.columns[index]
result = df.drop(name)
print(result)

# --- version 4 ---

print('--- version 4 ---')
result = df.drop(pl.nth(index))
print(result)

# ---

#print(df)
