polars.exceptions.DuplicateError: column with name 'name_ID' has more than one occurrence [closed]

Question

Closed. This question needs debugging details. It is not currently accepting answers.

Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.

Closed 9 days ago.


I have a dictionary of polars.DataFrames called data_dict. Every DataFrame in the dict has an extra index column named '' (an empty string). I want to drop that column and set a new index column named 'name_ID'.

Code:

data_pl = pl.concat(data_dict.values()).with_row_index('name_ID')

Error:

polars.exceptions.DuplicateError: column with name 'name_ID' has more than one occurrence

My columns: ['', 'name_ID', 'col1', ..., 'colN']

What I tried:

data_pl.to_pandas().set_index('name_ID')

Because of memory limits, this doesn't work: when I convert to pandas and call set_index(), I don't have enough GiB of memory to allocate for it.

What are some alternatives for setting an index column on a polars.DataFrame?

Include all code (and data) necessary to reproduce the issue in the post; see stackoverflow.com/help/minimal-reproducible-example – etrotta, Commented Nov 19 at 17:14
Is the problem solved? .drop('') will still leave you with a DuplicateError, because you already have a name_ID column and are trying to add another one with with_row_index('name_ID'). This is why a runnable example is required. The User Guide explains "no index": docs.pola.rs/user-guide/migration/pandas/… – jqurious, Commented Nov 20 at 14:21
The problem was somehow solved: to_pandas() was applied taking the first column, which happened to be name_ID. It's a bit of a workaround, but there was no need to use with_row_index. – Tudi72, Commented Nov 20 at 14:27

usdn · Accepted Answer · 2025-11-19 17:04:35Z


.with_row_index(): This method simply adds a new column with the name you provide, containing a monotonically increasing unsigned integer (starting at 0 by default). Since the name you pass already exists as a column in the DataFrame, it fails with the DuplicateError.
More conceptually: Polars doesn't have 'special' index columns like the ones you're probably familiar with from pandas. Therefore, you don't need to designate any column as the index at all. Instead, you can filter, sort, etc. on any available column.
So all you really need to do is drop the column with the empty name: data_pl.drop('')

