Merging duplicate columns while reading CSV file

Question

Given a CSV file with a duplicated column A, I need to read the file without the duplicate column:

 A       A       C
306     306     506
3238    3238    591
4159    4159    366
1847    1847    2898

The available alternatives include usecols and names. However, pandas 0.24.1 also has a mangle_dupe_cols parameter which, according to the docs, should merge duplicate columns when set to False.

But when I do so, I get a ValueError:

pd.read_csv('file.csv', mangle_dupe_cols=False, engine='python').head()
ValueError: Setting mangle_dupe_cols=False is not supported yet
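
For reference, here is a minimal sketch of the usecols alternative mentioned above. It assumes a comma-separated file with the same header; the in-memory sample text stands in for file.csv and is only illustrative. Integer usecols values select columns by position, so the second A column is never loaded:

```python
import io

import pandas as pd

# Illustrative stand-in for file.csv (assumed to be comma-separated).
csv_text = (
    "A,A,C\n"
    "306,306,506\n"
    "3238,3238,591\n"
    "4159,4159,366\n"
    "1847,1847,2898\n"
)

# Keep only the columns at positions 0 and 2; the duplicate at position 1 is skipped.
df = pd.read_csv(io.StringIO(csv_text), usecols=[0, 2])
print(df.head())
```

This requires knowing the duplicate's position in advance, which is why it is only a workaround and not a replacement for mangle_dupe_cols=False.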

Pandas version used for this problem: 0.24.1

What are your views on this problem?

@user5173426 I have to merge while reading. I am well aware of the fact that there are various ways to remove duplicate after reading. I hope you get what I'm trying to ask.
– meW, Commented Mar 4, 2019 at 7:54
Pandas tutorial says: "To prevent users from encountering this problem with duplicate data, a ValueError exception is raised if mangle_dupe_cols != True". It sounds like your problem does not have a solution.
– DYZ, Commented Mar 4, 2019 at 7:57
Since I do not know pandas folks' intentions, I can only speculate that they advertised and implemented the option, but it turned out to be dangerously misused and had to be disabled.
– DYZ, Commented Mar 4, 2019 at 8:00

Community · Accepted Answer · 2020-06-20 09:12:55Z

I checked the pandas GitHub and found the issue ENH: Support mangle_dupe_cols=False in pd.read_csv().

Unfortunately, a question in the comments there got this reply:

What is the ETA on this issue?

when / if a community pull request happens

One possible solution is to read the file twice:

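import pandas as pd

# First pass: read only the header row to get the original (unmangled) names.
c = pd.read_csv('some.csv', header=None, nrows=1).iloc[0]
# Alternatively, read the header with the csv module (needs `import csv`):
# with open('some.csv', newline='') as f:
#     reader = csv.reader(f)
#     c = next(reader)

# Second pass: skip the header row, then restore the original names.
df = pd.read_csv('some.csv', header=None, skiprows=1)
df.columns = c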
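
The two-pass read keeps both A columns under the same name. If the goal is to end up with a single A, one possible follow-up is to keep the first occurrence of each name with Index.duplicated. This is a minimal sketch and not part of the original answer; it assumes comma-separated data, and the in-memory sample text is only an illustrative stand-in for some.csv:

```python
import csv
import io

import pandas as pd

# Illustrative stand-in for some.csv (assumed to be comma-separated).
csv_text = (
    "A,A,C\n"
    "306,306,506\n"
    "3238,3238,591\n"
    "4159,4159,366\n"
    "1847,1847,2898\n"
)

# First pass: take the header row exactly as written, duplicates included.
header = next(csv.reader(io.StringIO(csv_text)))

# Second pass: read the data rows only, then restore the original names.
df = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=1)
df.columns = header

# Keep the first occurrence of each column name and drop later repeats.
deduped = df.loc[:, ~df.columns.duplicated()]
print(deduped)
```

Note that this drops repeats by name only; it does not check that the repeated columns actually hold identical values.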

edited Jun 20, 2020 at 9:12

answered Mar 4, 2019 at 8:15

jezrael

1 Comment

meW Over a year ago

It seems like @DYZ is right. This argument is more of an advertisement! Thank you for the alternate suggestion but I just needed to understand the functioning of mangle_dupe_cols which still seems an open issue.
