Given a CSV file with a duplicated column A, I need to read the file while excluding the duplicate column:

 A       A       C
306     306     506
3238    3238    591
4159    4159    366
1847    1847    2898

The available alternatives include usecols and names. However, pandas 0.24.1 also has a mangle_dupe_cols parameter which, as mentioned in the docs, should merge duplicate columns when set to False.

But when I do so, I get a ValueError:

pd.read_csv('file.csv', mangle_dupe_cols=False, engine='python').head()
ValueError: Setting mangle_dupe_cols=False is not supported yet

pandas version used for this problem: 0.24.1

What are your views on this problem?

  • 1
    df = df.loc[:,~df.columns.duplicated()] Commented Mar 4, 2019 at 7:53
  • 2
    @user5173426 I have to merge while reading. I am well aware of the fact that there are various ways to remove duplicate after reading. I hope you get what I'm trying to ask. Commented Mar 4, 2019 at 7:54
  • You mean A A.1 C? Commented Mar 4, 2019 at 7:56
  • 2
    Pandas tutorial says: "To prevent users from encountering this problem with duplicate data, a ValueError exception is raised if mangle_dupe_cols != True". It sounds like your problem does not have a solution. Commented Mar 4, 2019 at 7:57
  • 3
    Since I do not know pandas folks' intentions, I can only speculate that they advertised and implemented the option, but it turned out to be dangerously misused and had to be disabled. Commented Mar 4, 2019 at 8:00

1 Answer

I checked the pandas GitHub repository and found the issue ENH: Support mangle_dupe_cols=False in pd.read_csv().

Unfortunately, a comment there asking about the timeline received only this reply:

What is the ETA on this issue?

when / if a community pull request happens

One possible workaround is to read the file twice, first for the header row and then for the data:

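import pandas as pd

# First pass: read only the header row, so the duplicate names are kept as-is
c = pd.read_csv('some.csv', header=None, nrows=1).iloc[0]
# or, with the csv module (requires "import csv"):
# with open('some.csv', newline='') as f:
#     reader = csv.reader(f)
#     c = next(reader)

# Second pass: read the data without a header, then restore the original names
df = pd.read_csv('some.csv', header=None, skiprows=1)
df.columns = c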
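
If the goal is to drop the duplicate column while reading rather than afterwards, the same two-pass idea can be combined with usecols. The following is only a sketch, not something pandas provides out of the box: the helper names first_occurrence_positions and read_csv_dedup are made up for illustration, the sample data is assumed to be comma-separated, and columns are treated as duplicates by name only (their values are not compared).

```python
import csv
import io

import pandas as pd


def first_occurrence_positions(header):
    """Return the positions of the first occurrence of each column name."""
    seen = set()
    keep = []
    for pos, name in enumerate(header):
        if name not in seen:
            seen.add(name)
            keep.append(pos)
    return keep


def read_csv_dedup(buffer):
    """Read a CSV from a seekable text buffer, skipping repeated column names.

    Hypothetical helper: duplicates are detected by header name only.
    """
    # First pass: read just the header row with the csv module
    header = next(csv.reader(buffer))
    buffer.seek(0)
    # Second pass: let pandas load only the positions we want to keep
    keep = first_occurrence_positions(header)
    return pd.read_csv(buffer, usecols=keep)


# Assumed comma-separated version of the sample from the question
sample = "A,A,C\n306,306,506\n3238,3238,591\n4159,4159,366\n1847,1847,2898\n"
df = read_csv_dedup(io.StringIO(sample))
print(df.head())
```

Selecting by position avoids relying on the mangled name A.1. For a file on disk, the same function could be given a file object opened with open('some.csv', newline='').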
1 Comment

It seems like @DYZ is right. This argument is more of an advertisement! Thank you for the alternate suggestion but I just needed to understand the functioning of mangle_dupe_cols which still seems an open issue.
