
I am looking for the simplest way to create a pandas DataFrame from a CSV file that has duplicate index names (s1 and s2 in the example below).

Here is what the CSV file looks like:

       var1   var2    var3
unit x    8      4      12
temp y   -1     -4      -3
time     
s1        9     12      11
s2       12     15       7
month    
s1        1      3      12 
s2        2      4       6

The resulting DataFrame should look like this:

        var1   var2    var3
unit x     8      4      12
temp y    -1     -4      -3
time s1    9     12      11
time s2   12     15       7
month s1   1      3      12
month s2   2      4       6

What's the best way to perform this operation?

2 Answers

3

Use:

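import numpy as np

# df: the CSV loaded with the first column as the index (assumed)
# convert the index to a Series
s = df.index.to_series()
# flag every label that occurs more than once (s1, s2)
m = s.duplicated(keep=False)
# mask the duplicated labels, forward fill the preceding group label (time, month)
# into their place, then append the original label
df.index = np.where(m, s.mask(m).ffill() + ' ' + s.index, s)
# drop the rows that contain only NaNs (the group header rows)
df = df.dropna(how='all')
print(df)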
          var1  var2  var3
unit x     8.0   4.0  12.0
temp y    -1.0  -4.0  -3.0
time s1    9.0  12.0  11.0
time s2   12.0  15.0   7.0
month s1   1.0   3.0  12.0
month s2   2.0   4.0   6.0
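
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same steps. It makes two assumptions: the frame is built by hand instead of being read from the file (the exact read_csv call is not shown in the question), and the labels are held in a Series with a default RangeIndex, pd.Series(df.index), so the string concatenation is purely positional and never touches s.index.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the parsed CSV: the "time" and "month" rows
# are group headers and therefore contain only NaNs.
nan = np.nan
df = pd.DataFrame(
    {
        "var1": [8, -1, nan, 9, 12, nan, 1, 2],
        "var2": [4, -4, nan, 12, 15, nan, 3, 4],
        "var3": [12, -3, nan, 11, 7, nan, 12, 6],
    },
    index=["unit x", "temp y", "time", "s1", "s2", "month", "s1", "s2"],
)

# Labels as a Series with a plain 0..n-1 index (no duplicate index labels).
s = pd.Series(df.index)
# True for every label that occurs more than once (s1, s2).
m = s.duplicated(keep=False)
# Replace duplicated labels with NaN, then forward fill the group label.
group = s.mask(m).ffill()
# Prefix duplicated labels with their group; keep the others unchanged.
df.index = np.where(m, group + " " + s, s)
# Drop the all-NaN group header rows.
df = df.dropna(how="all")
print(df)
```

The value columns stay float because the header rows introduced NaNs before they were dropped; cast with astype(int) afterwards if integers are needed.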

2 Comments

This line gives the following warning: df.index = np.where(m, s.mask(m).ffill() + ' ' + s.index, s) FutureWarning: using '+' to provide set union with Indexes is deprecated, use '|' or .union(). Is there a way to rewrite it using union() or '|'?
@arqchicago Does df.index = np.where(m, s.mask(m).ffill() + ' ' + s, s) work?
0

Consider this DataFrame:

       C    D    E
A B
a 4  7.0  1.0  5.0
  5  3.0  4.0  5.5
b 5  8.0  3.0  3.0
c 4  9.0  5.0  6.0
f 4  3.0  0.0  4.0

You can use df.reset_index() with drop=False (the default), which turns each index level into a regular column. You can then join those columns into one string and assign the result back as the index.

# df1 is the DataFrame shown above
# convert the index levels to columns
df = df1.reset_index()
# join the two former index columns into a single string index
df.index = df[df.columns[0]].astype(str) + ' ' + df[df.columns[1]].astype(str)
# drop the two former index columns
df = df.drop(df.columns[[0, 1]], axis=1)

Out:

    C   D   E
a 4 7.0 1.0 5.0
a 5 3.0 4.0 5.5
b 5 8.0 3.0 3.0
c 4 9.0 5.0 6.0
f 4 3.0 0.0 4.0
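
A shorter variant of the same idea, as a sketch rather than the method above: iterating over a MultiIndex yields one tuple per row, so the flat labels can be built directly without reset_index and drop. The frame df1 is rebuilt by hand here to match the table above, and flat is just a name chosen for this example.

```python
import pandas as pd

# Hand-built copy of the example frame with a two-level index (A, B).
df1 = pd.DataFrame(
    {
        "C": [7.0, 3.0, 8.0, 9.0, 3.0],
        "D": [1.0, 4.0, 3.0, 5.0, 0.0],
        "E": [5.0, 5.5, 3.0, 6.0, 4.0],
    },
    index=pd.MultiIndex.from_tuples(
        [("a", 4), ("a", 5), ("b", 5), ("c", 4), ("f", 4)],
        names=["A", "B"],
    ),
)

# Work on a copy so df1 keeps its MultiIndex.
flat = df1.copy()
# Each element of the MultiIndex is an (A, B) tuple; join it into one label.
flat.index = [f"{a} {b}" for a, b in df1.index]
print(flat)
```

This keeps the data columns untouched and only replaces the index, which avoids having to drop the helper columns afterwards.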
