1

I have a dataframe that looks like this:

  |    A     |     B     |     C     |     D
0 |  a |  b  |  c  |  d  |  e  |  f  |  g  |  h
1 |  a |  b  |  c  |  d  |  e  |  f  | NaN | NaN
2 |  a |  b  | NaN | NaN |  e  |  f  | NaN | NaN
3 |  a |  b  | NaN | NaN | NaN | NaN | NaN | NaN

and I want to change it to this:

    OBJ   VAL1   VAL2
0    A     a      b
1    A     a      b
2    A     a      b
3    A     a      b
4    A     a      b
5    A     a      b
6    B     c      d
7    B     c      d
8    C     e      f
9    C     e      f
10   C     e      f
11   D     g      h

so the multindex will convert to a column as well.

any help please?

and is there any good tutorial that explains exactly those kind of stuff so I will be able to do it and not just try and error?

Thank you

EDIT: My first originally dataframe looks like this:

      A       B       C       D
0   (a,b)   (c,d)   (e,f)   (g,h)
1   (a,b)   (c,d)   (e,f)    NaN
2   (a,b)    NaN    (e,f)    NaN
3   (a,b)    NaN     NaN     NaN

so in each cell there is a tuple.

3
  • What does your original DataFrame ACTUALLY look like? Exactly? Commented Dec 15, 2016 at 5:57
  • @JohnZwinck I edited and added the originally dataframe. Commented Dec 15, 2016 at 6:36
  • so you want to change orginal dataframe to output you want? Commented Dec 15, 2016 at 7:06

1 Answer 1

2

You can use DataFrame.from_records first, then double reset_index and if necessary sort_values for sorting all columns:

df = pd.DataFrame({"A": [('a','b'),('a','b'),('a','b'),('a','b')], 
                   'B': [('c','d'),('c','d'), np.nan,np.nan], 
                   'C':[('e','f'),('e','f'),('e','f'),np.nan],
                   'D':[('g','h'),np.nan,np.nan,np.nan]})
print (df)
        A       B       C       D
0  (a, b)  (c, d)  (e, f)  (g, h)
1  (a, b)  (c, d)  (e, f)     NaN
2  (a, b)     NaN  (e, f)     NaN
3  (a, b)     NaN     NaN     NaN

stacked = df.stack()
df1 = pd.DataFrame.from_records(stacked.tolist(), index = stacked.index)
        .reset_index(level=0, drop=True)
        .reset_index()
        .sort_values(['index',0,1])
df1.columns = ['OBJ','VAL1','VAL2']
print (df1)
  OBJ VAL1 VAL2
0   A    a    b
4   A    a    b
7   A    a    b
9   A    a    b
1   B    c    d
5   B    c    d
2   C    e    f
6   C    e    f
8   C    e    f
3   D    g    h

If your DataFrame is with MultiIndex in columns, need stack first:

stacked = df.stack()
df1 = pd.DataFrame.from_records(stacked.tolist(), index = stacked.index) \
        .unstack(1) \
        .swaplevel(0, 1, 1) \
        .sort_index(axis=1) \
        .replace({None:np.nan})

print (df1)
   A       B         C         D     
   0  1    0    1    0    1    0    1
0  a  b    c    d    e    f    g    h
1  a  b    c    d    e    f  NaN  NaN
2  a  b  NaN  NaN    e    f  NaN  NaN
3  a  b  NaN  NaN  NaN  NaN  NaN  NaN

df2 = df1.stack(0)
         .reset_index(level=0, drop=True)
         .reset_index()
         .sort_values(['index',0,1])

df2.columns = ['OBJ','VAL1','VAL2']
print (df2)
  OBJ VAL1 VAL2
0   A    a    b
4   A    a    b
7   A    a    b
9   A    a    b
1   B    c    d
5   B    c    d
2   C    e    f
6   C    e    f
8   C    e    f
3   D    g    h
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.