2

I have two Pandas' data frames

df1
       items        view
0      A|B|C  02-10-2015
1        D|E  02-15-2015

df2
  item  num  val
0    A    1   10
1    B    3    2
2    C    8    9
3    D    9   13
4    E    2   22

I want to mere these frames to get

df 
  view       num1 val1 num2 val2 num3 val3
0 02-10-2015 1    10   3    2    8    9
1 02-15-2015 9    13   2    22   na   na

My current approach is to split df1.items using

df3 = pd.DataFrame(df1['items'].str.split('|').tolist())

which results in

    0    1    2
0   A    B    C
1   D    E None

and finally merges each individual column and concatenates them with the original df1

x = pd.merge(df3[[0]], df2, how='left', on='item')
y = pd.merge(df3[[1]], df2, how='left', on='item')
z = pd.merge(df3[[2]], df2, how='left', on='item')
pd.concat([df1, x.ix[:,1:],y.ix[:,1:],z.ix[:,1:]], axis=1)

The code works but it doesn't seem right to me, I would be happy if anyone can point out a proper way to achieve the same results.

Thank you in advance!

0

1 Answer 1

4

Note: str.split has a return_type argument:

In [11]: res = df1['items'].str.split("|", return_type='frame')

In [12]: res
Out[12]:
   0  1    2
0  A  B    C
1  D  E  NaN

In [13]: res.index = df1['view']

In [14]: res
Out[14]:
            0  1    2
view
02-10-2015  A  B    C
02-15-2015  D  E  NaN

I think a nicer, more general way, to do this is using stack/unstack:

In [15]: res = res.stack()

In [16]: res
Out[16]:
view
02-10-2015  0    A
            1    B
            2    C
02-15-2015  0    D
            1    E
dtype: object

Now you can either merge, or if you're lucky just switch out the index:

In [17]: df2 = df2.set_index('item') # could just drop this column

In [18]: df2.loc[res]  # reorder, may not be required
Out[18]:
      num  val
item
A       1   10
B       3    2
C       8    9
D       9   13
E       2   22

Now the magic:

In [21]: df2.index = r.index

In [22]: df2
Out[22]:
              num  val
view
02-10-2015 0    1   10
           1    3    2
           2    8    9
02-15-2015 0    9   13
           1    2   22

In [23]: df2.unstack()
Out[23]:
           num        val
             0  1   2   0   1   2
view
02-10-2015   1  3   8  10   2   9
02-15-2015   9  2 NaN  13  22 NaN

As desired (with a MultiIndex columns, which is what you want).


Note: If you have duplicates (A, B, Cs) you're going to need to merge (which is a little fiddlier, but could be cleaned up). Before [21]:

In [31]: df2.merge(res.to_frame(), left_index=True, right_on=0).unstack()
Out[31]:
           num        val          0
             0  1   2   0   1   2  0  1    2
view
02-10-2015   1  3   8  10   2   9  A  B    C
02-15-2015   9  2 NaN  13  22 NaN  D  E  NaN
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.