How to create multiple columns from multiple columns in a pandas data frame

Asked 10 years, 10 months ago

Modified 10 years, 10 months ago

Viewed 924 times

I am building a repository of clean, non-hard-coded function templates (that is, functions that do not use the data frame's column names internally). The templates should cover four cases: one new column from one existing column, many new columns from one existing column, one new column from many existing columns, and finally many-to-many.

The first 3 look like this and work:

In [97]:
import pandas as pd
from pandas import DataFrame

data={'level1':[20,19,20,21,25,29,30,31,30,29,31],
      'level2': [10,10,20,20,20,10,10,20,20,10,10]}
index= pd.date_range('12/1/2014', periods=11)
frame=DataFrame(data, index=index)

In [98]:
def nonhardcoded_1to1(x):
    y=x+2
    return y
frame['test1to1']=frame['level1'].map(nonhardcoded_1to1)#works

def nonhardcoded_2to1(x,y):
    z=x+y
    return z
frame['test2to1']=frame[['level1','level2']].apply(lambda s: nonhardcoded_2to1(*s), axis=1)#works

def nonhardcoded_1to2(x):
    y=x+12
    z=x-12
    return y, z
frame['test1to2a'], frame['test1to2b'] = zip(*frame['level1'].map(nonhardcoded_1to2))#works

Now, for the many-to-many function I get errors. I am trying to stitch it together from the '2to1' and '1to2' functions above, but the two approaches don't work together:

def nonhardcoded_2to2(x,y):
    z1=x+y
    z2=x-y
    return z1, z2
frame['test2to2a'], frame['test2to2b']=zip(*frame[['level1','level2']].apply(lambda s: nonhardcoded_2to2(*s), axis=1))

ValueError: too many values to unpack

So I tried to dig into the function call:

test=frame[['level1','level2']].apply(lambda s: nonhardcoded_2to2(*s), axis=1)

which returned this, so in theory this at least looks usable:

Out[104]:
level1  level2
2014-12-01  30  10
2014-12-02  29  9
2014-12-03  40  0
2014-12-04  41  1
2014-12-05  45  5
2014-12-06  39  19
2014-12-07  40  20
2014-12-08  51  11
2014-12-09  50  10
2014-12-10  39  19
2014-12-11  41  21

Then I tried:

test=zip(*frame[['level1','level2']].apply(lambda s: nonhardcoded_2to2(*s), axis=1))
test

which returned a sequence of tuples. For some reason it seems to take the column headers of the result and turn them into pairs of characters. I am not sure why:

[('l', 'l'), ('e', 'e'), ('v', 'v'), ('e', 'e'), ('l', 'l'), ('1', '2')]

How should I create and call this function so it works?

asked Feb 8, 2015 at 19:28

DISC-O

It would work if you assigned the columns of test to your target columns. Starting from test=frame[['level1','level2']].apply(lambda s: nonhardcoded_2to2(*s), axis=1), you then do frame['test2to2a'], frame['test2to2b'] = test['level1'], test['level2']

– EdChum, Feb 8, 2015 at 19:34

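
The suggestion above can be sketched as follows. This is a minimal sketch, not the commenter's exact code: it assumes a reasonably recent pandas in which `apply` accepts `result_type='expand'`, and with that option the intermediate frame gets integer column labels (`0`, `1`) rather than reusing `level1` and `level2` as in the output shown in the question. Without that option, newer releases may hand back a Series of tuples instead of a DataFrame.

```python
import pandas as pd

data = {'level1': [20, 19, 20, 21, 25, 29, 30, 31, 30, 29, 31],
        'level2': [10, 10, 20, 20, 20, 10, 10, 20, 20, 10, 10]}
frame = pd.DataFrame(data, index=pd.date_range('12/1/2014', periods=11))


def nonhardcoded_2to2(x, y):
    z1 = x + y
    z2 = x - y
    return z1, z2


# Build the intermediate frame once. result_type='expand' (an assumption about
# the installed pandas version) spreads each returned tuple over columns 0 and 1.
test = frame[['level1', 'level2']].apply(
    lambda s: nonhardcoded_2to2(*s), axis=1, result_type='expand')

# Assign the intermediate columns to the target columns.
frame['test2to2a'], frame['test2to2b'] = test[0], test[1]
```

The function itself still contains no column names; only the call site decides which columns go in and which names come out.
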
As for why you get that result when you zip the unpacked data frame: iterating over a DataFrame yields its column names rather than its column contents, so zip ends up pairing the characters of 'level1' and 'level2'.

– EdChum, Feb 8, 2015 at 19:36

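
A small illustration of that explanation, using a hypothetical two-row frame `df` (not the asker's data) with the same column labels. It shows what star-unpacking a DataFrame hands to `zip`, and one way to transpose the values instead.

```python
import pandas as pd

# Hypothetical stand-in for the intermediate result in the question.
df = pd.DataFrame({'level1': [30, 29], 'level2': [10, 9]})

# Iterating over (or star-unpacking) a DataFrame yields its column labels.
labels = list(df)

# So zip(*df) is really zip('level1', 'level2'), which pairs up characters.
pairs = list(zip(*df))

# Unpacking the underlying rows instead transposes the actual values.
col_a, col_b = zip(*df.values)
```

Here `pairs` reproduces the character tuples from the question, while `col_a` and `col_b` hold the values of the first and second column.
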
Cool, that works. Now I want to do this on millions of records. Is there any way to skip building the interim data frame?

– DISC-O, Feb 8, 2015 at 20:12

Not if you want multiple columns from it. It shouldn't be a big deal, though, depending on the size of your df.

– EdChum, Feb 8, 2015 at 20:46
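
For the specific function in the question there may be a way around the interim frame that is not discussed in the thread: pass whole columns to the function instead of going through `apply`. This is only a sketch and rests on an assumption, namely that the function body uses nothing but operations that work element-wise on a Series (here `+` and `-`). A function that branches on individual scalar values would not work this way.

```python
import pandas as pd

data = {'level1': [20, 19, 20, 21, 25, 29, 30, 31, 30, 29, 31],
        'level2': [10, 10, 20, 20, 20, 10, 10, 20, 20, 10, 10]}
frame = pd.DataFrame(data, index=pd.date_range('12/1/2014', periods=11))


def nonhardcoded_2to2(x, y):
    z1 = x + y
    z2 = x - y
    return z1, z2


# x and y are whole Series here, so + and - run element-wise in one pass
# and the function returns two Series that can be assigned directly.
frame['test2to2a'], frame['test2to2b'] = nonhardcoded_2to2(
    frame['level1'], frame['level2'])
```

Because no Python function is called per row, this form tends to scale better to large frames than a row-wise `apply`.
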
