
I am building a repository of clean function templates that are not hard-coded (that is, they do not use the data frame's column names internally). The templates cover four types of functions: one new column from one existing column, many new columns from one existing column, one new column from many existing columns, and finally many-to-many.

The first 3 look like this and work:

In [97]:
import pandas as pd
from pandas import DataFrame

data={'level1':[20,19,20,21,25,29,30,31,30,29,31],
      'level2': [10,10,20,20,20,10,10,20,20,10,10]}
index= pd.date_range('12/1/2014', periods=11)
frame=DataFrame(data, index=index)

In [98]:
def nonhardcoded_1to1(x):
    y=x+2
    return y
frame['test1to1']=frame['level1'].map(nonhardcoded_1to1)#works

def nonhardcoded_2to1(x,y):
    z=x+y
    return z
frame['test2to1']=frame[['level1','level2']].apply(lambda s: nonhardcoded_2to1(*s), axis=1)#works

def nonhardcoded_1to2(x):
    y=x+12
    z=x-12
    return y, z
frame['test1to2a'], frame['test1to2b'] = zip(*frame['level1'].map(nonhardcoded_1to2))#works

Now, for the many-to-many function I get errors. I am trying to stitch it together from the '2to1' and '1to2' functions above, but they don't work together:

def nonhardcoded_2to2(x,y):
    z1=x+y
    z2=x-y
    return z1, z2
frame['test2to2a'], frame['test2to2b']=zip(*frame[['level1','level2']].apply(lambda s: nonhardcoded_2to2(*s), axis=1))

ValueError: too many values to unpack

So I tried to dig into the function call:

test=frame[['level1','level2']].apply(lambda s: nonhardcoded_2to2(*s), axis=1)

which returned this, so in theory this at least looks usable:

Out[104]:
level1  level2
2014-12-01  30  10
2014-12-02  29  9
2014-12-03  40  0
2014-12-04  41  1
2014-12-05  45  5
2014-12-06  39  19
2014-12-07  40  20
2014-12-08  51  11
2014-12-09  50  10
2014-12-10  39  19
2014-12-11  41  21

Then I tried:

test=zip(*frame[['level1','level2']].apply(lambda s: nonhardcoded_2to2(*s), axis=1))
test

which returned a sequence of tuples. For some reason it seems to take the column headers of the result and turn them into pairs of characters. I'm not sure why:

[('l', 'l'), ('e', 'e'), ('v', 'v'), ('e', 'e'), ('l', 'l'), ('1', '2')]

How should I create and call this function so it works?
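
The character pairs shown above can be reproduced without the apply step. The following is a minimal sketch, assuming a small stand-in frame (the name `df` and its two rows are made up for illustration): iterating over a DataFrame yields its column labels, so unpacking it into `zip` ends up zipping the label strings character by character.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by apply in the question.
df = pd.DataFrame({'level1': [30, 29], 'level2': [10, 9]})

# Iterating over a DataFrame walks its column labels, not its rows or values.
labels = list(df)
print(labels)  # ['level1', 'level2']

# zip(*df) is therefore zip('level1', 'level2'): one pair per character position.
pairs = list(zip(*df))
print(pairs)
# [('l', 'l'), ('e', 'e'), ('v', 'v'), ('e', 'e'), ('l', 'l'), ('1', '2')]
```

Six pairs cannot be unpacked into two target columns, which is consistent with the "too many values to unpack" error.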

  • It would work if you indexed the columns from test into your target columns. So from this: test=frame[['level1','level2']].apply(lambda s: nonhardcoded_2to2(*s), axis=1) you then do frame['test1to2a'], frame['test1to2b'] = test['level1'], test['level2'] Commented Feb 8, 2015 at 19:34
  • As for why you get your result when you zip the transpose, it's because iterating over the returned object yields the column names rather than the column contents. Commented Feb 8, 2015 at 19:36
  • Cool, that works. Now I want to do this on millions of records. Is there any way to skip building the interim data frame? Commented Feb 8, 2015 at 20:12
  • Not if you want multiple columns from it. It shouldn't be a big deal, depending on your df size. Commented Feb 8, 2015 at 20:46
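
The approach discussed in these comments can be sketched as follows. This is a hedged illustration rather than a definitive answer: it assumes a pandas version in which `DataFrame.apply` accepts `result_type='expand'` (0.23 or later), and the column names `alt2to2a` and `alt2to2b` are made up for the second variant. Option 1 builds the interim frame explicitly and assigns its columns by position. Option 2 is one possible way to avoid `apply` and the interim frame by mapping the function over the two columns directly.

```python
import pandas as pd

def nonhardcoded_2to2(x, y):
    z1 = x + y
    z2 = x - y
    return z1, z2

data = {'level1': [20, 19, 20, 21, 25, 29, 30, 31, 30, 29, 31],
        'level2': [10, 10, 20, 20, 20, 10, 10, 20, 20, 10, 10]}
frame = pd.DataFrame(data, index=pd.date_range('12/1/2014', periods=11))

# Option 1: ask apply to expand each returned tuple into columns (labelled 0, 1),
# then assign those columns by position so no source column name is reused.
interim = frame[['level1', 'level2']].apply(
    lambda s: nonhardcoded_2to2(*s), axis=1, result_type='expand')
frame['test2to2a'] = interim.iloc[:, 0]
frame['test2to2b'] = interim.iloc[:, 1]

# Option 2 (assumption: hypothetical target names): map the function over both
# columns in parallel and split the resulting tuples with zip; no interim frame.
sums, diffs = zip(*map(nonhardcoded_2to2, frame['level1'], frame['level2']))
frame['alt2to2a'] = list(sums)
frame['alt2to2b'] = list(diffs)
```

Both variants still call the Python function once per row. For simple arithmetic like this, passing whole columns (for example `nonhardcoded_2to2(frame['level1'], frame['level2'])`) would likely be faster on millions of records, though that only works when the function body is itself vectorizable.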
