
In the example below, the first apply works. The second throws "TypeError: ("Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'", u'occurred at index 0')"

import pandas as pd

df = pd.DataFrame({'lag':[ 3, 5, 3, 4, 2, 3, 2, 3, 4, 3, 2, 2, 2, 3],
                   'A':[10,20,30,40,20,30,40,10,20,30,15,60,20,15],
                   'B':[11,21,31,41,21,31,41,11,21,31,15,61,21,25]})
df['C'] = df.apply(lambda x: df['A'].shift(x['lag'])[x.name], axis=1)
print(df)
df['D'] = df.apply(lambda x: df['B'].shift(x['lag'])[x.name], axis=1)
print(df)

Please tell me why this happens and how to fix it. Thanks,

(Note: I do not have enough "points" to post a comment in Variable shift in Pandas)

1 Answer

This one is actually tricky. I'll try to be succinct.

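When you use apply with axis=1, you iterate row by row, and pandas hands each row to your function as a pd.Series. A Series holds a single dtype. After your first assignment, column C contains NaN values and is therefore float64, so when a row is accessed, its int64 values are upcast and the entire row is interpreted as float. That means x['lag'] is 3.0 rather than 3, and shift expects an integer number of periods, which is where the TypeError comes from.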
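A quick way to see the upcasting is the minimal sketch below. The frame is a cut-down, hypothetical stand-in for the one in the question: once a float64 column such as C exists, every row Series (and therefore the lag value inside it) comes back as float64.

```python
import numpy as np
import pandas as pd

# Cut-down frame: every column starts out as int64.
df = pd.DataFrame({'lag': [1, 2, 1], 'A': [10, 20, 30]})

# A row taken from an all-int frame keeps the int64 dtype.
row_before = df.iloc[0]
print(row_before.dtype)         # int64

# A column containing NaN is stored as float64 ...
df['C'] = [np.nan, 10.0, 20.0]

# ... and a row Series can only hold one dtype, so the whole row is upcast.
row_after = df.iloc[0]
print(row_after.dtype)          # float64
print(row_after['lag'])         # 1.0 (a float, not an int)

# apply(axis=1) sees the same upcast rows, so x['lag'] is a float inside the lambda.
seen_as_float = df.apply(lambda x: isinstance(x['lag'], float), axis=1)
print(seen_as_float.all())      # True
```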


Workaround #1
Ensure the lag value is an int

df['D'] = df.apply(lambda x: df['B'].shift(int(x['lag']))[x.name], axis=1)

Workaround #2
Do both assignments at the same time, so both apply calls run before any NaN column is added

df = df.assign(
    # both apply calls are evaluated against the original, all-int df
    C=df.apply(lambda x: df['A'].shift(x['lag'])[x.name], axis=1),
    D=df.apply(lambda x: df['B'].shift(x['lag'])[x.name], axis=1)
)
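Putting the pieces together, here is a self-contained sketch (assuming Python 3 and a reasonably recent pandas) that rebuilds the question's frame and checks that the two workarounds produce the same C and D columns. The helper name make_frame is hypothetical.

```python
import pandas as pd


def make_frame():
    # Hypothetical helper: the frame from the question, freshly built each time.
    return pd.DataFrame({'lag': [3, 5, 3, 4, 2, 3, 2, 3, 4, 3, 2, 2, 2, 3],
                         'A': [10, 20, 30, 40, 20, 30, 40, 10, 20, 30, 15, 60, 20, 15],
                         'B': [11, 21, 31, 41, 21, 31, 41, 11, 21, 31, 15, 61, 21, 25]})


# Workaround #1: sequential assignments, casting the (possibly float) lag back to int.
df1 = make_frame()
df1['C'] = df1.apply(lambda x: df1['A'].shift(int(x['lag']))[x.name], axis=1)
df1['D'] = df1.apply(lambda x: df1['B'].shift(int(x['lag']))[x.name], axis=1)

# Workaround #2: keyword arguments are evaluated before assign() runs, so both
# apply calls see the original all-int frame and need no cast.
df2 = make_frame()
df2 = df2.assign(
    C=df2.apply(lambda x: df2['A'].shift(x['lag'])[x.name], axis=1),
    D=df2.apply(lambda x: df2['B'].shift(x['lag'])[x.name], axis=1),
)

# Both approaches should give identical frames.
pd.testing.assert_frame_equal(df1, df2)
print(df1)
```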

Better solution
However, I'd use NumPy for this.

The lagged positions are just each row's position minus its lag value:

import numpy as np

l = np.arange(len(df)) - df.lag.values
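For the question's lag column the arithmetic works out as follows. This small sketch only shows the intermediate array; negative entries are rows whose lag reaches back before the first row, so they should end up as NaN.

```python
import numpy as np

lag = np.array([3, 5, 3, 4, 2, 3, 2, 3, 4, 3, 2, 2, 2, 3])

# Position of each row (0..13) minus that row's lag = position of the value to pull.
l = np.arange(len(lag)) - lag
print(l)
# [-3 -4 -1 -1  2  2  4  4  4  6  8  9 10 10]
```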

Then use np.where to put NaN wherever the lagged position falls before the first row:

df['C'] = np.where(l >= 0, df.A.values[l], np.nan)
df['D'] = np.where(l >= 0, df.B.values[l], np.nan)
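One subtlety: np.where evaluates both branches, so df.A.values[l] is computed even for negative l. Here that is harmless because every lag is smaller than the row count, so negative indices just wrap around to the end of the array and are then masked out. The sketch below is a slightly more defensive variant under that reading. The helper name lagged_take is hypothetical; it clips positions before indexing and treats positions past either end as NaN. It also cross-checks the result against the row-wise apply.

```python
import numpy as np
import pandas as pd


def lagged_take(values, lags):
    """Hypothetical helper: out[i] = values[i - lags[i]], or NaN if that position is outside the array."""
    values = np.asarray(values, dtype=float)
    pos = np.arange(len(values)) - np.asarray(lags)
    valid = (pos >= 0) & (pos < len(values))
    # np.where evaluates both branches, so clip before indexing: otherwise a small
    # negative position silently wraps to the end of the array, and one beyond
    # -len(values) (or >= len(values)) raises IndexError.
    taken = values[np.clip(pos, 0, len(values) - 1)]
    return np.where(valid, taken, np.nan)


df = pd.DataFrame({'lag': [3, 5, 3, 4, 2, 3, 2, 3, 4, 3, 2, 2, 2, 3],
                   'A': [10, 20, 30, 40, 20, 30, 40, 10, 20, 30, 15, 60, 20, 15],
                   'B': [11, 21, 31, 41, 21, 31, 41, 11, 21, 31, 15, 61, 21, 25]})
df['C'] = lagged_take(df['A'].to_numpy(), df['lag'].to_numpy())
df['D'] = lagged_take(df['B'].to_numpy(), df['lag'].to_numpy())

# Cross-check against the row-wise apply (workaround #1 style, with the int cast).
expected_C = df.apply(lambda x: df['A'].shift(int(x['lag']))[x.name], axis=1)
pd.testing.assert_series_equal(df['C'], expected_C, check_names=False)
print(df)
```

A lag larger than the number of rows now simply yields NaN instead of an error.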

4 Comments

Thanks! Works like a champ!
@Bill I've got something better in an update coming in a few minutes
That's crafty. I assume this works a lot faster since it operates column-wise instead of row-wise?
@Bill I added that answer to the question you referenced. Go there and look at the time differences. The speed up comes from not iterating over every row and instead using a vectorized approach of slicing.
