How do I calculate a pandas column with multiple columns as arguments?

Question

I was using a function that calculates wind speed from its longitudinal and latitudinal components:

def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)

and calling it to calculate a new pandas column from two existing ones:

df['wspeed'] = map(wind_speed, df['lonwind'], df['latwind'])

Since I switched from Python 2.7 to Python 3.5, this no longer works. Could the change be the cause?

For a single-argument (single-column) function:

def celsius(T):
    return round(T - 273, 1)

I am now using:

df['temp'] = df['t2m'].map(celsius)

And it works fine.

Could you help me?

But was the function map changed?

– Hugo, Commented Jun 25, 2016 at 15:07

jezrael · Accepted Answer · 2016-06-25 09:40:16Z

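If you want to keep using map, wrap it in list. In Python 3, map returns a lazy iterator rather than a list, so it has to be materialized before it is assigned to a column: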

import numpy as np
import pandas as pd

df = pd.DataFrame({'lonwind':[1,2,3],
                   'latwind':[4,5,6]})

print (df)
   latwind  lonwind
0        4        1
1        5        2
2        6        3

def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)

df['wspeed'] = list(map(wind_speed, df['lonwind'], df['latwind']))

print (df)
   latwind  lonwind    wspeed
0        4        1  4.123106
1        5        2  5.385165
2        6        3  6.708204

Without list, every row ends up holding the map object itself instead of a computed value:

df['wspeed'] = (map(wind_speed, df['lonwind'], df['latwind']))
print (df)
   latwind  lonwind                              wspeed
0        4        1  <map object at 0x000000000AC42DA0>
1        5        2  <map object at 0x000000000AC42DA0>
2        6        3  <map object at 0x000000000AC42DA0>

map(function, iterable, ...)

Return an iterator that applies function to every item of iterable, yielding the results. If additional iterable arguments are passed, function must take that many arguments and is applied to the items from all iterables in parallel. With multiple iterables, the iterator stops when the shortest iterable is exhausted. For cases where the function inputs are already arranged into argument tuples, see itertools.starmap().
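The behaviour described in that documentation excerpt can be seen without pandas at all. The following is a minimal plain-Python sketch; the lists and the add helper are made up for illustration and are not part of the question's data:

```python
# Illustrative inputs (hypothetical, not from the question).
u = [1, 2, 3]
v = [4, 5, 6, 7]  # deliberately one element longer than u


def add(a, b):
    """Toy two-argument function standing in for wind_speed."""
    return a + b


lazy = map(add, u, v)

# In Python 3 the result is an iterator: it is its own iterator
# and has no length, unlike the list that Python 2 returned.
is_iterator = iter(lazy) is lazy
has_len = hasattr(lazy, "__len__")

# list() consumes the iterator; iteration stops at the shorter input.
sums = list(lazy)

# A second pass yields nothing because the iterator is exhausted.
leftover = list(lazy)

print(is_iterator, has_len, sums, leftover)
```

This is why the Python 2 code stopped producing a usable column: the values only exist once something such as list() iterates over the map object.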

Another solution:

df['wspeed'] = (df['lonwind'] ** 2 + df['latwind'] ** 2) ** 0.5
print (df)
   latwind  lonwind    wspeed
0        4        1  4.123106
1        5        2  5.385165
2        6        3  6.708204

MaxU - stand with Ukraine · Accepted Answer · 2016-06-25 09:45:17Z

I would try to stick to existing numpy/scipy functions as they are extremely fast and optimized (numpy.hypot):

df['wspeed'] = np.hypot(df.latwind, df.lonwind)
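As a quick sanity check, np.hypot can be compared against the explicit square-root formula on the small example frame from the first answer. This is only a sketch; the variable names manual, hyp and same are introduced here for illustration:

```python
import numpy as np
import pandas as pd

# Same toy data as in the first answer.
df = pd.DataFrame({'lonwind': [1, 2, 3],
                   'latwind': [4, 5, 6]})

# Explicit formula, evaluated on whole columns at once.
manual = np.sqrt(df['lonwind'] ** 2 + df['latwind'] ** 2)

# np.hypot computes sqrt(x**2 + y**2) element-wise.
hyp = np.hypot(df['lonwind'], df['latwind'])

# Both approaches should agree up to floating-point tolerance.
same = np.allclose(manual, hyp)
print(same)
```

Because np.hypot is symmetric in its arguments, the order of the two columns does not matter.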

Timing against a DataFrame with 300K rows:

In [47]: df = pd.concat([df] * 10**5, ignore_index=True)

In [48]: df.shape
Out[48]: (300000, 2)

In [49]: %paste
def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)

## -- End pasted text --

In [50]: %timeit list(map(wind_speed, df['lonwind'], df['latwind']))
1 loop, best of 3: 922 ms per loop

In [51]: %timeit np.hypot(df.latwind, df.lonwind)
100 loops, best of 3: 4.08 ms per loop

Conclusion: the vectorized approach was about 226 times faster (922 ms vs. 4.08 ms).

If you have to write your own function, try to use vectorized math (operating on whole vectors / columns instead of scalars):

def wind_speed(u, v):
    # vectorized approach: arithmetic on whole columns instead of scalars
    return np.sqrt(u * u + v * v)

df['wspeed'] = wind_speed(df['lonwind'] , df['latwind'])

Demo:

In [39]: df['wspeed'] = wind_speed(df['lonwind'] , df['latwind'])

In [40]: df
Out[40]:
   latwind  lonwind    wspeed
0        4        1  4.123106
1        5        2  5.385165
2        6        3  6.708204

The same vectorized approach works for the celsius() function:

def celsius(T):
    # using vectorized function: np.round()
    return np.round(T - 273, 1)
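A possible way to call it, sketched under the assumption that the temperatures live in a t2m column as in the question. The sample values below are invented for illustration:

```python
import numpy as np
import pandas as pd


def celsius(T):
    # np.round works on a whole Series as well as on a single number
    return np.round(T - 273, 1)


# Hypothetical sample temperatures in kelvin (not from the question).
df = pd.DataFrame({'t2m': [300.0, 285.46, 273.0]})

# Pass the column directly; no map or apply is needed.
df['temp'] = celsius(df['t2m'])
print(df)
```

The function still accepts a plain scalar, so existing calls such as df['t2m'].map(celsius) keep working, only more slowly than the direct column call.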
