Python lambda on a pandas dataframe containing 2d Numpy arrays

Question

I need to convert several NumPy arrays according to the following rule. The arrays are compared elementwise. If a given array's value at a position is greater than 0.5 and is the greatest among all arrays at that position, then the corresponding output array has 1 at that position. Otherwise it has 0.

import pandas as pd
import numpy as np

def max_is_greater_than_half_1d(*args):
    df = pd.DataFrame(dict({'col_'+str(i+1): val for i, val in enumerate(args)}))
    max_val = df.apply(max, axis=1)
    df = df.apply(lambda x: (x > 0.5) & (max_val == x), axis=0).astype(int)
    return [np.array(df[col].values) for col in df.columns]

in_1=np.array([0.4, 0.7, 0.8, 0.3, 0.3])
in_2=np.array([0.9, 0.8, 0.6, 0.4, 0.4])
in_3=np.array([0.5, 0.5, 0.5, 0.2, 0.6])

out_1, out_2, out_3 = max_is_greater_than_half_1d(in_1, in_2, in_3)
# out_1: [0, 0, 1, 0, 0]
# out_2: [1, 1, 0, 0, 0]
# out_3: [0, 0, 0, 0, 1]

This works. But how can I do the same operation on several 2d arrays?

example:

in_1=np.array([[0.4, 0.7], [0.8, 0.3]])
in_2=np.array([[0.9, 0.8], [0.6, 0.4]])

out_1 = [[0, 0], [1, 0]] and out_2 = [[1, 1], [0, 0]]

In my case I have six 2000x2000 arrays, so processing them one element at a time in Python is going to be too slow. A vectorized operation on whole arrays is preferable.

Arislan Makhmudov · Accepted Answer · 2021-09-22 11:56:24Z

1

It's almost the same code:

def max_is_greater_than_half_2d(*args):
    # Flatten each 2D array into one DataFrame column
    df = pd.DataFrame(dict({'col_'+str(i+1): val.flatten() for i, val in enumerate(args)}))
    max_val = df.apply(max, axis=1)
    df = df.apply(lambda x: (x > 0.5) & (max_val == x), axis=0).astype(int)
    # Restore the input shape instead of hard-coding two columns
    return [np.array(df[col].values.reshape(args[0].shape)) for col in df.columns]

in_1=np.array([[0.4, 0.7], 
               [0.8, 0.3]])
in_2=np.array([[0.9, 0.8], 
               [0.6, 0.4]])
max_is_greater_than_half_2d(in_1, in_2)

Or you could reuse the 1D function:

def max_is_greater_than_half_2d(*args):
    shape = args[0].shape  # remember the input shape before flattening
    args = [val.flatten() for val in args]
    out  = max_is_greater_than_half_1d(*args)
    return [np.array(val.reshape(shape)) for val in out]



sixtytrees · Accepted Answer · 2021-09-22 13:56:38Z

Here is a faster way to do this comparison:

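def max_is_greater_than_half_2d_numpy(*args):
    # Elementwise maximum across all input arrays
    max_arr = np.maximum.reduce(args)
    res = []
    for ar in args:
        # 0 where the value is not the maximum or is not greater than 0.5, else 1
        res.append(np.where(np.logical_or(ar < max_arr, ar <= 0.5), 0.0, 1.0))
    return res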

# 0.026 s to create two arrays below
rand_1=np.random.default_rng().random((2000, 2000),dtype=np.float32)
rand_2=np.random.default_rng().random((2000, 2000),dtype=np.float32)

# 0.05 s with np.maximum.reduce, np.where
r_out_v1 = max_is_greater_than_half_2d_numpy(rand_1, rand_2)  

# 45.0 s with the pandas DataFrame version (max_is_greater_than_half_2d from the other answer)
r_out_v2 = max_is_greater_than_half_2d(rand_1, rand_2)

NumPy's np.maximum.reduce followed by np.where is roughly 900 times faster than the pandas DataFrame approach (0.05 s versus 45.0 s for two 2000x2000 arrays), so I will accept my own answer.
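
For reference, the same rule can also be written without the Python loop by stacking the inputs into one array. This is only a sketch, not part of either answer: the function name max_is_greater_than_half_stacked and the threshold parameter are made up for illustration, and it assumes all inputs have the same shape. It returns integer arrays, and on ties every array that holds the maximum gets a 1, as in the pandas version.

```python
import numpy as np

def max_is_greater_than_half_stacked(*arrays, threshold=0.5):
    """Return one int mask per input: 1 where that input holds the elementwise
    maximum across all inputs and is strictly greater than threshold."""
    stacked = np.stack(arrays)       # shape: (n_arrays, *input_shape)
    max_arr = stacked.max(axis=0)    # elementwise maximum across the inputs
    mask = (stacked == max_arr) & (stacked > threshold)
    return list(mask.astype(int))    # one array per input, same shape as input

# 2D example from the question
in_1 = np.array([[0.4, 0.7], [0.8, 0.3]])
in_2 = np.array([[0.9, 0.8], [0.6, 0.4]])
out_1, out_2 = max_is_greater_than_half_stacked(in_1, in_2)
print(out_1.tolist())  # [[0, 0], [1, 0]]
print(out_2.tolist())  # [[1, 1], [0, 0]]
```

Stacking makes a copy of all inputs, so for six 2000x2000 arrays it trades some memory for having the whole comparison in a single broadcasted expression.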
