Filter Numpy array of tuples with vectorized access

Question

I'm working with a Pandas MultiIndex, which I create with the from_product method. What I get from the MultiIndex values property is a NumPy ndarray:

import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}
df1 = pd.DataFrame(data=d)
df2 = pd.DataFrame(data=d)

multi_index = pd.MultiIndex.from_product((df1.index, df2.index), names=['idx1', 'idx2']).values

It returns an ndarray of tuples: [(0, 0) (0, 1) (1, 0) (1, 1)]. The problem is that I want to keep only the tuples whose two elements are equal. But because the elements are tuples, I can't use vectorized indexing like this:

equals = multi_index[multi_index[:, 0] == multi_index[:, 1]]

That would be possible if the array were a regular 2-D array instead of a 1-D array of tuples. Is there a way to filter by the tuples' elements (possibly with a more complex condition than the one above)?

If there isn't, what could I do? Cast every tuple to a list? I could iterate over all the elements, but that would be much too slow compared with a vectorized solution.
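A minimal sketch of the situation, assuming the two-row DataFrames from the question: it first confirms that `.values` is one-dimensional (which is why `[:, 0]` fails), then shows the loop-based fallback the question would like to avoid. The names `slicing_failed` and `mask` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same setup as in the question.
d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}
df1 = pd.DataFrame(data=d)
df2 = pd.DataFrame(data=d)
multi_index = pd.MultiIndex.from_product(
    (df1.index, df2.index), names=['idx1', 'idx2']
).values

# .values is a 1-D object array (shape (4,)), so there is no second
# axis to slice: multi_index[:, 0] raises IndexError.
try:
    multi_index[:, 0]
    slicing_failed = False
except IndexError:
    slicing_failed = True

# Loop-based fallback: build the boolean mask one tuple at a time,
# then apply it to the object array in a single indexing step.
mask = np.array([a == b for a, b in multi_index], dtype=bool)
equals = multi_index[mask]  # keeps (0, 0) and (1, 1)
```

The list comprehension does Python-level work for every tuple; only the final indexing step is vectorized.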

Any help would be much appreciated. Thanks in advance!

The dtype is object (strings) for both elements of the tuple. The shape varies because the DataFrames are created from files uploaded by users. – Genarito, Commented Sep 4, 2020 at 20:36
np.stack(multi_index) produces a 2-D (n, 2) array of integers. – hpaulj, Commented Sep 4, 2020 at 21:29
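A sketch of the `np.stack` suggestion, reusing the question's setup. The stacked array's dtype follows the tuple contents (integers here; with string labels NumPy would produce a string array), and the mask built from its columns can index the original object array because both have length n.

```python
import numpy as np
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}
df1 = pd.DataFrame(data=d)
df2 = pd.DataFrame(data=d)
multi_index = pd.MultiIndex.from_product(
    (df1.index, df2.index), names=['idx1', 'idx2']
).values

# np.stack treats each tuple as a length-2 row and stacks the rows
# into a regular 2-D (n, 2) array that supports column slicing.
pairs = np.stack(multi_index)

# Column-wise comparison gives a length-n boolean mask, which is
# applied to the original 1-D object array of tuples.
mask = pairs[:, 0] == pairs[:, 1]
equals = multi_index[mask]
```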

BENY · Accepted Answer · 2020-09-04 15:51:28Z

Don't add .values at the end, so that you can call get_level_values on the MultiIndex:

multi_index = pd.MultiIndex.from_product((df1.index, df2.index), names=['idx1', 'idx2'])
equals = multi_index[multi_index.get_level_values(0) == multi_index.get_level_values(1)]
equals
Out[487]: 
MultiIndex([(0, 0),
            (1, 1)],
           names=['idx1', 'idx2'])
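The question also asks about conditions more complex than equality. A minimal sketch, assuming the same `df1`/`df2`: levels can be fetched by the names passed to `from_product`, and element-wise comparisons can be combined with `&` and `|`. The particular condition below is only an illustration.

```python
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}
df1 = pd.DataFrame(data=d)
df2 = pd.DataFrame(data=d)
multi_index = pd.MultiIndex.from_product(
    (df1.index, df2.index), names=['idx1', 'idx2']
)

# Levels can be fetched by name as well as by position.
lvl1 = multi_index.get_level_values('idx1')
lvl2 = multi_index.get_level_values('idx2')

# Illustrative compound condition: equal pairs OR second element == 1.
# Parentheses are required because | binds tighter than ==.
mask = (lvl1 == lvl2) | (lvl2 == 1)
filtered = multi_index[mask]  # keeps (0, 0), (0, 1) and (1, 1)
```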

For a NumPy array:

import numpy as np

mi = pd.MultiIndex.from_product((df1.index, df2.index), names=['idx1', 'idx2'])
idx = np.array(mi.tolist())  # regular 2-D (n, 2) array
multi_index = mi.values      # 1-D object array of tuples
equals = multi_index[idx[:, 0] == idx[:, 1]]
equals
Out[497]: array([(0, 0), (1, 1)], dtype=object)
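Since the asker's labels are strings, here is the same `tolist()` conversion with string labels; the index values `'a'`, `'b'`, `'c'` are hypothetical stand-ins for user data. One caution: if a tuple mixes types (say a string and an integer), `np.array` coerces everything to a common string dtype, so the integer column would be compared as text.

```python
import numpy as np
import pandas as pd

# Hypothetical string labels standing in for user-uploaded data.
df1 = pd.DataFrame({'col1': [1, 2]}, index=['a', 'b'])
df2 = pd.DataFrame({'col1': [3, 4]}, index=['a', 'c'])

mi = pd.MultiIndex.from_product((df1.index, df2.index), names=['idx1', 'idx2'])
multi_index = mi.values        # 1-D object array of string tuples
idx = np.array(mi.tolist())    # 2-D array of strings, shape (n, 2)

# Column-wise string comparison, then index the original tuple array.
equals = multi_index[idx[:, 0] == idx[:, 1]]  # only ('a', 'a') matches
```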

edited Sep 4, 2020 at 15:51

answered Sep 4, 2020 at 15:46

BENY

2 Comments

Genarito Over a year ago

Cool! Thanks for answering, this solves my problem with the MultiIndex, but what about filtering a NumPy array of tuples?

BENY Over a year ago

@Genarito you need a for loop there, since the elements are tuples.
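If only the 1-D array of tuples is available (for example after `.values` has already been applied), one possible workaround, sketched here under the assumption that every tuple has the same length, is to rebuild a `MultiIndex` with `pd.MultiIndex.from_tuples` and reuse the `get_level_values` approach. The conversion still walks every tuple once inside pandas, so a per-tuple pass is not avoided; only the filtering afterwards is vectorized.

```python
import pandas as pd

# A bare 1-D object array of tuples that no longer carries its MultiIndex.
arr = pd.MultiIndex.from_product([[0, 1], [0, 1]]).values

# Rebuilding a MultiIndex restores per-level access (one pass over the tuples).
mi = pd.MultiIndex.from_tuples(arr, names=['idx1', 'idx2'])
mask = mi.get_level_values('idx1') == mi.get_level_values('idx2')

# The boolean mask has the same length as arr, so it indexes arr directly.
equals = arr[mask]
```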
