I'm working with a Pandas MultiIndex that I build with the from_product method. Taking the MultiIndex values property gives me a NumPy ndarray:

import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}
df1 = pd.DataFrame(data=d)
df2 = pd.DataFrame(data=d)


multi_index = pd.MultiIndex.from_product((df1.index, df2.index), names=['idx1', 'idx2']).values

It returns an ndarray of tuples: [(0, 0) (0, 1) (1, 0) (1, 1)]. I want to keep only the tuples whose two elements are equal. But because the elements are tuples, the array is one-dimensional with dtype object, so I can't use vectorized indexing like this:

equals = multi_index[multi_index[:, 0] == multi_index[:, 1]]

That would be possible if they were lists instead of tuples. Is there a way to filter by the tuples' elements (the condition could be more complex than the one above)?

If there isn't, what could I do? Cast every tuple to a list? I could iterate over all the elements, but that would be much slower than a vectorized solution.

Any kind of help would be very appreciated. Thanks in advance

  • What's the dtype and shape of that ndarray of tuples? Commented Sep 4, 2020 at 20:19
  • The dtype is object (strings) for both elements in the tuple. The shape is variable as the DataFrames are instantiated from uploaded files by users Commented Sep 4, 2020 at 20:36
  • np.stack(multi_index) produces a 2d array (n,2) of integers. Commented Sep 4, 2020 at 21:29
  • Cool! I'll give a try! Thank you! Commented Sep 4, 2020 at 22:29

1 Answer

Do not add .values at the end, so that you can still call get_level_values:

multi_index = pd.MultiIndex.from_product((df1.index, df2.index), names=['idx1', 'idx2'])
equals = multi_index[multi_index.get_level_values(0) == multi_index.get_level_values(1)]
equals
Out[487]: 
MultiIndex([(0, 0),
            (1, 1)],
           names=['idx1', 'idx2'])
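
The same pattern extends to conditions other than equality, since each level comes back as an ordinary Index that supports element-wise comparison. The following is a minimal sketch, assuming a made-up condition (first element less than or equal to the second); the names first, second, mask and filtered are only illustrative:

```python
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}
df1 = pd.DataFrame(data=d)
df2 = pd.DataFrame(data=d)

multi_index = pd.MultiIndex.from_product(
    (df1.index, df2.index), names=['idx1', 'idx2']
)

# Each level is returned as a flat Index aligned with the MultiIndex rows
first = multi_index.get_level_values('idx1')
second = multi_index.get_level_values('idx2')

# Hypothetical condition for illustration: keep pairs where idx1 <= idx2
mask = first <= second
filtered = multi_index[mask]
print(filtered.tolist())
```

Boolean masks built this way can also be combined with & and | before indexing.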

For a NumPy array:

idx = np.array(pd.MultiIndex.from_product((df1.index, df2.index), names=['idx1', 'idx2']).tolist())
multi_index = pd.MultiIndex.from_product((df1.index, df2.index), names=['idx1', 'idx2']).values
equals = multi_index[idx[:, 0] == idx[:, 1]]
equals
Out[497]: array([(0, 0), (1, 1)], dtype=object)
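
If only the array of tuples is available, the np.stack suggestion from the comments on the question may also work: it turns the one-dimensional object array into a regular two-dimensional array that can be sliced by column. This is a sketch under the assumption that every tuple has the same length; the names tuples and pairs are illustrative:

```python
import numpy as np
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}
df1 = pd.DataFrame(data=d)
df2 = pd.DataFrame(data=d)

# 1-D object array whose elements are tuples
tuples = pd.MultiIndex.from_product((df1.index, df2.index)).values

# Stack the tuples into a 2-D array of shape (n, 2) so columns can be compared
pairs = np.stack(tuples)
mask = pairs[:, 0] == pairs[:, 1]

# Index the original array to keep the tuples themselves
equals = tuples[mask]
print(equals)
```

With string labels instead of integers, pairs would be a string array, and the column comparison is written the same way.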

2 Comments

Cool! Thanks for answering, this solves my problem with the MultiIndex, but what about filtering a NumPy array of tuples?
@Genarito You need a for loop there, since the elements are tuples.
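
For reference, the loop-based approach mentioned in the last comment could look roughly like the sketch below. It builds a boolean mask by iterating over the tuples in Python, so it is not vectorized, but it accepts any condition. The function keep_pair is a hypothetical predicate introduced only for this example:

```python
import numpy as np
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}
df1 = pd.DataFrame(data=d)
df2 = pd.DataFrame(data=d)

tuples = pd.MultiIndex.from_product((df1.index, df2.index)).values


def keep_pair(a, b):
    # Hypothetical condition; replace with whatever rule is needed
    return a == b


# Evaluate the predicate once per tuple and collect the results in a mask
mask = np.fromiter(
    (keep_pair(a, b) for a, b in tuples), dtype=bool, count=len(tuples)
)
equals = tuples[mask]
print(equals)
```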
