
I am using a pandas.Series with a MultiIndex for a bidirectional weighted lookup. I thought it should be easy to use the MultiIndex to find, for a given value on one level, the corresponding values on the other level. However, I cannot find a simple function that does something like the following, where other is the function I am looking for:

>>> index=pandas.MultiIndex.from_tuples(
...                  [(0, 0),(1,2),(3,4),(5,6),(5,7),(8,0),(9,0)],
...                  names=["concept", "word"])
>>> other(index, "word", 0)
{0, 8, 9}
>>> other(index, "concept", 3)
{4}
>>> other(index, "word", 6)
{5}

I'm happy to specify level numbers instead of level names and to get any iterable back, not necessarily a set. I only have a 2-level MultiIndex, so I don't care how this generalizes to MultiIndexes with more levels, or even whether it does.

I would be slightly unhappy if this involves iterating over all entries in the MultiIndex and comparing them, because I thought indices are somewhat like multi-key hash tables.
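To pin down the intended semantics, here is a plain-Python reference version. It is a hypothetical sketch (the name other_naive is made up) and it is exactly the entry-by-entry scan the question hopes to avoid, but it can serve as a correctness check for faster variants. It assumes exactly two levels and accepts either a level name or a level number.

```python
import pandas as pd

def other_naive(index: pd.MultiIndex, level, value) -> set:
    """Hypothetical reference implementation: scan every index tuple."""
    # Resolve a level name to its position; an int is taken as the position.
    pos = index.names.index(level) if level in index.names else level
    # With two levels, the "other" level is the remaining position 1 - pos.
    return {tup[1 - pos] for tup in index if tup[pos] == value}

index = pd.MultiIndex.from_tuples(
    [(0, 0), (1, 2), (3, 4), (5, 6), (5, 7), (8, 0), (9, 0)],
    names=["concept", "word"])

print(other_naive(index, "word", 0) == {0, 8, 9})  # True
print(other_naive(index, "concept", 3) == {4})     # True
print(other_naive(index, 1, 6) == {5})             # True
```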

2 Answers


Approach 1:

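You could build a custom function with a vectorized NumPy approach. Here slicing is the position of the level to match on, and the function returns the matching values of the other level:

import numpy as np

def other(index, slicing, value):
    # Row i of arr holds the values of level i (one column per index entry).
    arr = np.column_stack(index.values.tolist())
    # Drop the queried level's row, keep the remaining level, and filter it.
    return np.delete(arr, slicing, axis=0)[0][arr[slicing] == value]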

Usage:

other(index, slicing=index.names.index('word'), value=0)
# array([0, 8, 9])
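A self-contained run of Approach 1 against all three examples from the question, converting the resulting array to a set as the question asks. The wrapper name other_set is hypothetical; it only resolves the level name with index.names.index and calls the function above, which is repeated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

def other(index, slicing, value):
    # Approach 1: row i of arr holds the values of level i.
    arr = np.column_stack(index.values.tolist())
    return np.delete(arr, slicing, axis=0)[0][arr[slicing] == value]

def other_set(index, level_name, value):
    # Hypothetical convenience wrapper: level name in, set of Python ints out.
    return set(other(index, index.names.index(level_name), value).tolist())

index = pd.MultiIndex.from_tuples(
    [(0, 0), (1, 2), (3, 4), (5, 6), (5, 7), (8, 0), (9, 0)],
    names=["concept", "word"])

print(sorted(other_set(index, "word", 0)))     # [0, 8, 9]
print(sorted(other_set(index, "concept", 3)))  # [4]
print(sorted(other_set(index, "word", 6)))     # [5]
```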

Timings:

%timeit other(index, slicing=index.names.index('word'), value=0)
10000 loops, best of 3: 43.9 µs per loop

Approach 2:

If you prefer a built-in method where you just plug in the arguments, you could use MultiIndex.get_loc_level. It returns a tuple: the location of the label (a slice or a boolean mask) and the index with the queried level dropped. For a 2-level MultiIndex, that second element holds the matching values of the other level:

Demo:

index.get_loc_level(key=3, level='concept')[1].ravel()
# array([4], dtype=int64)

index.get_loc_level(key=0, level='word')[1].ravel()
# array([0, 8, 9], dtype=int64)

index.get_loc_level(key=6, level='word')[1].ravel()
# array([5], dtype=int64)
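The get_loc_level calls above can be wrapped into the other(index, level, value) signature from the question. This is a hypothetical sketch: it takes the second element of the returned tuple and converts it to a set, and it treats a KeyError (which get_loc_level raises when the value does not occur in that level) as "no matches". Using tolist() rather than ravel() also sidesteps differences in what Index.ravel returns across pandas versions.

```python
import pandas as pd

def other(index: pd.MultiIndex, level, value) -> set:
    """Hypothetical wrapper around MultiIndex.get_loc_level.

    get_loc_level returns (location, remaining_index); with two levels the
    remaining index holds the other level's values for the matching rows.
    """
    try:
        _, remaining = index.get_loc_level(value, level=level)
    except KeyError:
        # The value does not occur in that level at all.
        return set()
    return set(remaining.tolist())

index = pd.MultiIndex.from_tuples(
    [(0, 0), (1, 2), (3, 4), (5, 6), (5, 7), (8, 0), (9, 0)],
    names=["concept", "word"])

print(sorted(other(index, "word", 0)))     # [0, 8, 9]
print(sorted(other(index, "concept", 3)))  # [4]
print(other(index, "word", 99))            # set()
```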

Timings:

%timeit index.get_loc_level(key=0, level='word')[1].ravel()
10000 loops, best of 3: 129 µs per loop

So, on this small 2-level MultiIndex, the custom function is roughly 3x faster than the built-in get_loc_level (43.9 µs vs. 129 µs per call).


How about this:

>>> index.get_level_values('concept').values[index.get_level_values('word').values == 0]
array([0, 8, 9])

>>> index.get_level_values('concept').values[index.get_level_values('word').values == 6]
array([5])

>>> index.get_level_values('word').values[index.get_level_values('concept').values == 3]
array([4])

Note that you can easily transform a numpy array to a set:

>>> set(np.array([1, 2, 3]))
{1, 2, 3}

Wrapping all of the above into a function other should not be difficult.
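One way to do that wrapping, as a hypothetical sketch: it accepts a level name or a level number, assumes exactly two levels (so the other level is the remaining position), and uses tolist() so the set contains plain Python values.

```python
import pandas as pd

def other(index: pd.MultiIndex, level, value) -> set:
    """Hypothetical helper built on get_level_values; assumes two levels."""
    # Resolve a level name to its position; an int is taken as the position.
    pos = index.names.index(level) if level in index.names else level
    other_pos = 1 - pos
    # Vectorized comparison over the queried level (still O(n), but in NumPy).
    mask = index.get_level_values(pos).to_numpy() == value
    # Select the matching entries of the other level.
    return set(index.get_level_values(other_pos).to_numpy()[mask].tolist())

index = pd.MultiIndex.from_tuples(
    [(0, 0), (1, 2), (3, 4), (5, 6), (5, 7), (8, 0), (9, 0)],
    names=["concept", "word"])

print(sorted(other(index, "word", 0)))     # [0, 8, 9]
print(sorted(other(index, "concept", 3)))  # [4]
print(sorted(other(index, 1, 6)))          # [5]
```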

1 Comment

That would mean iterating over all entries (even though it's in a numpy == operation), which I was hoping some Index magic would allow me to avoid, but at least it's concise, readable and the iteration is in numpy.
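If many lookups are needed and a per-query scan is undesirable, one option (not taken from either answer) is to pay for a single pass up front and build your own per-level hash maps, which is close to the "multi-key hash table" the question describes. The helper name build_lookup is hypothetical, and the sketch assumes exactly two levels.

```python
from collections import defaultdict

import pandas as pd

def build_lookup(index: pd.MultiIndex) -> dict:
    """Hypothetical helper: level name -> {value: set of other-level values}.

    Built with one pass over the index; assumes exactly two levels.
    """
    first, second = defaultdict(set), defaultdict(set)
    for a, b in index:  # iterating a MultiIndex yields (level0, level1) tuples
        first[a].add(b)
        second[b].add(a)
    return {index.names[0]: dict(first), index.names[1]: dict(second)}

index = pd.MultiIndex.from_tuples(
    [(0, 0), (1, 2), (3, 4), (5, 6), (5, 7), (8, 0), (9, 0)],
    names=["concept", "word"])
lookup = build_lookup(index)

# Each query is now an average O(1) dict lookup instead of a scan.
print(lookup["word"][0] == {0, 8, 9})  # True
print(lookup["concept"][3] == {4})     # True
print(lookup["word"].get(99, set()))   # set()
```

The trade-off is that the dicts must be rebuilt (or updated) whenever the index changes.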
