1

Let say I have pandas DataFrame like this:

  NO_DOC | NO_SEQ | DESC
  A1     | 1      | A
  B1     | 2      | B
  C1     | 3      | C

I want to search row index from data above on a certain filter: I am able to find it using statement:

idx = data.index[data["NO_DOC"] == "A1" && data["NO_SEQ"] == 1]

but what I want to ask is how to make the "condition" is variable because I want to make it in a function.

Example:

def find(self, data, columns, value):
  return data.index[data[...] == value[...] && ....].tolist()

columns parameters is ["NO_DOC","NO_SEQ"] value is ["A1", 1]

the filter condition above is: data["NO_DOC"] == "A1" && data["NO_SEQ"] == 1

How do I make a function to find row index on pandas DataFrame with a certain filter condition?

2 Answers 2

2

We can use zip with a list comprehension to build a DataFrame from each Series of boolean values:

mask_df = pd.DataFrame([
    data[col] == val for col, val in zip(columns, values)
])

           0      1      2
NO_DOC  True  False  False
NO_SEQ  True  False  False

Then we can reduce this DataFrame of booleans with all to check which columns (indexes) have only True values (logical AND):

mask = mask_df.all()

0     True
1    False
2    False
dtype: bool

Note: logical OR can be achieved with any instead of all

Now we can use the mask to filter the indexes:

data.index[mask].tolist()

[0]

Together find can look something like:

def find(data, columns, values):
    # Create a DataFrame of boolean conditions
    # Take logical AND using DataFrame.all
    mask = pd.DataFrame([
        data[col] == val for col, val in zip(columns, values)
    ]).all()
    # Filter and convert to list
    return data.index[mask].tolist()

Or with np.logical_and and .reduce which behaves in with the same logic as the DataFrame but can be much faster as it does not need to maintain pandas objects:

def find(data, columns, values):
    mask = np.logical_and.reduce(
        [data[col] == val for col, val in zip(columns, values)]
    )
    return data.index[mask].tolist()

Some timing via timeit.

Setup:

import numpy as np
import pandas as pd


def find_with_df_mask(data, columns, values):
    mask = pd.DataFrame([
        data[col] == val for col, val in zip(columns, values)
    ]).all()
    return data.index[mask].tolist()


def find_with_np_mask(data, columns, values):
    mask = np.logical_and.reduce(
        [data[col] == val for col, val in zip(columns, values)]
    )
    return data.index[mask].tolist()


df = pd.concat([pd.DataFrame({
    'NO_DOC': ['A1', 'B1', 'C1'],
    'NO_SEQ': [1, 2, 3],
    'DESC': ['A', 'B', 'C']
})] * 1000, ignore_index=True)
cols = ["NO_DOC", "NO_SEQ"]
vals = ["A1", 1]

Timings:

%timeit find_with_df_mask(df, cols, vals)
21.9 ms ± 65.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit find_with_np_mask(df, cols, vals)
319 µs ± 253 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks, it works. But unfortunately, with thousands of rows of data, it is kind of slow since it is like creating a new DataFrame for the mask. Is there any technics to fast retrieve row index that match the data pass to a function?
Numpy can be faster. See the updated solution (middle section) I've also added timings to show relative performance.
1

To write Pandas dataframe filter condition from parameters in a function:

import pandas as pd

raw_data = {'NO_DOC':['A1','B1','C1'],'NO_SEQ':[1,2,3],'DESC':['A','B','C']}
data = pd.DataFrame(raw_data)

columns = ["NO_DOC", "NO_SEQ"]
value = ["A1", 1]

Create masks and send them to functions instead of columns and values:

mask1 = data[columns[0]] == value[0] 
mask2 = data[columns[1]] == value[1]

def find(data, mask1,mask2):
    return data.index[mask1 & mask2].tolist()

Call your function as follows:

find(data, mask1, mask2)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.