Filter pandas DataFrame by value in column when column holds a list of list

Question

I have a question similar to this one, but in my case the column whose values I need to check in order to extract rows from the dataframe holds a list of lists, not a numeric value.

My data looks like this:

import pandas as pd 

data = {
    'A' : [1, 2, 3, 4, 5],
    'B' : [[[1, 2], [3, 4]], [[0, 2], [5, 6]], [[1, 3], [7, 8]], [[0, 4], [9, 10]], [[1, 5], [11, 12]]]
}
dataF = pd.DataFrame(data)
print(dataF)

I need to extract the rows in the dataframe based on the value of the first element of the first list in each row for B. This value will always be 0 or 1.

Once this problem is solved I will have a dataframe looking like:

import pandas as pd 

data = {
    'A' : [1, 2, 3, 4, 5],
    'B' : [[[1, 2], [3, 4]], [[0, 2], [5, 6]], [[1, 3], [7, 8]], [[0, 4], [9, 10]], [[1, 5], [11, 12]]],
    'C' : [[[0, 2], [3, 4]], [[1, 2], [5, 6]], [[0, 3], [7, 8]], [[0, 4], [9, 10]], [[1, 5], [11, 12]]]
}
dataF = pd.DataFrame(data)
print(dataF)

From this dataframe I need to take all rows in which the first element of the first list in B or C is 1. This means rows 0, 1, 2, and 4.

EDIT based on the answer from WeNYoBen:

To extract all rows from a dataframe in which the first element of the first list in B or C is 1, I am using the code below. However, this approach requires checking for duplicate rows in extDF and sorting extDF by the values in one column. I guess there is a way to do this that does not require these two steps.

import pandas as pd 

data = {
    'A' : [1, 2, 3, 4, 5],
    'B' : [[[1, 2], [3, 4]], [[0, 2], [5, 6]], [[1, 3], [7, 8]], [[0, 4], [9, 10]], [[1, 5], [11, 12]]],
    'C' : [[[0, 2], [3, 4]], [[1, 2], [5, 6]], [[0, 3], [7, 8]], [[0, 4], [9, 10]], [[1, 5], [11, 12]]]
}
dataF = pd.DataFrame(data)


# DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate once.
parts = []

for i in [1, 2]:
    parts.append(dataF[dataF.iloc[:,i].str[0].str[0].isin([1])].copy())

extDF = pd.concat(parts)

extDF.drop_duplicates(keep='first', inplace=True, subset='A')
extDF.sort_values(by='A', inplace=True)
extDF.reset_index(drop=True, inplace=True)

print(extDF)

You need to filter the dataframe? What does that mean? Filter how? And the same value as before? Before what, where?
– Akaisteph7, Commented Jul 22, 2019 at 21:19

BENY · Accepted Answer · 2019-07-22 21:27:33Z

1

Based on what you described:

Newdf=dataF[dataF.B.str[0].str[0].isin([0,1])].copy()
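
As a minimal sketch of how the chained .str[0].str[0] lookup in the line above can be used to select a single value (here 1), assuming the dataF from the question; the names first_values and ones are only illustrative:

```python
import pandas as pd

dataF = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [[[1, 2], [3, 4]], [[0, 2], [5, 6]], [[1, 3], [7, 8]],
          [[0, 4], [9, 10]], [[1, 5], [11, 12]]],
})

# .str[0] indexes into each cell element-wise, so chaining it twice
# reaches the first element of the first inner list in every row.
first_values = dataF['B'].str[0].str[0]

# Keep only the rows whose first value equals 1 (compare with 0 for the other group).
ones = dataF[first_values == 1].copy()
print(ones)
```

Note that isin([0, 1]) keeps every row when the value is always 0 or 1, so narrowing the list to the value of interest is what actually filters.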


kbr85 Over a year ago

This is what I need for one column. Any idea about how to extend this to multiple columns like in the second part of the question?
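
One possible way to extend the same idea to several columns, offered as a sketch rather than as part of the accepted answer: build one boolean column per checked column and combine them with any(axis=1). This assumes the three-column dataF from the question; cols and mask are illustrative names.

```python
import pandas as pd

dataF = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [[[1, 2], [3, 4]], [[0, 2], [5, 6]], [[1, 3], [7, 8]],
          [[0, 4], [9, 10]], [[1, 5], [11, 12]]],
    'C': [[[0, 2], [3, 4]], [[1, 2], [5, 6]], [[0, 3], [7, 8]],
          [[0, 4], [9, 10]], [[1, 5], [11, 12]]],
})

cols = ['B', 'C']  # columns whose first nested value should be checked

# One boolean column per checked column:
# True where the first element of the first list is 1.
mask = dataF[cols].apply(lambda s: s.str[0].str[0].eq(1))

# A row is kept if the condition holds in at least one of the columns.
extDF = dataF[mask.any(axis=1)].reset_index(drop=True)
print(extDF)
```

Because the rows are selected with a single mask, each row appears at most once and in its original order, so the drop_duplicates and sort_values steps from the question's loop are not needed.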
