I have a question similar to this one, but in my case the column whose values I need to check to extract rows from the DataFrame holds a list of lists, not a numeric value.
My data looks like this:
import pandas as pd
data = {
'A' : [1, 2, 3, 4, 5],
'B' : [[[1, 2], [3, 4]], [[0, 2], [5, 6]], [[1, 3], [7, 8]], [[0, 4], [9, 10]], [[1, 5], [11, 12]]]
}
dataF = pd.DataFrame(data)
print(dataF)
I need to extract the rows of the DataFrame based on the first element of the first inner list in column B. This value is always 0 or 1.
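A minimal sketch of one way to do this for a single column, assuming pandas' `.str` accessor (which also indexes into list elements of an object column). Chaining `.str[0]` twice picks the first inner list and then its first element. The name `first_flag` is only an illustrative variable.

```python
import pandas as pd

# Same sample data as above: column B holds a list of two-element lists
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [[[1, 2], [3, 4]], [[0, 2], [5, 6]], [[1, 3], [7, 8]], [[0, 4], [9, 10]], [[1, 5], [11, 12]]],
}
dataF = pd.DataFrame(data)

# First .str[0] selects the first inner list, the second .str[0] its first element
first_flag = dataF['B'].str[0].str[0]

# Keep only the rows whose flag is 1 (compare with 0 for the other group)
ones = dataF[first_flag == 1]
print(ones)
```

An equivalent, more explicit variant is `dataF['B'].map(lambda x: x[0][0])`, which may be easier to read if the nesting gets deeper.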
Once this is solved, I will have a DataFrame that looks like this:
import pandas as pd
data = {
'A' : [1, 2, 3, 4, 5],
'B' : [[[1, 2], [3, 4]], [[0, 2], [5, 6]], [[1, 3], [7, 8]], [[0, 4], [9, 10]], [[1, 5], [11, 12]]],
'C' : [[[0, 2], [3, 4]], [[1, 2], [5, 6]], [[0, 3], [7, 8]], [[0, 4], [9, 10]], [[1, 5], [11, 12]]]
}
dataF = pd.DataFrame(data)
print(dataF)
From this DataFrame I need to take all rows in which the first element of the first inner list in B or C is 1. That means rows 0, 1, 2 and 4.
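One possible approach, sketched under the assumption that only columns B and C need checking: build one boolean condition per column and combine them with `|` (element-wise OR). Each row is tested exactly once, so the result keeps the original order and contains no duplicates. The helper `first_flag` is hypothetical, introduced only for readability.

```python
import pandas as pd

data = {
    'A': [1, 2, 3, 4, 5],
    'B': [[[1, 2], [3, 4]], [[0, 2], [5, 6]], [[1, 3], [7, 8]], [[0, 4], [9, 10]], [[1, 5], [11, 12]]],
    'C': [[[0, 2], [3, 4]], [[1, 2], [5, 6]], [[0, 3], [7, 8]], [[0, 4], [9, 10]], [[1, 5], [11, 12]]],
}
dataF = pd.DataFrame(data)


def first_flag(col: pd.Series) -> pd.Series:
    """Hypothetical helper: first element of the first inner list in each row."""
    return col.str[0].str[0]


# Element-wise OR of the two per-column conditions
mask = (first_flag(dataF['B']) == 1) | (first_flag(dataF['C']) == 1)
extDF = dataF[mask]
print(extDF.index.tolist())
```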
EDIT based on the answer from WeNYoBen:
To extract all rows from a DataFrame in which the first element of the first inner list in B or C is 1, I am using the code below. However, this approach requires removing duplicate rows from extDF and sorting extDF by the values in one column. I suspect there is a way to do this that does not need these two extra steps.
import pandas as pd
data = {
'A' : [1, 2, 3, 4, 5],
'B' : [[[1, 2], [3, 4]], [[0, 2], [5, 6]], [[1, 3], [7, 8]], [[0, 4], [9, 10]], [[1, 5], [11, 12]]],
'C' : [[[0, 2], [3, 4]], [[1, 2], [5, 6]], [[0, 3], [7, 8]], [[0, 4], [9, 10]], [[1, 5], [11, 12]]]
}
dataF = pd.DataFrame(data)
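# Collect the matching rows for each list column (B is column 1, C is column 2)
parts = []
for i in [1, 2]:
    tempDF = dataF[dataF.iloc[:, i].str[0].str[0].isin([1])].copy()
    parts.append(tempDF)
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
extDF = pd.concat(parts)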
extDF.drop_duplicates(keep='first', inplace=True, subset='A')
extDF.sort_values(by='A', inplace=True)
extDF.reset_index(drop=True, inplace=True)
print(extDF)
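A sketch of a single-pass alternative that avoids both the de-duplication and the re-sorting, assuming the columns to check are listed explicitly in `list_cols` (an illustrative name). Instead of concatenating per-column subsets, it builds a 0/1 flag table with `DataFrame.apply`, then keeps a row if any flag equals 1 via `.any(axis=1)`.

```python
import pandas as pd

data = {
    'A': [1, 2, 3, 4, 5],
    'B': [[[1, 2], [3, 4]], [[0, 2], [5, 6]], [[1, 3], [7, 8]], [[0, 4], [9, 10]], [[1, 5], [11, 12]]],
    'C': [[[0, 2], [3, 4]], [[1, 2], [5, 6]], [[0, 3], [7, 8]], [[0, 4], [9, 10]], [[1, 5], [11, 12]]],
}
dataF = pd.DataFrame(data)

# Columns to check (illustrative); any other list-of-lists column could be added
list_cols = ['B', 'C']

# Per column, take the first element of the first inner list,
# giving a DataFrame of flags with the same index as dataF
flags = dataF[list_cols].apply(lambda col: col.str[0].str[0])

# A row qualifies if any of its flags is 1; each row is tested once,
# so nothing needs de-duplicating or re-sorting afterwards
extDF = dataF[flags.eq(1).any(axis=1)].reset_index(drop=True)
print(extDF)
```

Because the boolean mask is aligned with the original index, the selected rows come out in their original order, which is what the `drop_duplicates` and `sort_values` calls were reconstructing.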