1

I have a dataframe that looks like this:

    ID   AgeGroups        PaperIDs
0   1    [3, 3, 10]       [A, B, C]
1   2    [5]              [D]
2   3    [4, 12]          [A, D]
3   4    [2, 6, 13, 12]   [X, Z, T, D]

I would like to extract the rows where the list in the AgeGroups column has at least 2 values less than 7 and at least 1 value greater than 8.

So the result should look like this:

    ID   AgeGroups        PaperIDs
0   1    [3, 3, 10]       [A, B, C]
3   4    [2, 6, 13, 12]   [X, Z, T, D]

I'm not sure how to do it.

2 Answers

3

First build a helper DataFrame from the lists and compare it with DataFrame.lt and DataFrame.gt. Then count the matches per row with sum, test the counts with Series.ge, and combine the two masks with & (bitwise AND):

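import ast
import pandas as pd

# if the column holds strings rather than lists, parse them first:
# df['AgeGroups'] = df['AgeGroups'].apply(ast.literal_eval)

# helper DataFrame: one column per list position, shorter lists padded with NaN
df1 = pd.DataFrame(df['AgeGroups'].tolist(), index=df.index)
# keep rows with at least 2 values < 7 and at least 1 value > 8
df = df[df1.lt(7).sum(axis=1).ge(2) & df1.gt(8).sum(axis=1).ge(1)]
print(df)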
   ID       AgeGroups      PaperIDs
0   1      [3, 3, 10]     [A, B, C]
3   4  [2, 6, 13, 12]  [X, Z, T, D]
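
To see why the helper DataFrame approach works, it can help to look at the intermediate counts. The following is a minimal sketch that rebuilds the question's data; the names helper, n_below_7, n_above_8, mask and result are chosen here for illustration and are not from the answer. Passing index=df.index is an assumption on my part, added so that the mask still aligns if df does not have the default 0..n-1 index.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "AgeGroups": [[3, 3, 10], [5], [4, 12], [2, 6, 13, 12]],
    "PaperIDs": [["A", "B", "C"], ["D"], ["A", "D"], ["X", "Z", "T", "D"]],
})

# One row per original row, one column per list position.
# Lists shorter than the longest one are padded with NaN.
helper = pd.DataFrame(df["AgeGroups"].tolist(), index=df.index)

# Comparisons against NaN evaluate to False, so the padding
# never counts towards either total.
n_below_7 = helper.lt(7).sum(axis=1)
n_above_8 = helper.gt(8).sum(axis=1)

# Both conditions must hold for a row to be kept.
mask = n_below_7.ge(2) & n_above_8.ge(1)
result = df[mask]
print(result)
```

Keeping the two counts in separate variables makes it easy to inspect them or to change the thresholds later.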

Alternatively, use a list comprehension: convert each list to a NumPy array, count the matching values with sum, and combine the two comparisons with and, because each count is a scalar:

import numpy as np

# one boolean per row: at least 2 values < 7 and at least 1 value > 8
m = [(np.array(x) < 7).sum() >= 2 and (np.array(x) > 8).sum() >= 1 for x in df['AgeGroups']]

df = df[m]
print(df)
   ID       AgeGroups      PaperIDs
0   1      [3, 3, 10]     [A, B, C]
3   4  [2, 6, 13, 12]  [X, Z, T, D]

2

I wrote simple if/else logic that is applied to each row with the apply function; a list comprehension over the rows would also work.

import pandas as pd

data = {'ID': ['1', '2', '3', '4'],
        'AgeGroups': [[3, 3, 10], [5], [4, 12], [2, 6, 13, 12]],
        'PaperIDs': [['A', 'B', 'C'], ['D'], ['A', 'D'], ['X', 'Z', 'T', 'D']]}
df = pd.DataFrame(data)

def extract_age(row):
    my_list = row['AgeGroups']
    count1 = 0  # values less than 7
    count2 = 0  # values greater than 8
    # a list needs at least 3 values to satisfy both conditions
    if len(my_list) >= 3:
        for i in my_list:
            if i < 7:
                count1 += 1
            elif i > 8:
                count2 += 1
    if count1 >= 2 and count2 >= 1:
        print(row['AgeGroups'], row['PaperIDs'])


df.apply(extract_age, axis=1)

Output

[3, 3, 10] ['A', 'B', 'C']
[2, 6, 13, 12] ['X', 'Z', 'T', 'D']
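
The function above prints the matching rows but does not return them. If a filtered DataFrame is wanted, as in the question, one possible variation is to return a boolean from the row function and use the result as a mask. This is only a sketch: keep_row, mask and filtered are hypothetical names introduced here, and the data is rebuilt from the question.

```python
import pandas as pd

# Sample data reconstructed from the question (IDs as strings, as in this answer).
df = pd.DataFrame({
    "ID": ["1", "2", "3", "4"],
    "AgeGroups": [[3, 3, 10], [5], [4, 12], [2, 6, 13, 12]],
    "PaperIDs": [["A", "B", "C"], ["D"], ["A", "D"], ["X", "Z", "T", "D"]],
})

def keep_row(row):
    """Hypothetical helper: True if the row meets both conditions."""
    ages = row["AgeGroups"]
    below_7 = sum(1 for age in ages if age < 7)
    above_8 = sum(1 for age in ages if age > 8)
    return below_7 >= 2 and above_8 >= 1

# apply with axis=1 calls keep_row once per row and collects the booleans.
mask = df.apply(keep_row, axis=1)
filtered = df[mask]
print(filtered)
```

Because the mask is an ordinary boolean Series, it can be combined with other conditions using & or | before indexing.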
