How to Filter a Pandas Dataframe Column of Lists

Question

Goal: To filter rows based on the values of a column of lists.

Given:

index	pos_order
3192304	`['VB', 'DT', 'NN', 'NN', 'NN', 'NN']`
1579035	`['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN']`
763020	`['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN']`
1289986	`['VB', 'NN', 'IN', 'CD', 'CD']`
69194	`['VB', 'DT', 'JJ', 'NN']`
3068116	`['VB', 'JJ', 'IN', 'NN', 'NN']`
1506722	`['VB', 'NN', 'NNS', 'NNP']`
3438101	`['VB', 'VB', 'IN', 'DT', 'NNS', 'NNS', 'CC', 'NN', 'NN']`
1376463	`['VB', 'DT', 'NN', 'NN']`
1903231	`['VB', 'DT', 'PRP', 'VBP', 'JJ', 'IN', 'NNP', 'NNP']`

I'd like to find a way to query this table to fetch rows where a given pattern is present. For example, if the pattern is ['IN', 'NN'], I should get rows 763020 and 3068116, but not row 3438101. So to be clear, the order of the list elements also matters.

I tried going about it this way:

def target_phrase(pattern_tested, pattern_to_match):
    if ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested)):
        print (pattern_tested)
        return True
    else:
        return False

I can run this code using lists outside of pandas, but when I try using something like:

target_phrase(df.loc[5]['pos_order'], ['IN', 'NN'])

the code fails.

Any clue?

What is the actual data type of the data in the column? More precisely, what does type(df.iloc[0]['pos_order']) return? – Serge Ballesta, commented Feb 3, 2021 at 18:31
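
Serge's question matters because cells that print like lists may actually be strings. Here is a minimal sketch of one common way this happens. It assumes the data was saved to and reloaded from CSV, which the question does not actually say. to_csv writes each list as its text representation, and read_csv reads that back as a plain str. The two sample rows come from the table above.

```python
import io

import pandas as pd

# Two rows from the question's sample data, stored as real Python lists.
df = pd.DataFrame(
    {"pos_order": [["VB", "DT", "NN", "NN", "NN", "NN"],
                   ["VB", "VBP", "PRP", "JJ", "IN", "NN"]]},
    index=[3192304, 763020],
)
print(type(df.iloc[0]["pos_order"]))  # <class 'list'>

# Round-trip through an in-memory CSV file (an assumed storage format).
buf = io.StringIO()
df.to_csv(buf, index_label="index")
buf.seek(0)
reloaded = pd.read_csv(buf, index_col="index")

# Each cell now holds the string "['VB', 'DT', ...]", not a list.
print(type(reloaded.iloc[0]["pos_order"]))  # <class 'str'>
```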

Kate Melnykova · Accepted Answer · 2021-02-03 18:36:00Z

2

First, let me provide a simplified view of target_phrase:

def target_phrase(pattern_tested, pattern_to_match):
    return ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested))
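
Joining the tags into one string has a subtle flaw: it loses the tag boundaries. ['IN', 'NN'] becomes 'INNN', and that substring also appears inside 'IN' + 'NNS' (row 1579035) and 'IN' + 'NNP' (row 1903231), even though neither row contains the pattern. A minimal sketch of a boundary-safe alternative compares list slices directly. The name contains_sequence is a hypothetical helper, not part of pandas.

```python
def contains_sequence(tags, pattern):
    """Return True if `pattern` occurs in `tags` as a contiguous, in-order run."""
    n = len(pattern)
    if n == 0:
        return True
    # Slide a window of len(pattern) over tags and compare element by element.
    return any(list(tags[i:i + n]) == list(pattern)
               for i in range(len(tags) - n + 1))


def target_phrase(pattern_tested, pattern_to_match):
    # The string-join check from the answer, kept here for comparison.
    return ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested))


row_1579035 = ['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN']
row_763020 = ['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN']

print(target_phrase(row_1579035, ['IN', 'NN']))      # True (false positive via 'INNNS')
print(contains_sequence(row_1579035, ['IN', 'NN']))  # False
print(contains_sequence(row_763020, ['IN', 'NN']))   # True
```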

Why doesn't the code work? Because target_phrase expects its first argument to be a list, not a pandas DataFrame. The correct syntax is:

df['pattern_matched'] = df.apply(lambda x: target_phrase(x['pos_order'], 
                                                         ['IN', 'NN']), axis=1)

This applies target_phrase row-wise (axis=1) and stores the boolean result in a new pattern_matched column.
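
Only the pos_order column is needed, so another option is to call apply on that single column (a Series) and use the resulting booleans as a mask that selects the matching rows directly. Here is a minimal sketch, assuming pos_order holds real lists. The helper matches is hypothetical. It wraps every tag in a '|' separator, assumed not to occur inside any tag, so that 'IN', 'NN' cannot match across 'IN', 'NNS'.

```python
import pandas as pd

df = pd.DataFrame(
    {"pos_order": [
        ['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN'],
        ['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN'],
        ['VB', 'JJ', 'IN', 'NN', 'NN'],
        ['VB', 'VB', 'IN', 'DT', 'NNS', 'NNS', 'CC', 'NN', 'NN'],
    ]},
    index=[763020, 1579035, 3068116, 3438101],
)


def matches(tags, pattern, sep="|"):
    # Surround every tag with the separator so a match must respect tag boundaries.
    return (sep + sep.join(pattern) + sep) in (sep + sep.join(tags) + sep)


pattern = ['IN', 'NN']
mask = df['pos_order'].apply(lambda tags: matches(tags, pattern))
result = df[mask]
print(result.index.tolist())  # [763020, 3068116]
```

Because the mask is built from one column, this avoids constructing a row Series for every row, which DataFrame.apply with axis=1 does.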


drive.google.com/file/d/1peWGZl01SAdAKHWaO9zV9F3EQWtZiwlQ/… Here is a sample of the document. When I run the document through the code you suggest, I get "False" for all rows. So it's not working yet, but I do think you're right that the issue is how the information is passed through. Thanks for trying. – Xavi Pi

Xavi Pi · Accepted Answer · 2021-02-03 22:11:51Z

0

As it turned out, it was a combination of things that Kate and Serge together led me to figure out.

As I had set things up, the data types being compared did not match: I was comparing a string to a list. I had to add code to convert the string that looked like a list into an actual list (Serge's contribution). Once that was done, I was able to run the lambda successfully, thanks to Kate.
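
Here is a minimal sketch of that conversion step. It assumes the column holds strings such as "['VB', 'DT', 'NN']". The code actually used in this answer was not posted. ast.literal_eval from the standard library evaluates only Python literals, so it is a safer choice than eval here. The filter then follows Kate's row-wise apply, with a hypothetical n-gram helper, has_run, in place of the string join.

```python
import ast

import pandas as pd

# Values as they look after being read back from a text file: strings, not lists.
df = pd.DataFrame(
    {"pos_order": ["['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN']",
                   "['VB', 'NN', 'IN', 'CD', 'CD']",
                   "['VB', 'JJ', 'IN', 'NN', 'NN']"]},
    index=[763020, 1289986, 3068116],
)

# Parse each string into a real list; values that are already lists are left alone.
df['pos_order'] = df['pos_order'].apply(
    lambda v: ast.literal_eval(v) if isinstance(v, str) else v
)


def has_run(tags, pattern):
    # Build the consecutive n-grams of tags as tuples and test membership.
    ngrams = zip(*(tags[i:] for i in range(len(pattern))))
    return tuple(pattern) in ngrams


df['pattern_matched'] = df.apply(lambda x: has_run(x['pos_order'], ['IN', 'NN']), axis=1)
print(df[df['pattern_matched']].index.tolist())  # [763020, 3068116]
```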

