
Goal: To filter rows based on the values in a column of lists.

Given:

index pos_order
3192304 ['VB', 'DT', 'NN', 'NN', 'NN', 'NN']
1579035 ['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN']
763020 ['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN']
1289986 ['VB', 'NN', 'IN', 'CD', 'CD']
69194 ['VB', 'DT', 'JJ', 'NN']
3068116 ['VB', 'JJ', 'IN', 'NN', 'NN']
1506722 ['VB', 'NN', 'NNS', 'NNP']
3438101 ['VB', 'VB', 'IN', 'DT', 'NNS', 'NNS', 'CC', 'NN', 'NN']
1376463 ['VB', 'DT', 'NN', 'NN']
1903231 ['VB', 'DT', 'PRP', 'VBP', 'JJ', 'IN', 'NNP', 'NNP']

I'd like to find a way to query this table to fetch rows where a given pattern is present. For example, if the pattern is ['IN', 'NN'], I should get rows 763020 and 3068116, but not row 3438101. So to be clear, the order of the list elements also matters.

I tried going about it this way:

def target_phrase(pattern_tested, pattern_to_match):
    if ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested)):
        print (pattern_tested)
        return True
    else:
        return False

I can run this code using lists outside of pandas, but when I try using something like:

target_phrase(df.loc[5]['pos_order'], ['IN', 'NN'])

the code fails.

Any clue?

  • What is the actual data type of the data in the column? More precisely, what does type(df.iloc[0]['pos_order']) return? Commented Feb 3, 2021 at 18:31
  • in: type(df.iloc[0]['pos_order']) out: str Commented Feb 3, 2021 at 21:17

2 Answers


First, let me provide a simplified view of target_phrase:

def target_phrase(pattern_tested, pattern_to_match):
    return ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested))

Why does the code not work? Because target_phrase expects its first argument to be a list, not a pandas DataFrame. The correct syntax is as follows:

df['pattern_matched'] = df.apply(lambda x: target_phrase(x['pos_order'], 
                                                         ['IN', 'NN']), axis=1)

With axis=1, apply calls the lambda once per row, so target_phrase is evaluated row-wise.
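
One caveat with joining the tags into a single string: the boundaries between tags are lost, so ['IN', 'NNS'] becomes 'INNNS', which contains 'INNN' and would therefore match the pattern ['IN', 'NN'] (this affects row 1579035 in the sample). Below is a minimal sketch of an element-wise comparison instead, assuming the cell already holds a real Python list. The helper name contains_sequence is hypothetical and not part of the original answer.

```python
def contains_sequence(tags, pattern):
    """Return True if pattern occurs in tags as a contiguous run, in order."""
    n = len(pattern)
    if n == 0:
        return True
    # Slide a window of len(pattern) over tags and compare whole elements.
    return any(tags[i:i + n] == pattern for i in range(len(tags) - n + 1))

# Two rows from the question, written as real lists (an assumption here).
row_1579035 = ['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN']
row_763020 = ['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN']

# The joined-string test reports a match because 'IN' + 'NNS' contains 'INNN'.
joined_hit = ''.join(['IN', 'NN']) in ''.join(row_1579035)

# The element-wise test only matches when whole tags line up.
window_hit_false_row = contains_sequence(row_1579035, ['IN', 'NN'])
window_hit_true_row = contains_sequence(row_763020, ['IN', 'NN'])
```

Comparing list slices keeps each tag intact, at the cost of a slightly longer function. It could be passed to apply in place of target_phrase with the same two arguments.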


1 Comment

drive.google.com/file/d/1peWGZl01SAdAKHWaO9zV9F3EQWtZiwlQ/… Here is a sample of the document. When I run this document through the code you suggest, I get "False" for all rows. So it's not working yet, but I do think you're right that the issue is how the information is passed through. Thanks for trying.

As it turned out, it was a combination of things that Kate and Serge together led me to figure out.

As I had things set up, the data types being compared did not match: I was comparing a string to a list. I had to add code to convert the string that looked like a list into an actual list (Serge's contribution). Once that was done, I was able to run the lambda successfully, thanks to Kate.
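
The conversion step described above can be sketched with the standard library's ast.literal_eval, which parses the string form of a list back into a list. This is only an assumption about how the conversion might be done, since the post does not show the actual code. The names parse_tags and matches are hypothetical.

```python
import ast

def parse_tags(cell):
    """Turn a cell such as "['VB', 'NN']" into the list ['VB', 'NN']."""
    # Cells that are already lists are returned unchanged.
    return ast.literal_eval(cell) if isinstance(cell, str) else cell

def matches(cell, pattern):
    """True if pattern appears in the parsed cell as a contiguous, ordered run."""
    tags = parse_tags(cell)
    n = len(pattern)
    return any(tags[i:i + n] == pattern for i in range(len(tags) - n + 1))

# Three rows from the question, stored as strings (type(...) reported str).
rows = {
    763020: "['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN']",
    3068116: "['VB', 'JJ', 'IN', 'NN', 'NN']",
    3438101: "['VB', 'VB', 'IN', 'DT', 'NNS', 'NNS', 'CC', 'NN', 'NN']",
}
matching_ids = [idx for idx, cell in rows.items() if matches(cell, ['IN', 'NN'])]
```

On a DataFrame, the same idea would probably look like df['pos_order'] = df['pos_order'].apply(ast.literal_eval), followed by df[df['pos_order'].apply(lambda cell: matches(cell, ['IN', 'NN']))] to keep only the matching rows.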
