Goal: filter rows based on the values in a column of lists.
Given:
| index | pos_order |
|---|---|
| 3192304 | ['VB', 'DT', 'NN', 'NN', 'NN', 'NN'] |
| 1579035 | ['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN'] |
| 763020 | ['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN'] |
| 1289986 | ['VB', 'NN', 'IN', 'CD', 'CD'] |
| 69194 | ['VB', 'DT', 'JJ', 'NN'] |
| 3068116 | ['VB', 'JJ', 'IN', 'NN', 'NN'] |
| 1506722 | ['VB', 'NN', 'NNS', 'NNP'] |
| 3438101 | ['VB', 'VB', 'IN', 'DT', 'NNS', 'NNS', 'CC', 'NN', 'NN'] |
| 1376463 | ['VB', 'DT', 'NN', 'NN'] |
| 1903231 | ['VB', 'DT', 'PRP', 'VBP', 'JJ', 'IN', 'NNP', 'NNP'] |

I'd like to find a way to query this table and fetch the rows where a given pattern is present. For example, if the pattern is ['IN', 'NN'], I should get rows 763020 and 3068116, but not row 3438101. To be clear, the order of the list elements matters: the pattern's elements must appear next to each other and in the same order.
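To make the matching rule concrete, here is a minimal sketch of what I mean by "present", assuming it means the pattern occurs as a contiguous run of elements. The helper name `contains_run` is made up for illustration.

```python
def contains_run(tags, pattern):
    """Return True if `pattern` occurs in `tags` as a contiguous, ordered run."""
    n = len(pattern)
    # Slide a window of length n over tags and compare it to the pattern
    # element by element, so 'NN' never matches the start of 'NNS'.
    return any(
        list(tags[i:i + n]) == list(pattern)
        for i in range(len(tags) - n + 1)
    )


# Row 3068116: 'IN' is directly followed by 'NN'.
print(contains_run(['VB', 'JJ', 'IN', 'NN', 'NN'], ['IN', 'NN']))

# Row 3438101: 'IN' and 'NN' both occur, but not next to each other.
print(contains_run(['VB', 'VB', 'IN', 'DT', 'NNS', 'NNS', 'CC', 'NN', 'NN'], ['IN', 'NN']))
```

Comparing list elements rather than joined strings matters for this data: joining tags with no separator turns 'IN', 'NNS' into 'INNNS', which contains 'INNN', so row 1579035 would be reported as a match for ['IN', 'NN'] even though it is not one.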
I tried going about it this way:

    def target_phrase(pattern_tested, pattern_to_match):
        # Join both lists into plain strings and test for a substring match.
        if ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested)):
            print(pattern_tested)
            return True
        else:
            return False

I can run this code using lists outside of pandas, but when I try using something like:

    target_phrase(df.loc[5]['pos_order'], ['IN', 'NN'])

the code fails.
Any clue?

    type(df.iloc[0]['pos_order'])
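
Since the error message isn't shown, the following is only a sketch under two guesses about why the call might fail: `.loc` looks rows up by label, and no row here is labelled 5; and if the type check above reports `str`, the lists were stored as text (for example after a CSV round trip) and need parsing first. The helpers `contains_run` and `to_list` are hypothetical names, and the small frame copies four rows from the table above.

```python
import ast

import pandas as pd


def contains_run(tags, pattern):
    """Return True if `pattern` occurs in `tags` as a contiguous, ordered run."""
    n = len(pattern)
    return any(
        list(tags[i:i + n]) == list(pattern)
        for i in range(len(tags) - n + 1)
    )


def to_list(value):
    """Parse a stringified list such as "['VB', 'NN']"; pass real lists through."""
    return ast.literal_eval(value) if isinstance(value, str) else value


df = pd.DataFrame(
    {
        "pos_order": [
            ['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN'],
            ['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN'],
            ['VB', 'JJ', 'IN', 'NN', 'NN'],
            ['VB', 'VB', 'IN', 'DT', 'NNS', 'NNS', 'CC', 'NN', 'NN'],
        ]
    },
    index=[1579035, 763020, 3068116, 3438101],
)

# .loc is label-based, so df.loc[5] raises KeyError on this index.
# Use an existing label with .loc, or a position with .iloc.
by_label = df.loc[763020, "pos_order"]
by_position = df.iloc[1]["pos_order"]

# Build a boolean mask row by row, then use it to filter the frame.
mask = df["pos_order"].apply(lambda tags: contains_run(to_list(tags), ['IN', 'NN']))
matches = df[mask]
print(matches.index.tolist())
```

The mask keeps rows 763020 and 3068116 and drops the other two, which is the result described above.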