Thank you for your help. I am still relatively new to pandas and could not find this specific kind of query in search results.
I have a pandas dataframe:
+-----+---------+----------+
| id  | value   | match_id |
+-----+---------+----------+
| A10 | grass   | 1        |
| B45 | cow     | 3        |
| B98 | bird    | 6        |
| B17 | grass   | 1        |
| A20 | tree    | 2        |
| A87 | farmer  | 5        |
| B11 | grass   | 1        |
| A33 | chicken | 4        |
| B56 | tree    | 2        |
| A23 | farmer  | 5        |
| B65 | cow     | 3        |
+-----+---------+----------+
I need to filter this dataframe down to the rows that share a match_id value with other rows, with the additional condition that the id values within each match_id group must include both an id containing the string A and an id containing the string B.
This is the expected output:
+-----+-------+----------+
| id  | value | match_id |
+-----+-------+----------+
| A10 | grass | 1        |
| B17 | grass | 1        |
| A20 | tree  | 2        |
| B11 | grass | 1        |
| B56 | tree  | 2        |
+-----+-------+----------+
How can I do this in, say, a single line of pandas code? Reproducible program below:
import pandas as pd

# Input data
data_example = {
    'id': ['A10', 'B45', 'B98', 'B17', 'A20', 'A87', 'B11', 'A33', 'B56', 'A23', 'B65'],
    'value': ['grass', 'cow', 'bird', 'grass', 'tree', 'farmer', 'grass', 'chicken', 'tree', 'farmer', 'cow'],
    'match_id': [1, 3, 6, 1, 2, 5, 1, 4, 2, 5, 3],
}
df_example = pd.DataFrame(data=data_example)

# Expected result after filtering
data_expected = {
    'id': ['A10', 'B17', 'A20', 'B11', 'B56'],
    'value': ['grass', 'grass', 'tree', 'grass', 'tree'],
    'match_id': [1, 1, 2, 1, 2],
}
df_expected = pd.DataFrame(data=data_expected)
Thank you!
[...] B56,tree,2 get included in the final output? While the ID contains B, it doesn't also contain [...]
[...] match_id column that have matching integers, and 2.) by rows in id column that contain string values both A and B per rows of matching match_id rows. Is this helpful?
[...] match_id, at least 1 id that starts with "A" and at least one other id that starts with "B" needs to be present?
[...] match_id == 3 only has values in the id column that start with "B", so that group is excluded?
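Going by the reading suggested in the comments (a match_id group is kept only if it has at least one id starting with "A" and at least one id starting with "B"), here is a minimal sketch of one possible approach. It assumes the letter is always the first character of id, and it uses groupby with filter, which is only one of several ways to express this. The name result is just an illustrative variable.

```python
import pandas as pd

df_example = pd.DataFrame({
    'id': ['A10', 'B45', 'B98', 'B17', 'A20', 'A87', 'B11', 'A33', 'B56', 'A23', 'B65'],
    'value': ['grass', 'cow', 'bird', 'grass', 'tree', 'farmer', 'grass', 'chicken', 'tree', 'farmer', 'cow'],
    'match_id': [1, 3, 6, 1, 2, 5, 1, 4, 2, 5, 3],
})

# Keep a match_id group only if the first letters of its ids include
# both "A" and "B" (assumption: the prefix is the first character of id).
result = (
    df_example
    .groupby('match_id')
    .filter(lambda g: {'A', 'B'} <= set(g['id'].str[0]))
    .reset_index(drop=True)  # renumber rows so the index runs 0..n-1
)

print(result)
# ids kept, in original row order: A10, B17, A20, B11, B56
```

Under this reading, the groups with match_id 3 and 6 are dropped because they only have "B" ids, and the groups with match_id 4 and 5 are dropped because they only have "A" ids. If the letters can appear anywhere in id rather than at the start, the set check would need to change (for example to a str.contains test per letter).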