I am new to Python (used to coding with its cousin R) and am still getting the hang of pandas. There is an incredibly helpful, related post, but instead of filter()ing by a set number, I was hoping to do so by criteria defined in a second data set.
Let's make some toy data:
import pandas as pd
pets = [['foxhound', 'dog', 20], ['husky', 'dog', 25], ['GSD', 'dog', 24], ['Labrador', 'dog', 23], ['Persian', 'cat', 7], ['Siamese', 'cat', 6], ['Tabby', 'cat', 5]]
df = pd.DataFrame(pets, columns=['breed', 'species', 'height']).set_index('breed')
TooBigForManhattan = [['dog', 22], ['cat', 6]]
TooBig = pd.DataFrame(TooBigForManhattan, columns=['species', 'height']).set_index('species')
I am trying to subset df by keeping the breeds whose height is less than or equal to the TooBig cutoff for their species. My pseudo-code looks like:
df.groupby(['breed','species']).filter(lambda x : (x['height']<='TooBig Cutoff by Species').any())
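To make the goal concrete, here is a rough sketch of the result I am after on the toy data. It assumes each row's cutoff can be looked up from its species with Series.map against TooBig['height']; the names cutoff and small_enough are placeholders of my own, and I have not checked whether this is the best approach for the real data.

```python
import pandas as pd

pets = [['foxhound', 'dog', 20], ['husky', 'dog', 25], ['GSD', 'dog', 24],
        ['Labrador', 'dog', 23], ['Persian', 'cat', 7], ['Siamese', 'cat', 6],
        ['Tabby', 'cat', 5]]
df = pd.DataFrame(pets, columns=['breed', 'species', 'height']).set_index('breed')

TooBig = pd.DataFrame([['dog', 22], ['cat', 6]],
                      columns=['species', 'height']).set_index('species')

# Look up the cutoff for each row's species; the result shares df's index.
# (cutoff is a placeholder name, not part of the original data.)
cutoff = df['species'].map(TooBig['height'])

# Keep only the breeds at or below the cutoff for their species.
small_enough = df[df['height'] <= cutoff]
print(small_enough)
```

On the toy data this should keep foxhound, Siamese and Tabby. A species with no entry in TooBig would get a NaN cutoff, so its rows would fail the comparison and be dropped.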
The data I am working with are thousands of entries with about a hundred criteria, so any help in defining a solution that could work at that scale would be very helpful.
Thanks in advance!