I've tried to figure this one out, but every approach I've tried ends in an error.
Quick background info: I have a dataframe of US states with information on when each state adopted a new law. The dataframe also lists each state's neighbouring states. Here's a quick mockup:
(df: treated_states)

| year | state | BC_law | n_ipo | Neighbor1 | Neighbor2 | Neighbor3 | Treated |
| ---- | ----- | ------ | ----- | --------- | --------- | --------- | ------- |
| 1980 | AZ    | 1999   | 100   | CA        | AK        | WV        | 1       |
| 1999 | AZ    | 1999   | 50    | CA        | AK        | WV        | 1       |

(df: not_treated)

| year | state | BC_law | n_ipo | Neighbor1 | Neighbor2 | Neighbor3 | Treated |
| ---- | ----- | ------ | ----- | --------- | --------- | --------- | ------- |
| 1980 | WV    | 0      | 100   | CA        | AK        | WV        | 0       |
| 1999 | AK    | 0      | 80    | CA        | AK        | WV        | 0       |

In reality the dataframe includes all US states, with each state's neighbouring states in separate columns. There are data points before and after the change in legislation.
I have split the data into states that have adopted the new BC_law (df: treated_states) and those that haven't (df: not_treated). I now need to find out whether the adoption of the new law affects the neighbouring states.
My problem: I need to take the states listed in the Neighbor1, Neighbor2 and Neighbor3 columns of treated_states, keep only those that also exist in not_treated, and store them in a new dataframe.
In case you wonder why I need to filter those: I'll run a difference-in-differences regression to determine whether adopting the new law affects the neighbouring states that don't adopt the BC_law.
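
To make the goal concrete, here is a minimal sketch of the filtering I'm after. It assumes Python with pandas (swap in your own tooling if you use something else), and it assumes that "exist in not_treated" means matching against the `state` column of `not_treated`. The two frames are rebuilt from the mockup above; the names `neighbor_cols`, `neighbors_of_treated` and `untreated_neighbors` are only illustrative.

```python
import pandas as pd

# Mockup data copied from the tables in the question.
treated_states = pd.DataFrame({
    "year": [1980, 1999],
    "state": ["AZ", "AZ"],
    "BC_law": [1999, 1999],
    "n_ipo": [100, 50],
    "Neighbor1": ["CA", "CA"],
    "Neighbor2": ["AK", "AK"],
    "Neighbor3": ["WV", "WV"],
    "Treated": [1, 1],
})
not_treated = pd.DataFrame({
    "year": [1980, 1999],
    "state": ["WV", "AK"],
    "BC_law": [0, 0],
    "n_ipo": [100, 80],
    "Neighbor1": ["CA", "CA"],
    "Neighbor2": ["AK", "AK"],
    "Neighbor3": ["WV", "WV"],
    "Treated": [0, 0],
})

# Hypothetical helper name: the columns that hold neighbouring states.
neighbor_cols = ["Neighbor1", "Neighbor2", "Neighbor3"]

# Put all neighbour columns into one long column, drop missing values,
# and keep each neighbouring state once.
neighbors_of_treated = (
    treated_states.melt(value_vars=neighbor_cols)["value"].dropna().unique()
)

# Assumption: a neighbour "exists in not_treated" when it appears in the
# `state` column there. Keep those rows in a new dataframe.
untreated_neighbors = not_treated[
    not_treated["state"].isin(neighbors_of_treated)
].copy()

print(untreated_neighbors)
```

With the mockup, both rows of `not_treated` (WV and AK) are kept, while CA is dropped because it never appears as a `state` in `not_treated`.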