Query or filter pandas dataframe on multiple columns and cell values

Question

I've tried to figure this one out, but all methods end up with errors.

Quick background info: I have a dataframe with US states and info when a state has adopted a new law. In addition, the dataframe tells which are the neighbouring states. Here's a quick mockup:

(df: treated_states)

| year     | state |BC_law |n_ipo |Neighbor1 |Neighbor2 |Neighbor3 |Treated|
| -------- | ----- |-------|------|----------|----------|----------|-------|
| 1980     | AZ    |1999   |100   |CA        |AK        |WV        |1      |
| 1999     | AZ    |1999   |50    |CA        |AK        |WV        |1      |

(df: not_treated)

| year     | state |BC_law |n_ipo |Neighbor1 |Neighbor2 |Neighbor3 |Treated|
| -------- | ----- |-------|------|----------|----------|----------|-------|
| 1980     | WV    |0      |100   |CA        |AK        |WV        |0      |
| 1999     | AK    |0      |80    |CA        |AK        |WV        |0      |

In reality the dataframe includes all US states and all their neighbouring states in separate columns. There are data points before and and after the change in legislation.

I have filtered states that have adopted the new BC_law (df: treated_states) and those that haven't (df: not_treated). Now I should find out if the adoption of the new law affects the neighbouring states.

My problem: I should filter and store into a new dataframe the treated_states dataframes' Neighbor1, Neighbor2, Neighbor3 that exist in the dataframe not_treated

In case you wonder why I need to filter those: I'll do a difference-in-difference regression to determine if adopting the new law affects the neighboring states that don't adopt the BC_law.

Shubham Sharma · Accepted Answer · 2022-02-06 13:35:19Z

1

Here is a more concise approach:

Filter the Neighbour like columns
Create boolean mask with DataFrame.isin to check whether each element in dataframe is contained in state column of non_treated
Reduce the boolean mask along the columns axis with any

cols = treated_states.filter(like='Neigh')
mask = cols.isin(not_treated.state).any(axis=1)

>>> treated_states[mask]

   year state  BC_law  n_ipo Neighbor1 Neighbor2 Neighbor3  Treated
0  1980    AZ    1999    100        CA        AK        WV        1
1  1999    AZ    1999     50        CA        AK        WV        1

answered Feb 6, 2022 at 13:35

Shubham Sharma

71.8k6 gold badges26 silver badges58 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Alex Günsberg · Accepted Answer · 2022-02-06 13:19:55Z

I managed to write this and it seems to work... Maybe it helps someone and a better coder can suggest a more efficient and elegant way to solve this

#creating a list of states that have not adopted the new legislation
not_treated_list = not_treated.STATE

#df with states that have not adopted the new legislation, that are also neighbouring states that have adopted the new legislation 
neighbour = treated_states[(treated_states.Neigh1.isin(not_treated_list)) | (treated_states.Neigh2.isin(not_treated_list)) |(treated_states.Neigh3.isin(not_treated_list)) | (treated_states.Neigh4.isin(not_treated_list)) | (treated_states.Neigh5.isin(not_treated_list)) | (treated_states.Neigh6.isin(not_treated_list)) | (treated_states.Neigh7.isin(not_treated_list)) | (treated_states.Neigh8.isin(not_treated_list)) ]

neighbour

Collectives™ on Stack Overflow

Query or filter pandas dataframe on multiple columns and cell values

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related