Hello

I need to write a query, using pandas, that finds the counties that belong to regions 1 or 2, whose name starts with 'Washington', and whose POPESTIMATE2015 was greater than their POPESTIMATE2014. The function should return a 5x2 DataFrame with columns = ['STNAME', 'CTYNAME'] and the same index ID as census_df (sorted ascending by index).

A description of my data is in the attached picture.

2 Answers

Consider the following demo:

In [19]: df
Out[19]:
   REGION      STNAME            CTYNAME  POPESTIMATE2014  POPESTIMATE2015
0       0  Washington         Washington               10               12
1       1  Washington  Washington County               11               13
2       2     Alabama     Alabama County               13               15
3       4      Alaska             Alaska               14               12
4       3     Montana            Montana               10               11
5       2  Washington         Washington               15               19

In [20]: qry = "REGION in [1,2] and POPESTIMATE2015 > POPESTIMATE2014 and CTYNAME.str.contains('^Washington')"

In [21]: df.query(qry, engine='python')[['STNAME', 'CTYNAME']]
Out[21]:
       STNAME            CTYNAME
1  Washington  Washington County
5  Washington         Washington
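
To reproduce the demo above from scratch, here is a minimal sketch that builds the same toy frame by hand and runs the same query string. The values are copied from the demo output, not from the real census file, and the variable name `result` is just for illustration.

```python
import pandas as pd

# Toy data copied from the demo above (an assumption; not the real census file).
df = pd.DataFrame({
    "REGION": [0, 1, 2, 4, 3, 2],
    "STNAME": ["Washington", "Washington", "Alabama",
               "Alaska", "Montana", "Washington"],
    "CTYNAME": ["Washington", "Washington County", "Alabama County",
                "Alaska", "Montana", "Washington"],
    "POPESTIMATE2014": [10, 11, 13, 14, 10, 15],
    "POPESTIMATE2015": [12, 13, 15, 12, 11, 19],
})

# Same query string as in the demo; engine='python' is needed so that
# the .str accessor can be used inside the expression.
qry = ("REGION in [1,2] and POPESTIMATE2015 > POPESTIMATE2014 "
       "and CTYNAME.str.contains('^Washington')")

result = df.query(qry, engine="python")[["STNAME", "CTYNAME"]]
print(result)
```

The original index labels (1 and 5 here) are preserved by `query`, which is what the question asks for.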

Use boolean indexing with a mask created by isin and startswith:

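# The outer parentheses let the expression span several lines;
# the county name is in CTYNAME, as in the question.
mask = (df['REGION'].isin([1, 2]) &
        df['CTYNAME'].str.startswith('Washington') &
        (df['POPESTIMATE2015'] > df['POPESTIMATE2014']))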

df = df.loc[mask, ['STNAME', 'CTYNAME']]
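
Since the question asks for a function, the boolean-indexing approach could be wrapped up roughly as follows. This is only a sketch: the function name `growing_washington_counties` is hypothetical, the county name is assumed to live in `CTYNAME` as in the question, and the toy frame at the bottom stands in for the real `census_df`.

```python
import pandas as pd

def growing_washington_counties(census_df):
    """Hypothetical helper: rows in regions 1 or 2 whose county name
    starts with 'Washington' and whose 2015 estimate exceeds 2014."""
    mask = (
        census_df["REGION"].isin([1, 2])
        & census_df["CTYNAME"].str.startswith("Washington")
        & (census_df["POPESTIMATE2015"] > census_df["POPESTIMATE2014"])
    )
    # .loc keeps the original index labels; sort_index() makes the
    # ascending order explicit, as the question requires.
    return census_df.loc[mask, ["STNAME", "CTYNAME"]].sort_index()

# Toy stand-in for census_df (values are made up for illustration).
demo = pd.DataFrame({
    "REGION": [0, 1, 2, 4, 3, 2],
    "STNAME": ["Washington", "Washington", "Alabama",
               "Alaska", "Montana", "Washington"],
    "CTYNAME": ["Washington", "Washington County", "Alabama County",
                "Alaska", "Montana", "Washington"],
    "POPESTIMATE2014": [10, 11, 13, 14, 10, 15],
    "POPESTIMATE2015": [12, 13, 15, 12, 11, 19],
})

out = growing_washington_counties(demo)
print(out)
```

On the real data, the result should be checked against the 5x2 shape stated in the question.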
