
I have a dataframe that contains Physician_Profile_City, Physician_Profile_State and Physician_Profile_Zip_Code. I ultimately want to stratify an analysis by state, but unfortunately not all of the Physician_Profile_State values are filled in. I started looking for a way to fill in the missing states and came across the pyzipcode module, which takes a zip code as input and returns its state, as follows:

In [39]: from pyzipcode import ZipCodeDatabase
zcdb = ZipCodeDatabase()
zipcode = zcdb[54115]
zipcode.state

Out[39]: u'WI'

What I'm struggling with is how I would iterate through the dataframe and add the appropriate "Physician_Profile_State" when that variable is missing. Any suggestions would be most appreciated.
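A minimal sketch of filling in only the missing states without an explicit loop, assuming missing states are stored as NaN/None. Because pyzipcode may not be installed, `StubZipCodeDatabase` below is a hypothetical stand-in that mimics the `zcdb[zip].state` access shown above with two hard-coded entries; in real use you would swap in `ZipCodeDatabase()`.

```python
import pandas as pd
from types import SimpleNamespace


class StubZipCodeDatabase:
    """Hypothetical stand-in for pyzipcode's ZipCodeDatabase (zcdb[zip].state)."""

    _states = {54115: "WI", 10001: "NY"}

    def __getitem__(self, zip_code):
        # Return an object exposing .state, like the IPython session above.
        return SimpleNamespace(state=self._states[int(zip_code)])


zcdb = StubZipCodeDatabase()

df = pd.DataFrame({
    "Physician_Profile_State": [None, "NY", None],
    "Physician_Profile_Zip_Code": [54115, 10001, 54115],
})

# Only rows whose state is missing need a lookup.
missing = df["Physician_Profile_State"].isna()
df.loc[missing, "Physician_Profile_State"] = (
    df.loc[missing, "Physician_Profile_Zip_Code"].apply(lambda z: zcdb[z].state)
)
print(df["Physician_Profile_State"].tolist())  # ['WI', 'NY', 'WI']
```

Restricting the lookup to the `missing` rows leaves any state that was already filled in untouched.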


Answer


No need to iterate. If the lookup data is in the form of a dict, you should be able to do the following:

df['Physician_Profile_State'] = df['Physician_Profile_Zip_Code'].map(zcdb)
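`Series.map` accepts a dict (or a function), so with a plain dict mapping zip codes to state abbreviations a sketch could look like this. `zip_to_state` is a hypothetical hand-built dict; zip codes that are not in it come back as NaN, which is why `fillna` is used so that states already present are kept.

```python
import pandas as pd

# Hypothetical lookup table: zip code -> state abbreviation.
zip_to_state = {54115: "WI", 10001: "NY"}

df = pd.DataFrame({
    "Physician_Profile_State": [None, "NY", None],
    "Physician_Profile_Zip_Code": [54115, 10001, 99999],
})

# map() looks each zip up in the dict; unknown zips (99999 here) become NaN.
looked_up = df["Physician_Profile_Zip_Code"].map(zip_to_state)

# Keep existing states and fill only the gaps from the lookup.
df["Physician_Profile_State"] = df["Physician_Profile_State"].fillna(looked_up)
print(df["Physician_Profile_State"].tolist())
```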

Otherwise you can call apply like so:

df['Physician_Profile_State'] = df['Physician_Profile_Zip_Code'].apply(lambda x: zcdb[x].state)

If the above doesn't work because it can't produce a Series that aligns with your df, you can apply row-wise by passing axis=1 to the df:

df['Physician_Profile_State'] = df[['Physician_Profile_Zip_Code']].apply(lambda x: zcdb[x['Physician_Profile_Zip_Code']].state, axis=1)

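Using double square brackets returns a DataFrame rather than a Series, which is what lets you pass the axis param. With axis=1, each x passed to the lambda is a row, so the zip code has to be selected by column name before the lookup.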
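A minimal sketch of the single- vs double-bracket difference and of a row-wise `apply` call. `lookup_state` is a hypothetical stand-in for `zcdb[zip].state` so the example runs without pyzipcode.

```python
import pandas as pd

df = pd.DataFrame({"Physician_Profile_Zip_Code": [54115, 10001]})

# Single brackets give a Series; double brackets give a one-column DataFrame.
print(type(df["Physician_Profile_Zip_Code"]).__name__)    # Series
print(type(df[["Physician_Profile_Zip_Code"]]).__name__)  # DataFrame


def lookup_state(zip_code):
    """Hypothetical stand-in for zcdb[zip_code].state."""
    return {54115: "WI", 10001: "NY"}[zip_code]


# With axis=1 each `row` is a Series indexed by column name.
states = df[["Physician_Profile_Zip_Code"]].apply(
    lambda row: lookup_state(row["Physician_Profile_Zip_Code"]), axis=1
)
print(states.tolist())  # ['WI', 'NY']
```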


Comments

Thanks for the response. Unfortunately, none of the options you provided worked. I think the reason is that the only way to convert a zip code into a state is with zipcode = zcdb[54115] followed by zipcode.state, but the answers you provided never access the ZipCodeDatabase. Do you know how that might be accomplished?
OK, I see my error. Could you try example 2 or 3? I can't install pyzipcode as I'm using Python 3. One approach would be to generate a dict that maps each zip code to its state. You can obtain a list of the unique zip codes by calling list(df['Physician_Profile_Zip_Code'].unique())
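A sketch of the dict-based approach from the comment above: look each unique zip code up once, build a zip-to-state dict, then `map` it onto the column and fill only the gaps. `lookup_state` is a hypothetical stand-in for `zcdb[zip].state`; returning `None` for unknown zip codes is an assumption made for this sketch, not pyzipcode's documented behavior.

```python
import pandas as pd

df = pd.DataFrame({
    "Physician_Profile_State": [None, None, "NY", None],
    "Physician_Profile_Zip_Code": [54115, 54115, 10001, 10001],
})

_KNOWN = {54115: "WI", 10001: "NY"}


def lookup_state(zip_code):
    """Hypothetical stand-in for zcdb[zip_code].state; None if unknown (assumption)."""
    return _KNOWN.get(int(zip_code))


# One lookup per distinct zip code instead of one per row.
unique_zips = list(df["Physician_Profile_Zip_Code"].unique())
zip_to_state = {z: lookup_state(z) for z in unique_zips}

# Fill only the missing states from the precomputed dict.
df["Physician_Profile_State"] = df["Physician_Profile_State"].fillna(
    df["Physician_Profile_Zip_Code"].map(zip_to_state)
)
print(df["Physician_Profile_State"].tolist())  # ['WI', 'WI', 'NY', 'NY']
```

Precomputing the dict pays off when many rows share the same zip code, since the (possibly slow) database lookup runs once per distinct value.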
