Key errors for object after converting it to a string in pandas?

Question

I have three CSV files, which we can call a, b, and c. File a has geographic information including zip codes. File b has statistical data. File c has only zip codes.

I used pandas to read a and b into dataframes (a_df and b_df) and joined them on the columns they share (intermediate_df). File c was read into a dataframe that had the zipcode as an integer type, so I had to convert that column to string so that zip codes are not treated as integers. However, c_df reports that column as object after I convert it to string, and then I cannot join c_df with intermediate_df to make final_df.

To illustrate what I mean:

import pandas as pd

a_data = pd.read_csv("a.csv")
b_data = pd.read_csv("b.csv", dtype={'zipcode': 'str'})
a_df = pd.DataFrame(a_data)
b_df = pd.DataFrame(b_data)

# file c conversion
c_data = pd.read_csv("slcsp.csv", dtype={'zipcode': 'str'})
print ("This is c data types: ", c_data.dtypes)
c_conversion = c_data['zipcode'].apply(str)
print ("This is c_conversion data types: ", c_conversion.dtypes)
c_df = pd.DataFrame(c_conversion)
print ("This is c_df data types: ", c_df.dtypes)

# Joining on the two common columns to avoid duplicates
joined_ab_df = pd.merge(a_df, b_df, on=['state', 'area'])

# Dropping columns that are not needed anymore
ab_for_analysis_df = joined_ab_df.drop(['county_code', 'name', 'area'], axis=1)

# Time to analyze this dataframe. Let's pick out only the silver values for
# a specific attribute
silver_only_df = ab_for_analysis_df[ab_for_analysis_df.metal_name == 'Silver']

# Getting second lowest value of silver only
sorted_silver = silver_only_df.groupby('zipcode')['rate'].nsmallest(2)
sorted_silver_df = sorted_silver.to_frame()

print ("We cleaned up our data. Let's join the dataframes.")
print ("Final result...")
print (c_df.dtypes)
final_df = pd.merge(sorted_silver_df,c_df, on ='zipcode')

This is what we get after running it:

This is c_data types:  zipcode     object
rate       float64
dtype: object
This is c_conversion_data types:  object
This is c_df data types:  zipcode    object
dtype: object
zipcode    object
dtype: object

We cleaned up our data. Let's join the dataframes.
This is the final result...
KeyError: 'zipcode'

Any idea why it changed data types, and how do I fix it so that it all joins in the end?

Can you add print(c_df.columns) and print(sorted_silver_df.columns)?
– Bharath M Shetty, Commented Oct 20, 2017 at 6:14
So the second to last line, print (c_df.dtypes), doesn't print either? That's bizarre. I recommend using ipython/jupyter and the %debug magic function; that way you can step through these kinds of errors.
– Andy Hayden, Commented Oct 20, 2017 at 6:19
It's a weird problem, @AndyHayden. The print c_df.dtypes works, though it gives weird results.
– Christina Smithers, Commented Oct 20, 2017 at 6:30

jezrael · Accepted Answer · 2017-10-20 06:24:48Z

2

If you convert a column to str, the resulting dtype is always object.

To check that the values really are strings, check the type of each value:

print (c_data['zipcode'].apply(type))
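
A minimal sketch of that check, assuming a small in-memory CSV in place of the real file c (the sample rows are made up for illustration). The point is that the dtype label of the column says little about the values, while the per-value types do:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real file c.
csv_text = "zipcode,rate\n01234,1.5\n90210,2.0\n"

# Reading the column as str keeps leading zeros intact.
c_data = pd.read_csv(io.StringIO(csv_text), dtype={"zipcode": str})

# The dtype label pandas reports for the string column.
print(c_data["zipcode"].dtype)

# The Python type of each individual value.
print(c_data["zipcode"].apply(type))

# True only if every value in the column is a Python str.
all_strings = all(isinstance(value, str) for value in c_data["zipcode"])
print(all_strings)
```

If all_strings is True, the column already holds strings and the extra .apply(str) step in the question is not needed.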

As for your last error:

You need reset_index, because otherwise zipcode is in the index rather than a column:

sorted_silver_df = silver_only_df.groupby('zipcode')['rate'].nsmallest(2).reset_index()
final_df = pd.merge(sorted_silver_df,c_df, on ='zipcode')
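
To see where zipcode ends up, here is a small sketch with made-up rows (the data is an assumption, not the asker's files). It inspects the index that groupby(...).nsmallest(2) produces and then turns zipcode back into a regular column before merging:

```python
import pandas as pd

# Hypothetical rows standing in for the filtered silver data and file c.
silver_only_df = pd.DataFrame({
    "zipcode": ["01234", "01234", "01234", "90210", "90210"],
    "rate": [3.0, 1.0, 2.0, 5.0, 4.0],
})
c_df = pd.DataFrame({"zipcode": ["01234", "90210"]})

# The result is a Series whose index carries zipcode as one of its levels.
sorted_silver = silver_only_df.groupby("zipcode")["rate"].nsmallest(2)
print(sorted_silver.index.names)

# reset_index moves the index levels back into ordinary columns.
sorted_silver_df = sorted_silver.reset_index()
print(list(sorted_silver_df.columns))

# With zipcode as a real column on both sides, the merge key is found.
final_df = pd.merge(sorted_silver_df, c_df, on="zipcode")
print(final_df)
```

As far as I know, recent pandas versions also let on refer to an index level name, so the original KeyError may not reproduce there. Making zipcode a real column works either way.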

Thanks to Andy for an alternative (untested):

sorted_silver_df = silver_only_df.groupby('zipcode', as_index=False)['rate'].nsmallest(2)
final_df = pd.merge(sorted_silver_df,c_df, on ='zipcode')

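Or use left_index=True and right_on in merge. The groupby result is indexed by both zipcode and the original row label, so drop the second index level first:

sorted_silver = silver_only_df.groupby('zipcode')['rate'].nsmallest(2)
sorted_silver_df = sorted_silver.reset_index(level=1, drop=True).to_frame()
final_df = pd.merge(sorted_silver_df, c_df, right_on='zipcode', left_index=True)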
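
For the index-based merge, the number of index levels on the left has to match the number of right_on keys. A sketch with made-up rows (again an assumption, not the asker's data) that shows the two-level index nsmallest produces and one way to reduce it to zipcode alone:

```python
import pandas as pd

# Hypothetical rows standing in for the filtered silver data and file c.
silver_only_df = pd.DataFrame({
    "zipcode": ["01234", "01234", "01234", "90210", "90210"],
    "rate": [3.0, 1.0, 2.0, 5.0, 4.0],
})
c_df = pd.DataFrame({"zipcode": ["01234", "90210"]})

sorted_silver = silver_only_df.groupby("zipcode")["rate"].nsmallest(2)

# Two levels: the group key (zipcode) and the original row label.
print(sorted_silver.index.nlevels)

# Drop the row-label level so the index is zipcode only.
single_level_df = sorted_silver.reset_index(level=1, drop=True).to_frame()
print(single_level_df.index.nlevels)

# One index level on the left now matches the single right_on key.
final_df = pd.merge(single_level_df, c_df, left_index=True, right_on="zipcode")
print(final_df)
```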

edited Oct 20, 2017 at 6:24

answered Oct 20, 2017 at 6:13


14 Comments

Andy Hayden Over a year ago

can also use as_index=False in the groupby (instead of reset_index)

jezrael Over a year ago

Thanks, I added it to the answer. But sometimes it does not work, so I marked it as untested ;)

Christina Smithers Over a year ago

Thanks. I can't seem to get that to work either:
50 <class 'str'>
Name: zipcode, dtype: object
This is c_data types: object
This is c_df data types: zipcode object dtype: object
zipcode object dtype: object
We cleaned up our data. Let's join the dataframes.
Final result...
raise ValueError('len(right_on) must equal the number '
ValueError: len(right_on) must equal the number of levels in the index of "left"

jezrael Over a year ago

So it does not work? Then the problem is in the data, maybe some whitespace, I guess. Is it possible to share your data (gdocs, dropbox, or by email) if it is not confidential?

jezrael Over a year ago

I think because pd.merge(sorted_silver_df,c_df, on ='zipcode') failed...
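
The whitespace suspicion raised in the comments can be checked before merging. A rough sketch, assuming made-up rows in which one key has a trailing space; whether the asker's files actually contain such values is not known:

```python
import pandas as pd

# Hypothetical data: the first zipcode on the left has a trailing space.
left = pd.DataFrame({"zipcode": ["01234 ", "90210"], "rate": [1.0, 2.0]})
right = pd.DataFrame({"zipcode": ["01234", "90210"]})

# The padded key does not equal its clean counterpart, so that row is lost.
before = pd.merge(left, right, on="zipcode")
print(len(before))

# Strip surrounding whitespace from the key column, then merge again.
left["zipcode"] = left["zipcode"].str.strip()
after = pd.merge(left, right, on="zipcode")
print(len(after))
```

Comparing the row counts before and after stripping is a quick way to tell whether stray whitespace is hiding matches.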
