I have a dataframe spanning several years and at some point they changed the codes for ethnicity. So I need to recode the values conditional on the year - which is another column in the same dataframe. For instance 1 to 3, 2 to 3, 3 to 4 and so on:

    old = [1, 2, 3, 4, 5, 91]
    new = [3, 3, 4, 2, 1, 6]

And this is only done for the years 1996 to 2001. The values for the other years in the same column (ethnicity) must not be changed. Hoping to avoid too many inefficient loops, I tried:

    recode_years = range(1996,2002)
    for year in recode_years:
        df['ethnicity'][df.year==year].replace(old, new, inplace=True)

But the original values in the dataframe did not change. The replace method itself replaced and returned the new values correctly, but the inplace option does not seem to affect the original dataframe when applied to a conditional selection. This may be obvious to experienced Pandas users, but surely there must be some simple way of doing this instead of looping over every single element?

Edit (x2): Here is an example of another approach which also did not work ('Length of replacements must equal series length' and "TypeError: array cannot be safely cast to required type"):

    from pandas import DataFrame

    oldNewMap = {1:2, 2:3}
    df2 = DataFrame({"year":[2000,2000,2000,2001,2001,2001],"ethnicity":[1,2,1,2,3,1]})
    df2['ethnicity'][df2.year==2000] = df2['ethnicity'][df2.year==2000].map(oldNewMap)

Edit: It seems to be a problem specific to the installation/version, since this works fine on my other computer.

1 Answer

It may just be simpler to do it a different way:

    oldNewMap = {1: 3, 2: 3, 3: 4, 4: 2, 5: 1, 91: 6}
    # `year` is the loop variable from the question's loop over recode_years
    df['ethnicity'][df.year==year] = df['ethnicity'][df.year==year].map(oldNewMap)
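
For what it's worth, the same idea can be written without the loop and without chained indexing (`df[col][mask]`), which can end up operating on a temporary copy instead of the original frame. The following is only a minimal sketch under assumptions: the sample data is made up, `old_new_map` and `mask` are illustrative names, and it assumes codes missing from the mapping should be left as they are.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "year": [1995, 1996, 1998, 2001, 2001, 2002],
    "ethnicity": [1, 1, 4, 91, 7, 2],
})

old_new_map = {1: 3, 2: 3, 3: 4, 4: 2, 5: 1, 91: 6}

# Boolean mask selecting the rows for 1996 through 2001 (inclusive).
mask = df["year"].between(1996, 2001)

# A single .loc call selects rows and column together, so the
# assignment writes to df itself rather than to a temporary copy.
# dict.get(v, v) leaves codes that are not in the mapping unchanged
# (assumption: that is the desired behaviour for unknown codes).
df.loc[mask, "ethnicity"] = df.loc[mask, "ethnicity"].map(
    lambda v: old_new_map.get(v, v)
)

print(df["ethnicity"].tolist())
```

Passing the dict straight to `map` would also work, but any code not present in the dict would then come back as NaN, so the lambda is used here to keep such values untouched.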

4 Comments

Thank you! I tried this and something similar, but, curiously, it does not work: Python says that "the array cannot be safely cast to required type" because they have "unequal length." However, they do not! The series on the right- and left-hand sides are of equal length. Maybe pandas uses the length of the whole dataframe and not of the series created when slicing with np-style syntax?
@user2040900: It works for me. What version of Pandas are you using? Can you edit your question to show an example of what happens when you try this?
@user2040900: Hmmm, strange. It works in 0.11dev. Can you try accessing the elements with df.ix[df.year==year, 'ethnicity'] instead?
I tried it on a second computer, Python 2.7, Pandas 0.9.1. Everything worked fine. Same code generated the mentioned errors on the other computer (same Python version, updated Pandas). May be a problem specific to the computer/installation. Thanks for helping me sort this out.
