Python Pandas dataframe modify column value based on function that cleans string value and assign to new column

Question

I have some keys to clean. The keys have six leading zeros that I want to get rid of, and if a key does not end with "ABC" or "DEFG", I also need to remove the currency code in its last 3 characters. If a key doesn't start with a leading zero, it should be returned as it is.

To achieve this I wrote a function that deals with string as below:

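def cleanAttainKey(dirtyAttainKey):
    # Keys without a leading zero are returned unchanged
    if dirtyAttainKey[0] != "0":
        return dirtyAttainKey
    # lstrip removes only leading zeros; strip would also remove trailing ones
    dirtyAttainKey = dirtyAttainKey.lstrip("0")
    # A 3-character slice can never equal "DEFG", so test the suffix with endswith
    if not dirtyAttainKey.endswith(("ABC", "DEFG")):
        dirtyAttainKey = dirtyAttainKey[:-3]
    return dirtyAttainKey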

Now I build a dummy data frame to test it but it's reporting errors:

data frame

import pandas as pd

df = pd.DataFrame({'dirtyKey':["00000012345ABC","0000012345DEFG","0000023456DEFGUSD"],'amount':[100,101,102]},
                  columns=["dirtyKey","amount"])

I need to add a new column called "cleanAttainKey" to the df, apply the "cleanAttainKey" function to each value in "dirtyKey", and assign the cleaned key to the new column. However, it seems pandas doesn't support this type of modification.

# add a new column in df called cleanAttainKey
df['cleanAttainKey'] = ""
# I want to clean the keys and get into the new column of cleanAttainKey
dirtyAttainKeyList = df['dirtyKey'].tolist()
for i in range(len(df['cleanAttainKey'])):
    df['cleanAttainKey'][i] = cleanAttainKey(dirtyAttainKeyList[i])

I am getting the warning message below:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

The result should be the same as the df2 below:

df2 = pd.DataFrame({'dirtyKey':["00000012345ABC","0000012345DEFG","0000023456DEFGUSD"],'amount':[100,101,102],
                  'cleanAttainKey':["12345ABC","12345DEFG","23456DEFG"]},
                  columns=["dirtyKey","cleanAttainKey","amount"])
df2

Is there any better way to modify the dirty keys and get a new column with the clean keys in Pandas? Thanks

Serge Ballesta · Accepted Answer · 2020-03-11 08:22:04Z

Here is the culprit:

df['cleanAttainKey'][i] = cleanAttainKey(dirtyAttainKeyList[i])

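When you extract part of a dataframe with chained indexing, as in df['cleanAttainKey'][i], Pandas reserves the right to return either a copy or a view. That does not matter if you are just reading the data, but it means you should never assign through it: the write may land on a temporary copy and leave df unchanged.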

The idiomatic way is to do the selection and the assignment in a single indexing step with loc (or iloc, at, or iat):

df.loc[i, 'cleanAttainKey'] = cleanAttainKey(dirtyAttainKeyList[i])

(The above assumes a natural range index, so that the position i is also the row label.)
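For reference, here is a minimal end-to-end sketch of the loc approach on the sample data from the question. The helper clean_attain_key is a hypothetical restatement of the cleaning rules described in the question, not the asker's exact function, and the final map call is an assumed alternative that avoids the explicit loop; it is not part of the original answer.

```python
import pandas as pd


def clean_attain_key(dirty_key):
    """Hypothetical helper restating the cleaning rules from the question."""
    dirty_key = str(dirty_key)
    # Keys without a leading zero are returned unchanged
    if not dirty_key.startswith("0"):
        return dirty_key
    # Remove leading zeros only
    key = dirty_key.lstrip("0")
    # Drop the 3-character currency code unless the key ends with ABC or DEFG
    if not key.endswith(("ABC", "DEFG")):
        key = key[:-3]
    return key


df = pd.DataFrame(
    {
        "dirtyKey": ["00000012345ABC", "0000012345DEFG", "0000023456DEFGUSD"],
        "amount": [100, 101, 102],
    }
)

# Create the target column first, then write each cell with a single loc call
df["cleanAttainKey"] = ""
for label in df.index:
    df.loc[label, "cleanAttainKey"] = clean_attain_key(df.loc[label, "dirtyKey"])

# Assumed alternative: build the whole column in one step, with no loop
mapped = df["dirtyKey"].map(clean_attain_key)
```

Iterating over df.index rather than range(len(df)) keeps the loop correct when the index is not a plain range. The map variant never assigns into a slice at all, so the copy-versus-view question does not arise.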

5 Comments

RiffRaffCat Over a year ago

Hi Serge, if I type in the data frame as above, then your solution works. However, in my case I am using pd.read_excel() to load the Excel file, and then, even with df.loc, I still get this error message: TypeError: 'int' object is not subscriptable

Serge Ballesta Over a year ago

@commentallez-vous: It is a different problem then. You should examine the line raising the error. It should contain a variable that you expect to be a list (or a series or a dict or a dataframe) but that is just an int. I cannot guess more. If you cannot solve it, you should consider asking a new question with the full stack trace and enough data to reproduce it.

RiffRaffCat Over a year ago

Sure, thanks a lot. It seems the issue is in the function that I wrote, so let me check on that first.

RiffRaffCat Over a year ago

Got it, just adding dirtyAttainKey = str(dirtyAttainKey) at the top of the function solves it. Thanks again.

Serge Ballesta Over a year ago

@commentallez-vous: Please do not forget to accept the answer to indicate to future readers that you no longer need help here.
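The TypeError discussed in the comments can be reproduced without an Excel file. The sketch below assumes a column in which one cell was loaded as a number, which is one plausible outcome of pd.read_excel() for numeric-looking cells; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical column as it might look after loading: one int, one str
keys = pd.Series([12345, "0000012345DEFG"])

# Indexing into an int is what raises the error reported in the comments
try:
    keys.iloc[0][0]
except TypeError as exc:
    message = str(exc)

# Converting every value to str makes string indexing safe again
as_text = keys.astype(str)
first_chars = [key[0] for key in as_text]
```

Converting after loading cannot bring back zeros that were already dropped when a cell was read as a number. If that matters, passing dtype=str for the column to pd.read_excel() may be worth trying, though whether the zeros survive depends on how the cells are stored in the workbook.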
