I have some data to clean. The keys have leading zeros that I want to get rid of. Then, if a key ends with neither "ABC" nor "DEFG", I also need to remove the currency code in its last 3 characters. If a key does not start with leading zeros, the function should just return it unchanged.

To do this, I wrote the following function, which works on a single string:

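def cleanAttainKey(dirtyAttainKey):
    # Keys without leading zeros are returned unchanged
    if dirtyAttainKey[0] != "0":
        return dirtyAttainKey
    # lstrip removes only the leading zeros (strip("0") would also remove trailing ones)
    dirtyAttainKey = dirtyAttainKey.lstrip("0")

    # dirtyAttainKey[-3:] can never equal the 4-character "DEFG", so test suffixes with endswith
    if not dirtyAttainKey.endswith(("ABC", "DEFG")):
        dirtyAttainKey = dirtyAttainKey[:-3]   # drop the trailing 3-letter currency code
    return dirtyAttainKey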
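One detail worth checking in the suffix test: a 3-character slice such as key[-3:] can never equal the 4-character "DEFG", so a test written as key[-3:] != "DEFG" is always true and "DEFG" keys would lose their last three characters. The short sketch below shows the slice result and a check based on Python's str.endswith, which accepts a tuple of suffixes. The helper name needs_currency_strip is hypothetical, used only for illustration.

```python
def needs_currency_strip(key):
    # Hypothetical helper: True when the key ends with neither "ABC" nor "DEFG".
    # str.endswith accepts a tuple and compares each suffix at its own length.
    return not key.endswith(("ABC", "DEFG"))

key = "12345DEFG"
print(key[-3:])                               # EFG
print(key[-3:] != "DEFG")                     # True, for every possible key
print(needs_currency_strip(key))              # False
print(needs_currency_strip("23456DEFGUSD"))   # True
```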

Now I built a dummy data frame to test it, but I get a warning:

  1. The data frame:
import pandas as pd

df = pd.DataFrame({'dirtyKey':["00000012345ABC","0000012345DEFG","0000023456DEFGUSD"],'amount':[100,101,102]},
                  columns=["dirtyKey","amount"])
  2. I need a new column called "cleanAttainKey" in df: each value in "dirtyKey" should be cleaned with the cleanAttainKey function and the result assigned to the new column. However, pandas does not seem to support this kind of modification:
# add a new column in df called cleanAttainKey
df['cleanAttainKey'] = ""
# I want to clean the keys and get into the new column of cleanAttainKey
dirtyAttainKeyList = df['dirtyKey'].tolist()
for i in range(len(df['cleanAttainKey'])):
    df['cleanAttainKey'][i] = cleanAttainKey(dirtyAttainKeyList[i])

I get the following warning:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

The result should be the same as df2 below:

df2 = pd.DataFrame({'dirtyKey':["00000012345ABC","0000012345DEFG","0000023456DEFGUSD"],'amount':[100,101,102],
                  'cleanAttainKey':["12345ABC","12345DEFG","23456DEFG"]},
                  columns=["dirtyKey","cleanAttainKey","amount"])
df2

Is there a better way in pandas to clean the dirty keys and put the clean keys in a new column? Thanks

1 Answer


Here is the culprit:

df['cleanAttainKey'][i] = cleanAttainKey(dirtyAttainKeyList[i])

When you extract part of a DataFrame (here df['cleanAttainKey']), pandas may return either a copy or a view, and it reserves the right to choose. That does not matter when you only read the data, but it means you should never modify such an extract: if it is a copy, your assignment never reaches df.

The idiomatic way is to index the DataFrame in a single step with loc (or iloc, at or iat):

df.loc[i, 'cleanAttainKey'] = cleanAttainKey(dirtyAttainKeyList[i])

(This assumes the default RangeIndex, so that the position i is also the row label.)
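As for the "better way" asked about in the question: instead of looping over row positions, Series.apply calls the function on every value, and the result can be assigned as a whole column in one step, so no chained indexing is involved at all. A minimal sketch, assuming the keys are strings; the function below is a corrected variant of the one in the question (lstrip instead of strip, endswith for the suffix test), not the asker's exact code.

```python
import pandas as pd

def cleanAttainKey(dirtyAttainKey):
    # Keys without leading zeros are returned unchanged
    if not dirtyAttainKey.startswith("0"):
        return dirtyAttainKey
    key = dirtyAttainKey.lstrip("0")         # remove leading zeros only
    if not key.endswith(("ABC", "DEFG")):
        key = key[:-3]                       # drop the trailing currency code
    return key

df = pd.DataFrame({'dirtyKey': ["00000012345ABC", "0000012345DEFG", "0000023456DEFGUSD"],
                   'amount': [100, 101, 102]})

# Clean every key and assign the whole new column at once (no per-row indexing)
df['cleanAttainKey'] = df['dirtyKey'].apply(cleanAttainKey)

# Reorder the columns to match df2 from the question
df = df[['dirtyKey', 'cleanAttainKey', 'amount']]
print(df)
```

Besides avoiding the warning, this removes the need for the empty placeholder column and the intermediate list.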


5 Comments

Hi Serge, if I build the data frame by hand as above, your solution works. However, in my real case I load an Excel file with pd.read_excel(), and then, even with df.loc, I still get this error: TypeError: 'int' object is not subscriptable
@commentallez-vous: That is a different problem, then. Examine the line raising the error: it should contain a variable that you expect to be a list (or a Series, a dict or a DataFrame) but that is actually an int. I cannot guess more than that. If you cannot solve it, consider asking a new question with the full stack trace and enough data to reproduce the problem.
Sure, thanks a lot. It seems the issue is in the function I wrote; let me check that first.
Got it: just adding dirtyAttainKey = str(dirtyAttainKey) solves it. Thanks again!
@commentallez-vous: Please do not forget to accept the answer, to indicate to future readers that you no longer need help here.
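The TypeError from these comments can be reproduced without an Excel file. A minimal sketch, assuming pd.read_excel inferred some keys as numbers, so the column mixes str and int values; the data and the helper first_char are hypothetical. Converting the column with astype(str) has the same effect as the str() call added to the function.

```python
import pandas as pd

# Hypothetical data: one key came back as a number, so the column
# has object dtype with a mix of str and int values.
df = pd.DataFrame({'dirtyKey': ["00000012345ABC", 12345], 'amount': [100, 101]})

def first_char(value):
    # Hypothetical helper: indexing works on str but raises TypeError on int
    return value[0]

try:
    df['dirtyKey'].apply(first_char)
    raised = False
except TypeError:
    raised = True
print("TypeError raised:", raised)

# Converting every value to str first makes all of them subscriptable
keys = df['dirtyKey'].astype(str)
print(keys.apply(first_char).tolist())   # ['0', '1']
```

If the keys are stored as text in the workbook, passing dtype={'dirtyKey': str} to pd.read_excel is another way to keep them as strings; a key stored as a number in the sheet has no leading zeros in its value to begin with.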
