I have a dataframe that contains numbers stored as strings with a comma as the thousands separator (e.g. "150,000"). Some values are also represented by "-".

I'm trying to convert all the numbers that are stored as strings into floats. The "-" values should remain as they are.

My current code loops over every column and row to check whether each cell contains a comma. If it does, the code removes the comma and converts the result to a number.

This works fine most of the time, but some of the dataframes have duplicated column names, and that's when it falls apart.

Is there a more efficient way of doing this update (i.e. without loops) that also avoids the problem with duplicated column names?

Current code:

    for col in statement_df.columns:
        row = 0
        while row < len(statement_df.index):

            row_name = statement_df.index[row]

            if statement_df[col][row] == "-":
                #do nothing
                print(statement_df[col][row])

            elif statement_df[col][row].find(",") >= 0:
                #statement_df.loc[col][row] = float(statement_df[col][row].replace(",",""))
                x = float(statement_df[col][row].replace(",",""))
                statement_df.at[row_name, col] = x
                print(statement_df[col][row])

            else:

                x = float(statement_df[col][row])
                statement_df.at[row_name, col] = x
                print(statement_df[col][row])

            row = row + 1

2 Answers

Use str.replace(',', '') on the column itself.

For a dataframe like the one below:

Name  Count
Josh  12,33
Eric  24,57
Dany  9,678

apply it like this:

df['Count'] = df['Count'].str.replace(',', '')
df

It will give you the following output:

   Name Count
0  Josh  1233
1  Eric  2457
2  Dany  9678
1 Comment

Thanks, that works, at least in that it removes all occurrences of ",". However, once the "," has been removed, the "number" is still technically a string. How can I convert these to floats, bearing in mind that I still have occurrences of "-" which I want to leave unchanged?
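
One possible way to handle the follow-up above is to map a small helper over the column, so that "-" is passed through and everything else becomes a float. This is only a sketch: the helper name to_number is made up for illustration, and the sample data is the dataframe from this answer with one value swapped for "-".

```python
import pandas as pd

# Sample data modelled on the answer above, with one "-" placeholder added.
df = pd.DataFrame({"Name": ["Josh", "Eric", "Dany"],
                   "Count": ["12,33", "-", "9,678"]})

def to_number(value):
    """Hypothetical helper: return a float for numeric strings, keep "-" as is."""
    if value == "-":
        return value
    # Drop the comma separators before converting.
    return float(value.replace(",", ""))

df["Count"] = df["Count"].map(to_number)
print(df)
```

Because the column then holds both floats and the string "-", it keeps the object dtype rather than becoming a numeric column.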
You can use iloc for that. It selects columns by position, so it also works when column names are duplicated:

for idx in range(len(df.columns)):
    df.iloc[:, idx] = df.iloc[:, idx].apply(your_function)

The code in your_function should be able to deal with a single cell value. For example:

def your_function(x):
    # Leave the "-" placeholder unchanged
    if x == '-': return x
    # Strip the comma separators, then convert
    return float(x.replace(',', ''))
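
If you would rather avoid a per-cell Python function, a column-wise variant might look like the sketch below. It assumes every cell is a string that is either "-" or a number with optional comma separators. The name convert_column and the sample data are invented for illustration. DataFrame.apply passes each column in as a Series, so the label-based lookup statement_df[col] (which returns a DataFrame rather than a Series when the name is duplicated) is never needed.

```python
import pandas as pd

# Hypothetical sample with a duplicated column name, as described in the question.
statement_df = pd.DataFrame(
    [["150,000", "-"], ["2,500.5", "42"]],
    columns=["Amount", "Amount"],
    index=["Revenue", "Costs"],
)

def convert_column(column):
    """Convert one column of strings to floats, keeping "-" cells unchanged."""
    # Remove the comma separators from the whole column at once.
    stripped = column.str.replace(",", "", regex=False)
    # Anything that is not a number (here: "-") becomes NaN.
    numbers = pd.to_numeric(stripped, errors="coerce")
    # Restore the original "-" wherever the source cell was "-".
    return numbers.astype(object).where(column != "-", column)

converted = statement_df.apply(convert_column)
print(converted)
```

Columns that contain "-" have to stay as object dtype, since a float column cannot hold a string.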
