I have a DataFrame containing numbers stored as strings with comma thousands separators (e.g. "150,000"). Some cells also contain "-".
I'm trying to convert all of these numeric strings to floats, leaving the "-" cells as they are.
My current code loops over every column and row and checks each cell. If the cell contains a comma, it removes the comma and converts the value to a float; other cells (except "-") are converted to float directly.
This works most of the time, but some of my DataFrames have duplicated column names, and on those the code breaks.
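A likely cause, shown on a small hypothetical frame: when a column label is duplicated, indexing the DataFrame with that label returns a DataFrame of all matching columns rather than a single Series, so a chained `df[col][row]` lookup no longer addresses one cell.

```python
import pandas as pd

# Hypothetical frame in which two columns share the label "2020"
df = pd.DataFrame([["150,000", "-"], ["2,500", "10"]], columns=["2020", "2020"])

# With a unique label, df[label] is a Series; with a duplicated label it is a DataFrame
selected = df["2020"]
print(type(selected).__name__)  # DataFrame
print(selected.shape)           # (2, 2)
```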
Is there a more efficient way to do this (i.e. without loops) that also avoids the problem with duplicated column names?
Current code:
```python
for col in statement_df.columns:
    row = 0
    while row < len(statement_df.index):
        row_name = statement_df.index[row]
        if statement_df[col][row] == "-":
            # do nothing
            print(statement_df[col][row])
        elif statement_df[col][row].find(",") >= 0:
            #statement_df.loc[col][row] = float(statement_df[col][row].replace(",",""))
            x = float(statement_df[col][row].replace(",", ""))
            statement_df.at[row_name, col] = x
            print(statement_df[col][row])
        else:
            x = float(statement_df[col][row])
            statement_df.at[row_name, col] = x
            print(statement_df[col][row])
        row = row + 1
```
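One loop-free approach, sketched here under some assumptions: it works on the flattened cell values, so column labels (duplicated or not) are never looked up. The helper name `convert_comma_numbers` and its `placeholder` parameter are hypothetical, and unlike the loop above it silently turns any other unparseable text into NaN instead of raising.

```python
import numpy as np
import pandas as pd


def convert_comma_numbers(df: pd.DataFrame, placeholder: str = "-") -> pd.DataFrame:
    """Hypothetical helper: turn comma-formatted numeric strings into floats,
    leaving cells equal to `placeholder` unchanged."""
    values = df.to_numpy(dtype=object)
    flat = pd.Series(values.ravel())

    # Strip thousands separators from every cell (non-strings are stringified first)
    cleaned = flat.astype(str).str.replace(",", "", regex=False)

    # Parse to float; anything unparseable (including the placeholder) becomes NaN
    numbers = pd.to_numeric(cleaned, errors="coerce").astype(float)

    # Put the placeholder back wherever it originally appeared
    is_placeholder = flat.eq(placeholder).to_numpy()
    result = np.where(is_placeholder, placeholder, numbers.to_numpy(dtype=object))

    # Rebuild with the original index and (possibly duplicated) column labels
    return pd.DataFrame(result.reshape(values.shape), index=df.index, columns=df.columns)


# Example with a duplicated "2020" column
statement_df = pd.DataFrame(
    [["150,000", "-", "1,234.5"], ["2,500", "10", "-"]],
    columns=["2020", "2020", "2021"],
)
converted = convert_comma_numbers(statement_df)
print(converted)
```

Because the result mixes floats and "-" strings, its columns are `object` dtype; if numeric dtypes matter downstream, keeping NaN in place of "-" is usually the more convenient choice.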