I have a German CSV file that was incorrectly encoded. I want to convert the characters back to UTF-8 using a dictionary. I thought my approach was correct, but when I print the DataFrame, nothing has changed. Here's my code:
import os

import pandas as pd

DATA_DIR = 'C:\\...'
translations = {
    'ö': 'oe',
    'ü': 'ue',
    'ß': 'ss',
    'ä': 'ae',
    '€': '€',
    'Ä': 'Ae',
    'Ö': 'Oe',
    'Ü': 'Ue'
}
def cleanup():
    for file in os.listdir(os.path.join(DATA_DIR)):
        if not file.lower().endswith('.csv'):
            continue
        data_utf = pd.read_csv(os.path.join(DATA_DIR, file), header=3, index_col=None, skiprows=0-2)
        data_utf.replace(translations, inplace=True)
        print(data_utf)

if __name__ == '__main__':
    cleanup()
I also tried
for before, after in translations.items():
    data_utf.replace(before, after)
within the function, and I also tried passing the translations dictionary directly to replace(). However, the replacement does work if I specify the column in which to replace the characters. What do I need to do to apply these translations to the whole DataFrame, as well as to its column headers? Thanks!
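
For reference, here is a minimal sketch of one way this might be done. By default, `DataFrame.replace` with a dictionary only replaces cells whose entire value equals a key, so `regex=True` is used here to match substrings. Without `inplace=True`, `replace` returns a new DataFrame, so the result has to be assigned back. The sample data and the `translate_text` helper are made up for illustration and are not part of my real files; the mapping is shortened.

```python
import pandas as pd

# Shortened version of the mapping from the question.
translations = {'ö': 'oe', 'ü': 'ue', 'ß': 'ss', 'ä': 'ae',
                'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue'}


def translate_text(text):
    """Hypothetical helper: apply every translation inside one string."""
    for before, after in translations.items():
        text = text.replace(before, after)
    return text


# Made-up sample data standing in for one of the CSV files.
df = pd.DataFrame({'Straße': ['Müllerweg', 'Hauptstraße'],
                   'Größe': ['groß', 'mäßig']})

# regex=True treats each key as a pattern, so substrings inside cells match.
# The result is assigned back because replace() returns a new DataFrame.
df = df.replace(translations, regex=True)

# replace() only looks at cell values, so the headers are renamed separately.
df = df.rename(columns=translate_text)

print(df)
```

With `regex=True` the dictionary keys are interpreted as regular expressions, so a key containing special characters such as `.` or `$` would presumably need escaping (for example with `re.escape`).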