I am reading a CSV file with pandas in Python and trying to change the encoding because the file contains some German letters, but it seems Azure always keeps the same encoding (I assume the default).
Whatever I do, I always get the same error in the Azure portal: 'utf-8' codec can't decode byte 0xc4 in position 0: invalid continuation byte
The same error appears even if I set utf-16, latin1, cp1252, etc.
    import pandas as pd
    import pysftp

    # host, username, password and cnopts are defined elsewhere
    with pysftp.Connection(host, username=username, password=password, cnopts=cnopts) as sftp:
        for i in sftp.listdir_attr():
            with sftp.open(i.filename) as f:
                df = pd.read_csv(f, delimiter=';', encoding='cp1252')
By the way, when I test this locally on a Windows machine, it works fine.
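One way to take the platform default out of the picture is to decode the bytes explicitly before the parser sees them. The following is only a minimal sketch: `decode_bytes` is a hypothetical helper, the candidate encodings are an assumption about the data, and an `io.BytesIO` object stands in for the file returned by `sftp.open`, assuming that object yields bytes from `read()`.

```python
import io


def decode_bytes(binary_file, encodings=("utf-8-sig", "cp1252")):
    """Read all bytes from a binary file-like object and decode them explicitly.

    Hypothetical helper: tries each candidate encoding in order and returns
    the decoded text together with the encoding that worked.
    """
    raw = binary_file.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"none of {encodings} could decode the data")


# Stand-in for the remote file: a small cp1252-encoded CSV with German letters.
remote_file = io.BytesIO("Straße;Größe\nÄrger;1\n".encode("cp1252"))

text, used = decode_bytes(remote_file)
print(used)
print(text)
```

The decoded text can then be passed on as `pd.read_csv(io.StringIO(text), delimiter=';')`, so that no decoding is left to the parser.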
Full error:
Result: Failure
Exception: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 0: invalid continuation byte
Stack:
  File "/home/site/wwwroot/.python_packages/lib/site-packages/azure_functions_worker/dispatcher.py", line 355, in _handle__invocation_request
    call_result = await self._loop.run_in_executor(
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/site/wwwroot/.python_packages/lib/site-packages/azure_functions_worker/dispatcher.py", line 542, in __run_sync_func
    return func(**params)
  File "/home/site/wwwroot/ce_etl/etl_main.py", line 141, in main
    df = pd.read_csv(f, delimiter=';', encoding=r"utf-8-sig", error_bad_lines=False)
  File "/home/site/wwwroot/.python_packages/lib/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/site/wwwroot/.python_packages/lib/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/site/wwwroot/.python_packages/lib/site-packages/pandas/io/parsers/readers.py", line 488, in _read
    return parser.read(nrows)
  File "/home/site/wwwroot/.python_packages/lib/site-packages/pandas/io/parsers/readers.py", line 1047, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/home/site/wwwroot/.python_packages/lib/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 223, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 801, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 880, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1026, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas/_libs/parsers.pyx", line 1080, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas/_libs/parsers.pyx", line 1204, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas/_libs/parsers.pyx", line 1217, in pandas._libs.parsers.TextReader._string_convert
  File "pandas/_libs/parsers.pyx", line 1396, in pandas._libs.parsers._string_box_utf8
0xc4 - maybe it is a different encoding than you expect. If I run b'\xc4'.decode('cp1252') or b'\xc4'.decode('latin1'), I get 'Ä'.
Maybe the problem is in a different place. Better to show the FULL error message in the question (not in comments), and show the original code which generates this error. Maybe you set encoding in the wrong line or in the wrong file, and Azure keeps running the wrong code.
The traceback shows main df = pd.read_csv(..., encoding=r"utf-8-sig"), so you are running different code and it still uses utf-8-sig.
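The decoding check from the comment can be reproduced with a short script. This is a minimal sketch using only the standard library; the sample row `raw` is a made-up example, not data from the actual file.

```python
# Byte 0xC4 is "Ä" in the single-byte encodings cp1252 and latin1,
# but it is not a valid UTF-8 sequence when followed by an ASCII letter.
raw = b"\xc4rger;1\n"  # hypothetical sample row starting with "Ä"

decoded = {}
for encoding in ("cp1252", "latin1", "utf-8", "utf-8-sig"):
    try:
        decoded[encoding] = raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # Record the failure reason instead of the decoded text.
        decoded[encoding] = f"error: {exc.reason}"

for encoding, result in decoded.items():
    print(f"{encoding:10} -> {result!r}")
```

If the two single-byte encodings succeed while both UTF-8 variants fail, that fits the traceback: the deployed code is still decoding with utf-8-sig rather than cp1252.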