Pandas read byte string from csv

Question

I have a pandas dataframe with byte strings as elements in a column, e.g. b'hey'.

When I write this dataframe to a csv and read it back afterwards, pandas returns a string of the form "b'hey'". This is a problem because tf.data.Dataset.from_tensor_slices casts the string to a byte string again, which gives b"b'hey'". Specifying the dtype when reading the csv with dtype={"COLUMN_NAME": bytes} didn't do anything.
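A minimal sketch that reproduces the round trip described above. The column name "text" and the in-memory buffer are assumptions for illustration only; a file on disk should behave the same way.

```python
import io

import pandas as pd

# Hypothetical one-column frame holding a bytes object.
df = pd.DataFrame({"text": [b"hey"]})

# Round-trip through an in-memory CSV instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# The cell comes back as a plain str that still contains the b'...' wrapper.
value = restored.loc[0, "text"]
print(type(value), repr(value))
```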

Does anyone have a solution that avoids manually editing the string to remove the b?

Does this answer your question? How to translate "bytes" objects into literal strings in pandas Dataframe, Python3.x?
– RJ Adriaansen, commented Nov 12, 2021 at 20:21

user3786340 · Accepted Answer · 2022-11-08 12:54:25Z

The solution is to apply ast.literal_eval first, which parses the string "b'hey'" back into the bytes object b'hey', and then decode the result with 'utf-8'.

To read the file and convert a whole column of byte strings:

import ast

import pandas as pd

df = pd.read_csv("<YOUR_DATA_FILE>", sep='\t')  # replace the placeholder with your file path
# Assumes the column is named 'text'.
# literal_eval turns "b'hey'" into b'hey'; decode then yields the str 'hey'.
df['text'] = df['text'].apply(lambda x: ast.literal_eval(x).decode("utf-8"))
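The same idea as a self-contained sketch that can be run without a file. The helper parse_bytes_literal, the sample data, and the column name "text" are hypothetical and only serve the illustration. It also shows the converters argument of pd.read_csv as one possible way to do the conversion while reading, and keeps the intermediate bytes column, which may be what you want if the values are later passed to tf.data.Dataset.from_tensor_slices.

```python
import ast
import io

import pandas as pd


def parse_bytes_literal(cell):
    """Turn a string such as "b'hey'" back into the bytes object b'hey'."""
    value = ast.literal_eval(cell)
    if not isinstance(value, bytes):
        raise ValueError(f"expected a bytes literal, got {cell!r}")
    return value


# Hypothetical stand-in for a tab-separated file written by to_csv.
raw_text = "text\nb'hey'\nb'caf\\xc3\\xa9'\n"

# Variant 1: read first, then convert the column (as in the answer above).
df = pd.read_csv(io.StringIO(raw_text), sep="\t")
as_bytes = df["text"].apply(parse_bytes_literal)
as_text = as_bytes.apply(lambda b: b.decode("utf-8"))

# Variant 2: convert while reading, using the converters argument.
df_converted = pd.read_csv(
    io.StringIO(raw_text),
    sep="\t",
    converters={"text": parse_bytes_literal},
)

print(as_bytes.tolist())
print(as_text.tolist())
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for cells that come from a file, and the isinstance check makes unexpected cell contents fail loudly instead of passing through silently.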
