I'm working with a csv that I fetch online with requests.get. For context, this is how the file is being uploaded:
import pandas as pd
import requests
comments = []
body = requests.get()
for comment in body:
    comments.append([
        str(body['data']['body']).encode(encoding='utf-8')
    ])
df = pd.DataFrame(comments)[0]
requests.put('http://sample/desination.csv', data=df.to_csv(index=False))
Encoding each value when appending to comments is required when using requests, because it defaulted to latin-1 and requests is expecting utf-8.
The resulting csv contains 1 column with rows like: b'Presicely'
Makes sense, encoding to utf-8 converted the string to bytes type.
Later, when I try to decode the csv, I have the following:
import requests
data = requests.get('http://destination.csv').content
testdata = data.decode('utf-8').splitlines()
print(testdata[2])
b'Presicely'
If I don't decode:
print(data[1:20])
b'Presicely'\r\n
I was under the impression that decoding data would eliminate the b prefixes, as most Stack Overflow answers suggest. The problem could be with how I initially upload the csv, so I've tried tackling that a few different ways with no luck (I can't get around encoding it).
Any suggestions?
P.S. python version 3.7.7
Edit: I had no luck getting this to work. DataFrame.to_csv() returns a string and, as lenz pointed out, the conversion to string type is likely the culprit.
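A minimal sketch of that suspicion, using a made-up sample value in place of a fetched comment body: when a cell holds a bytes object, to_csv() has to write it as text, and the text it gets is the bytes repr, b'...' included. Decoding the downloaded file afterwards only undoes the transport encoding, so the prefix stays.

```python
import pandas as pd

# Hypothetical sample value standing in for one fetched comment body.
comments = [["Presicely".encode("utf-8")]]  # a bytes object, as in the upload code

df = pd.DataFrame(comments)[0]
csv_text = df.to_csv(index=False)  # a str; the bytes cell is written via str()

# Simulate upload and download: the text survives a UTF-8 round trip unchanged,
# so the b'...' characters are still literally part of each row.
round_tripped = csv_text.encode("utf-8").decode("utf-8")
rows = round_tripped.splitlines()

print(rows)  # first row is the header, the rest are the cell values
```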
Ultimately I saved the data as a .txt file to eliminate the need to call to_csv(), after which my decode worked as expected, confirming our suspicion. The txt file format works for me, so I'm keeping it that way.
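For readers who need to stay with CSV, one possible alternative is sketched here; it has not been tested against the endpoints in the question. The idea is to keep the comment bodies as plain str and encode the whole CSV text once, just before passing it to requests, so no cell ever holds bytes. The helper names and sample strings are hypothetical. The second helper uses ast.literal_eval to recover text from rows that were already written as b'...'.

```python
import ast

import pandas as pd


def build_csv_payload(bodies):
    """Return UTF-8 bytes for a one-column CSV built from plain str values."""
    series = pd.DataFrame([[b] for b in bodies])[0]
    # Encode the whole CSV text once, instead of encoding each cell.
    return series.to_csv(index=False).encode("utf-8")


def recover_bytes_literal(cell):
    """Turn the text "b'...'" from an already uploaded row back into a str."""
    value = ast.literal_eval(cell)  # evaluates Python literals only
    return value.decode("utf-8") if isinstance(value, bytes) else value


payload = build_csv_payload(["Presicely", "café"])
# requests.put(url, data=payload) would then send these bytes as-is
# (url is a placeholder, not a value from the question).
rows = payload.decode("utf-8").splitlines()
recovered = recover_bytes_literal("b'Presicely'")
```

Passing bytes as data means requests does not have to pick an encoding for the body, which is where the latin-1 default came from.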
str call somewhere, so the values really are "b'Precisely'" and "b'Precisely'\r\n".
df.to_csv(encoding='utf-8')?