
I'm working with a CSV that I'm fetching online with requests.get, so for context, this is how the file is being uploaded:

import pandas as pd
import requests

comments = []
body = requests.get().json()  # URL omitted here; the JSON response holds the comments
for comment in body:
    comments.append([
        str(comment['data']['body']).encode(encoding='utf-8')  # per-cell encode -> bytes
    ])
df = pd.DataFrame(comments)[0]
requests.put('http://sample/desination.csv', data=df.to_csv(index=False))

The encode when appending to comments is required when uploading with requests, because otherwise the body defaulted to latin-1, while UTF-8 is what's expected.

The resulting CSV contains one column, with rows like: b'Presicely'

Makes sense: encoding to UTF-8 converted each string to a bytes object.
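
To make this concrete, here's a minimal sketch (independent of the real endpoint) showing that pandas writes a bytes cell out as its Python repr, so the b'' wrapper becomes part of the CSV text itself:

import pandas as pd

cell = "Presicely".encode('utf-8')  # bytes, exactly like the per-comment encode above
df = pd.DataFrame([[cell]])[0]

csv_text = df.to_csv(index=False)
print(csv_text)        # the row appears as the literal text  b'Presicely'
print(type(csv_text))  # <class 'str'> -- to_csv returns a str, not bytes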

Later, when I try to decode the CSV, I have the following:

import requests

data = requests.get('http://destination.csv').content
testdata = data.decode('utf-8').splitlines()
print(testdata[2])

b'Presicely'

If I don't decode:

print(data[1:20])

b'Presicely'\r\n

I was under the impression that decoding the data would eliminate the b prefixes, as most Stack Overflow answers suggest. The problem could be with how I initially upload the CSV, so I've tried tackling that a few different ways with no luck (I can't get around encoding it).

Any suggestions?

P.S. Python version 3.7.7

Edit: I ended up having no luck getting this to work as-is. DataFrame.to_csv() returns a string, and as lenz pointed out, that conversion to string type is likely the culprit.
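
To spell out why decoding on the read side can't undo this (a small illustration, not code from the original exchange): by the time the CSV is uploaded, the characters b, ', P, r, ... are ordinary text, so an encode/decode round trip just hands them back unchanged:

line = "b'Presicely'"        # a 12-character str -- what to_csv actually wrote
raw = line.encode('utf-8')   # what travels over the wire and sits in the file
print(raw.decode('utf-8'))   # b'Presicely' -- same text, the "prefix" is still there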

Ultimately I ended up saving the data as a .txt to eliminate the need to call to_csv(), which let my decode work as expected, confirming our suspicion. The txt format works for me, so I'm keeping it that way.
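
For reference, a minimal sketch of that workaround, with a placeholder URL and a placeholder list of comments (not the original code): keep the values as plain strings, join them yourself, and encode the whole payload once, so to_csv() never gets involved:

import requests

comments = ["Presicely", "another comment"]       # plain str values, no per-item encode
payload = "\r\n".join(comments).encode('utf-8')   # encode the whole document once
requests.put('http://sample/destination.txt', data=payload)  # placeholder URL

# Reading it back then decodes cleanly, with no b'' wrappers in the text:
# requests.get('http://sample/destination.txt').content.decode('utf-8').splitlines()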

  • Probably there's an (implicit) str call somewhere, so the values really are "b'Precisely'" and "b'Precisely'\r\n". Commented Jul 30, 2020 at 7:10
  • By serialising a list of bytes objects (rather than serialising first and then encoding the whole dump), you probably need to decode each cell individually too. Commented Jul 30, 2020 at 7:11
  • df.to_csv(encoding='utf-8')? Commented Jul 30, 2020 at 8:20
  • @snakecharmerb just tried doing this both with/without decoding the body but the results were the same. Commented Jul 30, 2020 at 12:19
  • @lenz you're right in that to_csv returns a str object, so that may be where the problem lies. However when I try to decode the entire body as such: datadf = pd.read_csv(io.StringIO(data.decode('utf-8'))) I can then fetch a cell: testdata = datadf.iloc[1,0] but then that cell is already a string which can't be further decoded. Are you suggesting I convert it to another type to decode it further, on each row? Commented Jul 30, 2020 at 12:28
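
Picking up on that last comment: since each cell in the already-uploaded CSV holds the textual repr of a bytes object, one possible workaround (a sketch only, not the fix in the answer below) is to parse that repr back into bytes with ast.literal_eval and then decode it:

import ast

cell = "b'Presicely'"               # what read_csv / iloc hands back for each row
recovered = ast.literal_eval(cell)  # -> b'Presicely', a real bytes object again
print(recovered.decode('utf-8'))    # Presicely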

1 Answer


I was able to get this to work; credit to my IRL friend who rubber-ducked me through the solution. It was quite simple: what I needed to do was encode the string returned by the to_csv function, like so:

comments = []
body = requests.get().json()  # URL omitted, as in the question
for comment in body:
    comments.append([
        str(comment['data']['body'])  # keep the values as plain str; no per-cell encode
    ])
df = pd.DataFrame(comments)[0]
csv_data = df.to_csv(index=False)    # to_csv() returns a str
csv_data = csv_data.encode('utf-8')  # encode the whole document once
requests.put('http://sample/desination.csv', data=csv_data)

I'm sure you can compress the above code by applying the encode directly to the to_csv() result. (Passing encoding='utf-8' as a flag to to_csv(), as suggested in the comments, didn't change the result for me, presumably because to_csv() returns a plain str when no path is given.)
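
For example (same behaviour, just chained):

requests.put('http://sample/desination.csv',
             data=df.to_csv(index=False).encode('utf-8'))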

The uploaded file can now be decoded properly, and you can keep your CSV format.
