
I would like to upload a CSV as a Parquet file to an S3 bucket. Below is the code snippet.

import pandas as pd
from io import BytesIO

df = pd.read_csv('right_csv.csv')
csv_buffer = BytesIO()
df.to_parquet(csv_buffer, compression='gzip', engine='fastparquet')
csv_buffer.seek(0)

The above gives me an error: TypeError: expected str, bytes or os.PathLike object, not _io.BytesIO. How can I make it work?


2 Answers


As per the documentation, io.BytesIO cannot be used when fastparquet is the engine; the auto or pyarrow engine has to be used instead. Quoting from the documentation:

The engine fastparquet does not accept file-like objects.

The code below works without any issues.

import io
f = io.BytesIO()
df.to_parquet(f, compression='gzip', engine='pyarrow')
f.seek(0)


As mentioned in the other answer, this is not supported. One workaround is to save the Parquet output to a NamedTemporaryFile and then copy its contents into a BytesIO buffer:


import io
import tempfile

with tempfile.NamedTemporaryFile() as tmp:
    # fastparquet accepts a file path, so write to the temp file's name
    df.to_parquet(tmp.name, compression='gzip', engine='fastparquet')
    with open(tmp.name, 'rb') as fh:
        buf = io.BytesIO(fh.read())

1 Comment

@Simon It is the path of the temporary Parquet file that was created; it is an attribute of the tmp object.
