
I would like to upload a CSV as a Parquet file to an S3 bucket. Below is the code snippet.

import pandas as pd
from io import BytesIO

df = pd.read_csv('right_csv.csv')
csv_buffer = BytesIO()
df.to_parquet(csv_buffer, compression='gzip', engine='fastparquet')
csv_buffer.seek(0)

The above gives me an error: TypeError: expected str, bytes or os.PathLike object, not _io.BytesIO. How can I make it work?


2 Answers


As per the documentation, io.BytesIO cannot be used when fastparquet is the engine; the auto or pyarrow engine has to be used instead. Quoting from the documentation:

The engine fastparquet does not accept file-like objects.

The code below works without any issues.

import io
f = io.BytesIO()
df.to_parquet(f, compression='gzip', engine='pyarrow')
f.seek(0)


As mentioned in the other answer, this is not supported. One workaround is to save the Parquet output to a NamedTemporaryFile and then copy its contents into a BytesIO buffer:


import io
import tempfile

with tempfile.NamedTemporaryFile() as tmp:
    # fastparquet accepts a file path, so write to the temp file's name
    df.to_parquet(tmp.name, compression='gzip', engine='fastparquet')
    with open(tmp.name, 'rb') as fh:
        buf = io.BytesIO(fh.read())

1 Comment

@Simon It is the path of the temporary Parquet file that was created; it is an attribute of the tmp object.
