Given the following:

import io
buffer = io.BytesIO()
csv_data = 'col1,col2\n1,2\n3,4'

How can I use duckdb ( https://duckdb.org/docs/data/parquet/overview.html ) to write a Parquet file to the in-memory buffer, so that the file contains the column/row data from the csv_data variable?

I'm using duckdb version 0.7.1 (I'm not fixed to this version though).

edit

It was suggested that I try the following:

import duckdb
from io import BytesIO
csv_data = BytesIO(b'col1,col2\n1,2\n3,4')
duckdb.read_csv(csv_data, header=True).write_parquet('csv_data.parquet')

Which failed with:


In [1]: import duckdb

In [2]: from io import BytesIO
   ...:

In [3]: csv_data = BytesIO(b'col1,col2\n1,2\n3,4')
   ...:

In [4]: duckdb.read_csv(csv_data, header=True).write_parquet('csv_data.parquet')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 1
----> 1 duckdb.read_csv(csv_data, header=True).write_parquet('csv_data.parquet')

TypeError: read_csv(): incompatible function arguments. The following argument types are supported:
    1. (name: str, connection: duckdb.DuckDBPyConnection = None, header: object = None, compression: object = None, sep: object = None, delimiter: object = None, dtype: object = None, na_values: object = None, skiprows: object = None, quotechar: object = None, escapechar: object = None, encoding: object = None, parallel: object = None, date_format: object = None, timestamp_format: object = None, sample_size: object = None, all_varchar: object = None, normalize_names: object = None, filename: object = None) -> duckdb.DuckDBPyRelation

Invoked with: <_io.BytesIO object at 0x7f21ed64d620>; kwargs: header=True
  • This works in 0.8.0. Commented May 19, 2023 at 19:10
  • @jqurious thanks - I can confirm that this works in 0.8.0. Commented May 20, 2023 at 18:14

1 Answer

You can read the data with read_csv and write it to Parquet with write_parquet:

import duckdb
from io import BytesIO
csv_data = BytesIO(b'col1,col2\n1,2\n3,4')
duckdb.read_csv(csv_data, header=True).write_parquet('csv_data.parquet')

Note: this does not work on version 0.7.1, but it does work on 0.8.0.

1 Comment

Thanks but that didn't work - I'll update the OP with the error I got from that
