I want to create a table in a DuckDB database from a MongoDB collection in Python, for further analytics. Currently I do the following:
- dump the mongo collection to disk as a single JSONL file
- open a duckdb connection and load the file into a table
```python
import json
import duckdb

# list(mongo_cursor) materializes the entire collection in memory
with open("mongo_json.jsonl", "w") as file:
    json.dump(list(mongo_cursor), file, default=str)

duckdb.sql("CREATE OR REPLACE TABLE mongo_table AS SELECT * FROM read_json_auto('mongo_json.jsonl', ignore_errors=true)")
```
But the thing is the JSON is really big, which increases memory consumption. Are there any ideas or a better approach to achieve this?
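One variant I have been considering (sketched below, with a plain list standing in for the Mongo cursor) is to stream the cursor to disk one document per line instead of calling `list(mongo_cursor)`, so only one document is in memory at a time. This would also make the file genuinely line-delimited JSON, matching the `.jsonl` name:

```python
import json

def dump_jsonl(cursor, path):
    """Write one JSON document per line (JSONL), streaming from the
    cursor so the full collection is never materialized in memory."""
    with open(path, "w") as f:
        for doc in cursor:
            f.write(json.dumps(doc, default=str) + "\n")

# A list stands in for the Mongo cursor here; with pymongo,
# collection.find() yields documents lazily in the same way.
docs = [{"_id": "1", "x": 1}, {"_id": "2", "x": 2}]
dump_jsonl(docs, "mongo_json.jsonl")
```

But I am not sure whether this is the idiomatic way, or whether the intermediate file can be avoided entirely.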