I've started to experiment with DuckDB, but I'm struggling to figure out how to release memory again. With a loop like the one below, I would have expected the memory to be freed either after `process()` returns or after `con.close()` is called. On my system, neither seems to be the case: looking at the memory used by the Python process, a lot of memory stays occupied (very likely by DuckDB, as this is essentially all I'm doing, unless deltalake or pyarrow have issues):
```python
import duckdb
import deltalake

def process(src):
    con = duckdb.connect(":memory:")
    dt = deltalake.DeltaTable(f"s3a://{src}", storage_options=deltalake_storage_options)
    pa = dt.to_pyarrow_table()
    r1 = con.from_arrow(pa)
    con.sql("select * from r1").write_parquet("/tmp/test.parquet")
    con.close()  # memory still isn't returned to the OS after this

for src in ["a", "b", "c"]:
    process(src)
```
As some of the tables I'm dealing with are very large, I can't keep all of them in memory at the same time. How can I release all of the memory that DuckDB has allocated?
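One workaround I can imagine (though I'd prefer a proper way to free the memory in-process) is to run each iteration in a short-lived child process, so the operating system reclaims everything when that process exits. A minimal sketch, where `process()` is just a hypothetical stand-in for the real per-table work above:

```python
import multiprocessing as mp

def process(src):
    # placeholder for the real per-table work (connect to DuckDB,
    # load the Delta table, write parquet); returns something small
    return len(src)

def run_isolated(src):
    # run process() in a one-off worker process; when the worker
    # exits, the OS reclaims all memory it allocated, regardless of
    # what DuckDB or the allocator cached internally
    with mp.Pool(processes=1) as pool:
        return pool.apply(process, (src,))

if __name__ == "__main__":
    for src in ["a", "b", "c"]:
        run_isolated(src)
```

This avoids the accumulation entirely, but forking a process per table feels heavy-handed, so I'd still like to know whether DuckDB itself can release the memory.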