I need to run a test in which I simulate 20 years' worth of historical data in a PostgreSQL (and TimescaleDB) database. To do so, I generate .sql files and ingest them into the target database using the psql client.
We calculated that our table will hold 261 billion rows over 20 years, i.e. about 13.05 billion rows per year.
Each row has a timestamp (integer type), and to be more efficient I write the rows into the .sql files as transactions of 10,000 elements each. To keep each generated file at a manageable size on disk (I generate the files in Python), I limited each file to 20M rows.
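For illustration, the generator looks roughly like this sketch (the table and column names `metrics`, `ts`, `value` and the random values are placeholders for this question; the real schema is different):

```python
# Sketch of the file generator: each file holds ROWS_PER_FILE rows,
# grouped into transactions of TX_SIZE rows each.
# Table/column names and values are placeholders.
import random

TX_SIZE = 10_000
ROWS_PER_FILE = 20_000_000

def write_sql_file(path: str, start_ts: int) -> None:
    with open(path, "w") as f:
        for tx_start in range(0, ROWS_PER_FILE, TX_SIZE):
            f.write("BEGIN;\n")
            values = ",".join(
                f"({start_ts + tx_start + i}, {random.random()})"
                for i in range(TX_SIZE)
            )
            f.write(f"INSERT INTO metrics (ts, value) VALUES {values};\n")
            f.write("COMMIT;\n")

write_sql_file("chunk_000.sql", start_ts=0)
```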
My idea was to generate these files one after another from a bash script and, as soon as a file is generated, run a psql command to ingest it into the DB. The problem is that I don't know how to orchestrate this: ingestion takes much longer than file generation, so I'm afraid my bash script will wait for each ingest to finish before it starts generating the next .sql file, then wait for the next ingest, and so on.
To summarize, I'm trying to build a pseudo-batch ingest pipeline in which every .sql file that has been ingested successfully is deleted, so the generated files don't take up too much disk space.
How can I avoid waiting for the ingest process to finish before generating the next .sql file and then starting its ingest?
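What I have in mind is something like the following Python sketch, where one thread keeps generating files while another ingests them with psql and deletes them on success (it reuses the `write_sql_file` helper from the snippet above; the database name, psql options and file names are placeholders). Is this a reasonable way to structure it, or is there a better pattern?

```python
# Rough sketch of the overlap I'm after: a producer thread generates files
# while the consumer ingests them with psql and removes them on success.
# Database name, file names and limits are placeholders.
import os
import queue
import subprocess
import threading

NUM_FILES = 10          # placeholder; the real run would need far more files
PENDING_LIMIT = 2       # bound how many generated-but-not-ingested files sit on disk

pending: "queue.Queue[str]" = queue.Queue(maxsize=PENDING_LIMIT)

def producer() -> None:
    for i in range(NUM_FILES):
        path = f"chunk_{i:03d}.sql"
        write_sql_file(path, start_ts=i * ROWS_PER_FILE)
        pending.put(path)          # blocks if PENDING_LIMIT files are already waiting
    pending.put(None)              # sentinel: no more files

def consumer() -> None:
    while True:
        path = pending.get()
        if path is None:
            break
        result = subprocess.run(
            ["psql", "-d", "mydb", "-v", "ON_ERROR_STOP=1", "-f", path]
        )
        if result.returncode == 0:
            os.remove(path)        # only delete files that were ingested successfully

threading.Thread(target=producer).start()
consumer()
```

The bounded queue is my attempt to address the disk-space concern: generation runs ahead of ingestion, but never by more than a couple of files.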
This is for development purposes; the data I want to ingest is close to what we expect to produce in production. The aim, for now, is to run read queries and compare their performance between PostgreSQL and TimescaleDB.