Problem: The function below (PostgreSQL 9.3) runs fine with a few iterations, but with many iterations it writes a file of ~1 GB to disk on each iteration until the disk is full, and then the code terminates with a "failed to write" error.
Question: Is there a way to avoid writing these files to disk, or some other way to circumvent the problem? Ideally I would like to leave the code running overnight and analyse the results the next day.
The tables are supposed to be overwritten on every iteration, so I don't understand why it fills my disk. In previous attempts it also ran out of memory, but I increased max_locks_per_transaction from 64 to 256 in postgresql.conf.
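For anyone trying to reproduce the earlier out-of-memory failure, here is a minimal diagnostic sketch. It assumes (this is not confirmed above) that the failure was related to the number of locks held inside one long transaction, which is what the setting change suggests. The filter on the function name ib_run2 in pg_stat_activity is only an illustrative way to find the running backend; the queries are meant to be run from a second session while the function is executing.

```sql
-- Current value of the setting changed in postgresql.conf.
SHOW max_locks_per_transaction;

-- Count the locks held by the backend that is running the function.
-- Assumption: the running statement text contains 'ib_run2'.
SELECT a.pid,
       count(*) AS locks_held
FROM pg_locks AS l
JOIN pg_stat_activity AS a ON a.pid = l.pid
WHERE a.query ILIKE '%ib_run2%'
  AND a.pid <> pg_backend_pid()   -- exclude this monitoring session
GROUP BY a.pid;
```

If locks_held keeps growing with every iteration, that would be consistent with locks on the created and dropped tables being kept until the surrounding transaction ends.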
What I am doing:
I have a function whose parameters control the loops inside it: start and end timestamps, the time-bin width (delta), the time span, and the time jump. Something like this: SELECT ib_run2('2009-06-28 13:30:00', '2009-06-29 13:50:59', '10 minute', '0.5 hour', '24 hour');
The function divides the time between start and end into bins. In this example, the time from 2009-06-28 13:30:00 is divided into 10-minute intervals for 0.5 hour; the function then jumps 24 hours and repeats this until 2009-06-29 13:50:59.
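To make the binning concrete, the following standalone query is a sketch that mirrors the loop bounds and the dt formula used in the function further down (n, m, and dt = dt0 + j * delta + i * jump). The CTE names params and counts are made up for this illustration and are not part of the original code.

```sql
WITH params AS (
    SELECT timestamp '2009-06-28 13:30:00' AS dt0,
           timestamp '2009-06-29 13:50:59' AS dt1,
           interval  '10 minute'           AS delta,
           interval  '0.5 hour'            AS span,
           interval  '24 hour'             AS jump
), counts AS (
    SELECT p.*,
           -- number of jumps between start and end (outer loop bound)
           ceiling(extract(epoch FROM p.dt1 - p.dt0)
                   / extract(epoch FROM p.jump))::int  AS n,
           -- number of bins inside one span (inner loop bound)
           ceiling(extract(epoch FROM p.span)
                   / extract(epoch FROM p.delta))::int AS m
    FROM params AS p
)
SELECT i, j, dt0 + j * delta + i * jump AS dt
FROM counts
CROSS JOIN LATERAL generate_series(0, n) AS i
CROSS JOIN LATERAL generate_series(0, m) AS j
ORDER BY i, j;
```

With the example parameters this should give n = 2 and m = 3, so 12 bin start times, matching the (m+1)*(n+1) iteration count the function reports. Because both loops run from 0 up to and including the bound, the last value of i produces timestamps after the end timestamp.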
For each 10-minute bin some calculations are made on a spatiotemporal dataset including selection by time and location and calculated distances.
Inside the function there is an unavoidable sequential scan of a big table (6,154,794 rows) and of several smaller ones with selection of a subset from each. The function performs calculations on these subsets and writes results into created tables.
All tables are created with CREATE TABLE. Tables starting with IB_000_ are created before the loops and updated with INSERT INTO inside the loops. Tables starting with IB_i_ are dropped and recreated within the loops each iteration.
The calculation of the IB_i_ tables involves other IB_i_ tables created within the same iteration, or external tables.
The function:
CREATE OR REPLACE FUNCTION ib_run2(
    start_dt TEXT DEFAULT '2009-06-28 13:30:00'
  , end_dt TEXT DEFAULT '2009-06-28 13:59:59'
  , deltat TEXT DEFAULT '10 minute'
  , spant TEXT DEFAULT '2 hour'
  , jump_txt TEXT DEFAULT '24 hour'
) RETURNS TEXT AS
$func$
DECLARE
    n INT DEFAULT 1;
    m INT DEFAULT 1;
    iteration INT DEFAULT 0;
    delta INTERVAL;
    span INTERVAL;
    jump INTERVAL;
    mytext TEXT DEFAULT 'iMarinka';
    start_time_query TIMESTAMP DEFAULT now();
    dt0 TIMESTAMP;
    dt1 TIMESTAMP;
    dt TIMESTAMP;
BEGIN
    dt0 := start_dt :: TIMESTAMP;
    dt1 := end_dt :: TIMESTAMP;
    delta := deltat :: INTERVAL;
    span := spant :: INTERVAL;
    jump := jump_txt :: INTERVAL;
    iteration := 0;
    n := ceiling(extract(EPOCH FROM (dt1 - dt0)) * 1.0 / extract(EPOCH FROM (jump)));
    m := ceiling(extract(EPOCH FROM ((dt0 + span) - dt0)) * 1.0 / extract(EPOCH FROM (delta)));

    DROP TABLE IF EXISTS IB_000_times;
    CREATE TABLE IB_000_times (
        gid serial primary key, i INT, j INT
      , t_from_v TIMESTAMP, t_to_v TIMESTAMP
      , t_from_c TIMESTAMP, t_to_c TIMESTAMP
      , t_day TEXT, date_t DATE, t TIME
      , delta_t INTERVAL
      , dt0 TIMESTAMP, dt1 TIMESTAMP
      , dt TIMESTAMP, delta INTERVAL, span INTERVAL, jump INTERVAL );

    mytext := (m+1)*(n+1)||' iterations '||n+1||' of i '||m+1||' of j';
    RAISE NOTICE '%', mytext;

    FOR i IN 0..n LOOP -----------------------------------------
        FOR j IN 0..m LOOP -----------------------------------------
            dt := dt0 + j * delta + i * jump;
            iteration := iteration + 1;

            DROP TABLE IF EXISTS IB_i_times;
            CREATE TABLE IB_i_times AS (
                WITH a AS (SELECT dt::DATE date_t, dt::TIME t, delta delta_t)
                SELECT date_t + t - delta_t AS t_from_v
                     , date_t + t AS t_to_v
                     , date_t + t AS t_from_c
                     , date_t + t + delta_t AS t_to_c
                     , to_char(date_t, 'day') AS t_day
                     , a.date_t, a.t, a.delta_t
                FROM a
            );

            INSERT INTO IB_000_times (i, j,
                t_from_v, t_to_v, t_from_c, t_to_c, t_day, date_t, t, delta_t,
                dt0, dt1, delta, span, jump, dt)
            SELECT i, j, t.t_from_v, t.t_to_v, t.t_from_c, t.t_to_c, t.t_day, t.date_t, t.t, t.delta_t,
                dt0, dt1, delta, span, jump, dt
            FROM IB_i_times t;

            COPY ( select * FROM IB_000_times ) TO '/Volumes/1TB/temp/IB_000_times.csv' CSV HEADER DELIMITER ';';

            mytext := iteration||'/'||(n+1)*(m+1)||' -----> '||' dt= '||to_char(dt,'YYYY-MM-DD HH24:MI:SS');
            RAISE NOTICE '%', mytext;
            mytext := 'Fin '||': i='||i||', dt='|| to_char(dt,'YYYY-MM-DD HH24:MI:SS')||', started '||start_time_query;
        END LOOP;----------------------------------------------------
    END LOOP;----------------------------------------------------
    RETURN mytext;
END;
$func$
LANGUAGE plpgsql;
Besides the tables IB_i_times and IB_000_times, there are a bunch of other tables (not shown here to save space; the code has ~500 lines) that the function creates before the loops (some inside them) and updates inside the loops.
A WITH clause (CTE) can generate temp files: it materializes subqueries, and subqueries larger than work_mem are written to disk.
I have WITH clauses in several tables, but shouldn't it delete them when the iteration is over?
TEMP tables
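To check whether temp files are really what fills the disk, the statements below give a rough measurement. This is a sketch, not a fix. It assumes a role that is allowed to change log_temp_files (normally a superuser), and it relies on the temp_files and temp_bytes columns of pg_stat_database, which exist in 9.2 and later.

```sql
-- Memory a single sort or hash step may use before it spills to temp files.
SHOW work_mem;

-- Cumulative temp-file activity for the current database.
-- Run this before and after a short test run and compare the values.
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE datname = current_database();

-- Log every temp file this session creates, with its size
-- (0 means "log files of any size").
SET log_temp_files = 0;
```

The counters in pg_stat_database may lag behind a statement that is still running, so comparing values around a run with few iterations is likely more reliable than polling during the overnight run.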