
Problem: The function below (PostgreSQL 9.3) runs fine with few iterations, but with many iterations it writes a ~1 GB file to disk on each iteration until the disk is full, at which point the code terminates with a "failed to write" error.

Question: Is there a way to avoid writing these files to disk, or some other way to work around the problem? Ideally I would like to leave the code running overnight and analyse the results the next day.

The tables are supposed to be overwritten on every iteration, so I don't understand why my disk fills up. In previous attempts the function also ran out of memory, which I worked around by raising max_locks_per_transaction from 64 to 256 in postgresql.conf.

What am I doing:

I have a function whose parameters control the loops inside: start and end timestamps, a time-bin delta, a time span, and a time jump. Something like this: SELECT ib_run2('2009-06-28 13:30:00', '2009-06-29 13:50:59', '10 minute', '0.5 hour', '24 hour');

So the function divides the time between start and stop into bins: in this example, time starting at 2009-06-28 13:30:00 is divided into 10-minute intervals for 0.5 hour, then the window jumps 24 hours ahead and the process repeats until 2009-06-29 13:50:59.

For each 10-minute bin, some calculations are made on a spatiotemporal dataset, including selection by time and location and distance calculations.
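As one of the comments on this question suggests, the bin bookkeeping itself can be expressed relationally rather than with nested procedural loops. A minimal sketch with generate_series (the bounds 2 and 3 are the n and m the function computes for the example call above, i.e. ceiling((end - start) / jump) and ceiling(span / delta)):

```sql
-- Sketch: enumerate all (i, j) bin start times in one query instead of
-- nested plpgsql FOR loops. Values match the example call above:
-- start 2009-06-28 13:30:00, delta '10 minute', span '0.5 hour', jump '24 hour'.
SELECT i, j,
       TIMESTAMP '2009-06-28 13:30:00'
         + i * INTERVAL '24 hour'      -- jump between spans  (i = 0..n)
         + j * INTERVAL '10 minute'    -- delta within a span (j = 0..m)
       AS dt
FROM generate_series(0, 2) AS i
   , generate_series(0, 3) AS j
ORDER BY i, j;
```

Joining such a bin table against the dataset would replace the per-iteration DROP/CREATE churn entirely, though the rest of the calculations would also have to be rewritten in set-based form.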

Inside the function there is an unavoidable sequential scan of a big table (6,154,794 rows) and of several smaller ones, selecting a subset from each. The function performs calculations on these subsets and writes the results into created tables.

All tables are created with CREATE TABLE. Tables whose names start with IB_000_ are created before the loops and updated with INSERT INTO inside the loops. Tables whose names start with IB_i_ are dropped and recreated inside the loops on each iteration.

Calculating the IB_i_ tables involves other IB_i_ tables created within the same iteration, or external tables.
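One of the comments on this question suggests switching the per-iteration tables to TEMP tables, which are not written to the transaction log. A hedged sketch of what that would look like for IB_i_times inside the function body (whether TEMP fits the other IB_i_ tables too is an assumption):

```sql
-- Sketch: the per-iteration DROP/CREATE of IB_i_times, but as a TEMP table.
-- TEMP tables are session-local and not WAL-logged, so repeatedly rewriting
-- them generates far less disk traffic than regular tables. dt and delta are
-- the plpgsql variables of the surrounding function.
DROP TABLE IF EXISTS IB_i_times;
CREATE TEMP TABLE IB_i_times AS (
  WITH a AS (SELECT dt::DATE AS date_t, dt::TIME AS t, delta AS delta_t)
  SELECT date_t + t - delta_t AS t_from_v
       , date_t + t           AS t_to_v
       , date_t + t           AS t_from_c
       , date_t + t + delta_t AS t_to_c
       , to_char(date_t, 'day') AS t_day
       , a.date_t, a.t, a.delta_t
  FROM a
);
```

TEMP tables live until the end of the session (or transaction, with ON COMMIT DROP), so the explicit DROP TABLE IF EXISTS at the top of each iteration is still needed inside a single function call.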

The function:

CREATE OR REPLACE FUNCTION ib_run2(
        start_dt TEXT DEFAULT '2009-06-28 13:30:00'
      , end_dt   TEXT DEFAULT '2009-06-28 13:59:59'
      , deltat   TEXT DEFAULT '10 minute'
      , spant    TEXT DEFAULT '2 hour'
      , jump_txt TEXT DEFAULT '24 hour'
  ) RETURNS TEXT AS
$func$
      DECLARE
          n INT DEFAULT 1; m INT DEFAULT 1; iteration INT DEFAULT 0;
          delta INTERVAL; span INTERVAL; jump INTERVAL;
          mytext TEXT DEFAULT 'iMarinka';
          start_time_query TIMESTAMP DEFAULT now();
          dt0 TIMESTAMP;
          dt1 TIMESTAMP;
          dt TIMESTAMP;
BEGIN
    dt0:=start_dt :: TIMESTAMP;
    dt1:=end_dt :: TIMESTAMP;
    delta:=deltat :: INTERVAL;
    span:=spant :: INTERVAL;
    jump:=jump_txt :: INTERVAL;
    iteration:=0;

    n:=ceiling(extract(EPOCH FROM (dt1-dt0)          )*1.0/extract(EPOCH FROM (jump ) ));
    m:=ceiling(extract(EPOCH FROM ( (dt0+span) -dt0) )*1.0/extract(EPOCH FROM (delta) ));

    DROP TABLE IF EXISTS IB_000_times;
    CREATE TABLE IB_000_times (
              gid serial primary key, i INT, j INT
            , t_from_v TIMESTAMP, t_to_v TIMESTAMP
            , t_from_c TIMESTAMP, t_to_c TIMESTAMP
            , t_day TEXT, date_t DATE, t TIME
            , delta_t INTERVAL
            , dt0 TIMESTAMP, dt1 TIMESTAMP
            , dt TIMESTAMP, delta INTERVAL, span INTERVAL , jump INTERVAL );

    mytext:=(m+1)*(n+1)||' iterations '||n+1||' of i '||m+1||' of j'; RAISE NOTICE '%', mytext;

    FOR i IN 0..n LOOP  -----------------------------------------
    FOR j IN 0..m LOOP  -----------------------------------------

      dt := dt0 + j * delta + i * jump;
      iteration := iteration + 1;

      DROP TABLE IF EXISTS IB_i_times; 
      CREATE TABLE IB_i_times AS (
        WITH a AS (SELECT dt::DATE date_t, dt::TIME t , delta delta_t)
        SELECT date_t+ t - delta_t AS t_from_v
          , date_t+ t AS t_to_v
          , date_t+ t AS t_from_c
          , date_t+ t + delta_t AS t_to_c
          , to_char(date_t, 'day') AS t_day
          , a.date_t , a.t, a.delta_t
        FROM a
      );

      INSERT INTO IB_000_times (i , j,
          t_from_v , t_to_v , t_from_c , t_to_c , t_day , date_t , t , delta_t ,
          dt0 , dt1 , delta , span , jump , dt)
      SELECT i,j, t.t_from_v, t.t_to_v, t.t_from_c, t.t_to_c, t.t_day, t.date_t, t.t, t.delta_t ,
            dt0 , dt1 , delta , span , jump , dt
      FROM IB_i_times t;

      COPY ( select * FROM IB_000_times ) TO '/Volumes/1TB/temp/IB_000_times.csv' CSV HEADER DELIMITER ';' ;

      mytext := iteration||'/'||(n+1)*(m+1)||' -----> '||'   dt= '||to_char(dt,'YYYY-MM-DD HH24:MI:SS'); RAISE NOTICE '%', mytext;
      mytext := 'Fin '||': i='||i||', dt='|| to_char(dt,'YYYY-MM-DD HH24:MI:SS')||', started  '||start_time_query;

    END LOOP;----------------------------------------------------
    END LOOP;----------------------------------------------------
  RETURN mytext;
END;
$func$
LANGUAGE plpgsql;

Besides IB_i_times and IB_000_times, the function creates and updates a number of other tables before and inside the loops (not shown here to save space; the full code is ~500 lines).

  • It looks like it is possible to replace the procedural code with relational code. Just post sample data and the desired output. Commented Mar 17, 2017 at 17:07
  • If you use temp tables, then you can save a lot of writes to the transaction log. CTEs (WITH clauses) can generate temp files (they materialize subqueries when the subqueries are larger than work_mem). Commented Mar 17, 2017 at 18:46
  • I do have WITH clauses in several tables. But shouldn't it delete them when the iteration is over? Commented Mar 17, 2017 at 18:47
  • I'll try TEMP tables. Commented Mar 17, 2017 at 18:48
  • @ClodoaldoNeto I have put sample csv files (small ones) and code here: filehosting.org/file/details/650149/Archive.zip Commented Mar 17, 2017 at 22:26

1 Answer


It is hard to say why Postgres generates temp files from this source code. Enable temp-file logging with log_temp_files, and once you identify the statements that produce temp files you can identify the reason. Usually it is a too-small work_mem.

From the documentation (https://www.postgresql.org/docs/current/static/runtime-config-logging.html): "Controls logging of temporary file names and sizes. Temporary files can be created for sorts, hashes, and temporary query results. A log entry is made for each temporary file when it is deleted. A value of zero logs all temporary file information, while positive values log only files whose size is greater than or equal to the specified number of kilobytes. The default setting is -1, which disables such logging. Only superusers can change this setting."
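For a session-level experiment (superuser required on 9.3), turning the setting on looks like this; the threshold of 0 logs every temporary file:

```sql
-- Log every temporary file (threshold 0 kB) so the offending statements
-- show up in the server log together with the file sizes.
SET log_temp_files = 0;

-- Cluster-wide on 9.3: set log_temp_files = 0 in postgresql.conf, then
SELECT pg_reload_conf();
-- (ALTER SYSTEM SET log_temp_files = 0; would work, but only on 9.4+.)
```

After that, run the function once and check the server log for "temporary file" entries to see which statements spill to disk.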
