Achieving window function-like behavior using a PostgreSQL user defined function?

Question

Let's say that given a table observations_tbl with attributes date (day) and value, I want to produce the new attribute prev_day_value to get the following table:

|---------------------|-------|----------------|
|     date            | value | prev_day_value |
|---------------------|-------|----------------|
| 01.01.2015 00:00:00 | 5     | 0              |
| 02.01.2015 00:00:00 | 4     | 5              |
| 03.01.2015 00:00:00 | 3     | 4              |
| 04.01.2015 00:00:00 | 2     | 3              |
|---------------------|-------|----------------|

I am well-aware that such an output can typically be obtained using a WINDOW function. But how would I achieve this through a PostgreSQL user defined function? I want to indicate that I am in a situation where I must use a function, difficult to explain why without going into detail - these are the restrictions I have and if anything, it is a technical challenge.

Take into consideration this template query:

SELECT *, lag(value,1) AS prev_day_value -- or lag(record,1) or lag(date,value,1) or lag(date,1) or lag(observations_tbl,1), etc.
FROM observations_tbl

I am using function lag with parameter 1 to look for a value which comes before the current row by 1 - a distance of 1 row. I don't care what other parameters the function lag can have (table name, other attributes) - what could the function lag look like to achieve such functionality? The function can be of any language, SQL, PL/pgSQL and even C using PostgreSQL API/backend.

I understand that one answer can be wrapping a WINDOW query inside lag user defined function. But I am thinking that would be a rather costly operation if I have to scan the entire table twice (once inside the lag function and once outside). I was thinking that maybe each PostgreSQL record would have a pointer to its previous record which is directly accessible? Or that I can somehow open a cursor at this specific row / row number without having to scan the entire table? Or is what I am asking impossible?

Trying to replicate the efficiency of built-in functions with user-defined functions seems -- shall I say -- challenging. — Gordon Linoff
– Gordon Linoff, Commented Mar 1, 2020 at 4:06
Are you sure that PostgreSQL has no way to address this requirement? — Zeruno
– Zeruno, Commented Mar 1, 2020 at 4:09
. . No, I do not know. You could write a wrapper around the internal definition. But something is amiss with the need to do this. It seems like an X-Y problem, where you are focusing on the wrong detail. — Gordon Linoff
– Gordon Linoff, Commented Mar 1, 2020 at 4:12
You can inline the functionality into your stored procedure(s) until you find a better way to orient your data :/ — Ross Bush
– Ross Bush, Commented Mar 1, 2020 at 4:19
If you "must" use a function for some bizarre reasons, then why not simply wrap a query using a window function into a SQL function? — user330315
– user330315, Commented Mar 2, 2020 at 7:24

Erwin Brandstetter · Accepted Answer · 2020-03-03 01:48:30Z

2

Your request is not possible to solve with relational tools (window functions are not relational extension in SQL). In C language you can write own alternative of function lag. You can do same work in PL8 language (Javascript). Unfortunately the API for window functions doesn't exist for PL/pgSQL. You cannot to write simple PL/pgSQL function that has access to different row than is processed.

The one possible alternative (but with some performance risk) is writing table function. There you have a control over all processed dataset, and you can do this operation simply.

CREATE OR REPLACE FUNCTION report()
RETURNS TABLE(d date, v int, prev_v int) $$
DECLARE r RECORD;
BEGIN
  prev_v := 0;
  FOR r IN SELECT date, value FROM observations_tbl t ORDER BY 1
  LOOP
    d := r.date; v := r.value;
    RETURN NEXT;
    prev_v := v;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

There is not any other alternative usable solution. In very old date these values was calculated with correlated selfjoins, but this solution has pretty terrible performance.

edited Mar 3, 2020 at 1:48

Erwin Brandstetter

669k160 gold badges1.2k silver badges1.3k bronze badges

answered Mar 2, 2020 at 5:48

Pavel Stehule

46.7k6 gold badges104 silver badges103 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

user330315 Over a year ago

What exactly do you mean with "window functions are not relational extension in SQL"?

Pavel Stehule Over a year ago

Relational algebra is based on a sets. The order of field inside set is not important. You can do some operations over these sets - union, multiplication, subset, ... The window function are based on different fundaments - there the order is important.

Belayer Over a year ago

The contention that windowed functions is not a relational extension to SQL is just plain wrong - see ISO SQL:2003 and ISO SQL:2008. You are correct that Dr Codd's original 1970 paper (see: Communications of the ACM, volume 13, number 6. 377-387) did not contain them, it was on storage and retrieval. Further window functions are all Post Retrieval so your relational algebra is finished. Are you contending also that ORDER BY is not valid SQL or other normal operations.

Zeruno Over a year ago

@PavelStehule in your example, you specified the return type. I was wondering if there is a way to specify the return type within the body of the function?

Pavel Stehule Over a year ago

@Zeruno - return type should be defined in interface only and in planning time. Postgres is type strict system.

|

Erwin Brandstetter · Accepted Answer · 2020-03-03 01:49:11Z

1

What Pavel posted, just with fewer assignments. Should be faster:

CREATE OR REPLACE FUNCTION report()
  RETURNS TABLE(d date, v int, prev_v int) AS
$func$
BEGIN
   prev_v := 0;
   FOR d, v IN
      SELECT date, value FROM observations_tbl ORDER BY 1
   LOOP
      RETURN NEXT;
      prev_v := v;
   END LOOP;
END
$func$  LANGUAGE plpgsql;

The general idea can pay if it actually replaces multiple scans over the table with a single one. Like here:

GROUP BY and aggregate sequential numeric values

answered Mar 3, 2020 at 1:49

Erwin Brandstetter

669k160 gold badges1.2k silver badges1.3k bronze badges

Collectives™ on Stack Overflow

Achieving window function-like behavior using a PostgreSQL user defined function?

2 Answers 2

6 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

6 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related