0

Let's say that given a table observations_tbl with attributes date (day) and value, I want to produce the new attribute prev_day_value to get the following table:

|---------------------|-------|----------------|
|     date            | value | prev_day_value |
|---------------------|-------|----------------|
| 01.01.2015 00:00:00 | 5     | 0              |
| 02.01.2015 00:00:00 | 4     | 5              |
| 03.01.2015 00:00:00 | 3     | 4              |
| 04.01.2015 00:00:00 | 2     | 3              |
|---------------------|-------|----------------|

I am well-aware that such an output can typically be obtained using a WINDOW function. But how would I achieve this through a PostgreSQL user defined function? I want to indicate that I am in a situation where I must use a function, difficult to explain why without going into detail - these are the restrictions I have and if anything, it is a technical challenge.

Take into consideration this template query:

SELECT *, lag(value,1) AS prev_day_value -- or lag(record,1) or lag(date,value,1) or lag(date,1) or lag(observations_tbl,1), etc.
FROM observations_tbl

I am using function lag with parameter 1 to look for a value which comes before the current row by 1 - a distance of 1 row. I don't care what other parameters the function lag can have (table name, other attributes) - what could the function lag look like to achieve such functionality? The function can be of any language, SQL, PL/pgSQL and even C using PostgreSQL API/backend.

I understand that one answer can be wrapping a WINDOW query inside lag user defined function. But I am thinking that would be a rather costly operation if I have to scan the entire table twice (once inside the lag function and once outside). I was thinking that maybe each PostgreSQL record would have a pointer to its previous record which is directly accessible? Or that I can somehow open a cursor at this specific row / row number without having to scan the entire table? Or is what I am asking impossible?

8
  • Trying to replicate the efficiency of built-in functions with user-defined functions seems -- shall I say -- challenging. Commented Mar 1, 2020 at 4:06
  • Are you sure that PostgreSQL has no way to address this requirement? Commented Mar 1, 2020 at 4:09
  • 2
    . . No, I do not know. You could write a wrapper around the internal definition. But something is amiss with the need to do this. It seems like an X-Y problem, where you are focusing on the wrong detail. Commented Mar 1, 2020 at 4:12
  • You can inline the functionality into your stored procedure(s) until you find a better way to orient your data :/ Commented Mar 1, 2020 at 4:19
  • If you "must" use a function for some bizarre reasons, then why not simply wrap a query using a window function into a SQL function? Commented Mar 2, 2020 at 7:24

2 Answers 2

2

Your request is not possible to solve with relational tools (window functions are not relational extension in SQL). In C language you can write own alternative of function lag. You can do same work in PL8 language (Javascript). Unfortunately the API for window functions doesn't exist for PL/pgSQL. You cannot to write simple PL/pgSQL function that has access to different row than is processed.

The one possible alternative (but with some performance risk) is writing table function. There you have a control over all processed dataset, and you can do this operation simply.

CREATE OR REPLACE FUNCTION report()
RETURNS TABLE(d date, v int, prev_v int) $$
DECLARE r RECORD;
BEGIN
  prev_v := 0;
  FOR r IN SELECT date, value FROM observations_tbl t ORDER BY 1
  LOOP
    d := r.date; v := r.value;
    RETURN NEXT;
    prev_v := v;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

There is not any other alternative usable solution. In very old date these values was calculated with correlated selfjoins, but this solution has pretty terrible performance.

Sign up to request clarification or add additional context in comments.

6 Comments

What exactly do you mean with "window functions are not relational extension in SQL"?
Relational algebra is based on a sets. The order of field inside set is not important. You can do some operations over these sets - union, multiplication, subset, ... The window function are based on different fundaments - there the order is important.
The contention that windowed functions is not a relational extension to SQL is just plain wrong - see ISO SQL:2003 and ISO SQL:2008. You are correct that Dr Codd's original 1970 paper (see: Communications of the ACM, volume 13, number 6. 377-387) did not contain them, it was on storage and retrieval. Further window functions are all Post Retrieval so your relational algebra is finished. Are you contending also that ORDER BY is not valid SQL or other normal operations.
@PavelStehule in your example, you specified the return type. I was wondering if there is a way to specify the return type within the body of the function?
@Zeruno - return type should be defined in interface only and in planning time. Postgres is type strict system.
|
1

What Pavel posted, just with fewer assignments. Should be faster:

CREATE OR REPLACE FUNCTION report()
  RETURNS TABLE(d date, v int, prev_v int) AS
$func$
BEGIN
   prev_v := 0;
   FOR d, v IN
      SELECT date, value FROM observations_tbl ORDER BY 1
   LOOP
      RETURN NEXT;
      prev_v := v;
   END LOOP;
END
$func$  LANGUAGE plpgsql;

The general idea can pay if it actually replaces multiple scans over the table with a single one. Like here:

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.