
I'm using PostgreSQL 11.1-1 on Windows 10. On my system, if I run a query like the following, I get a different number for each random() column:

postgres=# SELECT random() as sortOrder, random() as col2;
     sortorder     |       col2
-------------------+-------------------
 0.607938482426107 | 0.121234225574881
(1 row)

But if I add an ORDER BY clause, as shown below, then random() returns the same value for both columns within each row.

postgres=# SELECT random() as sortOrder, random() as col2 
               FROM generate_series(0,1) 
               ORDER BY sortOrder;
     sortorder     |       col2
-------------------+-------------------
 0.100375576410443 | 0.100375576410443
 0.170669795479625 | 0.170669795479625
(2 rows)

I'm assuming there is a technical reason for this, but is it possible in this last query to get a different random value for each column?

Edit: I may have oversimplified in my original question. The actual query I was attempting was more like the following:

SELECT column1, random(), column2, random()
FROM table1
ORDER BY random();

So, having an answer that can be combined with a table query is also important.

2 Answers


random() is volatile, and the documentation says this about volatility:

A query using a volatile function will re-evaluate the function at every row where its value is needed.

Based on this definition, the engine is within its rights to evaluate the function just once per row, rather than once per column within the same row.
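
The per-row part is easy to verify: a volatile function in the select list yields a fresh value on every row, even though a single evaluation may be shared between columns of the same row. As a quick check:

SELECT random() FROM generate_series(1, 3);  -- three rows, three distinct values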

To defeat this optimization, you might tweak the query so that the optimizer fails to notice that the two columns carry the same expression. For instance:

postgres=> SELECT random()+0 as sortOrder, random() as col2 
               FROM generate_series(0,1) 
               ORDER BY sortOrder;
     sortorder      |        col2         
--------------------+---------------------
 0.0412265718914568 | 0.00992022315040231
  0.841067179106176 |   0.818967831321061
(2 rows)

Now, that simple solution may seem a bit fragile. In some future version, the optimizer might theoretically become smart enough to recognize that these two expressions have a common sub-expression that it can reuse.

A more robust solution is to generate each column in a separate subquery to start with, so the optimizer is forced into distinct function calls no matter how hard it tries to avoid multiple evaluations. For instance:

SELECT v1, v2 FROM
  (SELECT random() AS v1, row_number() OVER () AS r1
   FROM generate_series(0,1)) AS s1
JOIN
  (SELECT random() AS v2, row_number() OVER () AS r2
   FROM generate_series(0,1)) AS s2
ON s1.r1 = s2.r2
ORDER BY 1;
        v1         |        v2         
-------------------+-------------------
 0.145532250870019 | 0.963712519966066
 0.277611976023763 | 0.252654927782714
(2 rows)
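
The same trick can be combined with the table query from the question's edit. Here is a sketch along the same lines, assuming table1 from the question and a primary key id (a hypothetical column here) so that both scans enumerate the rows in the same order:

-- Sketch: pair two independently generated random columns for a table.
-- "id" is assumed; row_number() OVER () without an ORDER BY would not
-- guarantee that the two scans of table1 line up.
SELECT s1.column1, s1.v1, s1.column2, s2.v2
FROM
  (SELECT column1, column2, random() AS v1,
          row_number() OVER (ORDER BY id) AS r1
   FROM table1) AS s1
JOIN
  (SELECT random() AS v2,
          row_number() OVER (ORDER BY id) AS r2
   FROM table1) AS s2
  ON s1.r1 = s2.r2
ORDER BY s1.v1;

The window function also keeps each subquery from being flattened into the outer query, which is what preserves the distinct random() calls.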
  • select random(), random() returns a different value for each column. Would it be fair to say that this means Postgres is breaking the contract for volatile? Commented Feb 1, 2022 at 22:51
  • @nicstella: I don't think Postgres makes any promise about reusing or not reusing the evaluation for columns of the same row. Commented Feb 2, 2022 at 14:01

I ended up using another approach, inspired by Daniel Vérité's answer, because I had trouble integrating it with a rather large query I had written. Instead, I created a PostgreSQL function called true_random() with the following definition:

CREATE FUNCTION true_random() RETURNS double precision AS $$
  SELECT random() AS v1 FROM generate_series(0, 1)
$$ LANGUAGE sql VOLATILE;  -- the FROM clause prevents inlining, so each call runs its own query

Then I replaced every instance of random() with true_random(). This seems to have solved the problem. (Although I still gave Daniel credit, because he answered the question I had originally asked).
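
For reference, applied to the query from the question's edit, the substitution looks like this:

SELECT column1, true_random(), column2, true_random()
FROM table1
ORDER BY true_random();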
