
I'm using PostgreSQL 11.1-1 on Windows 10. On my system, if I run a query like the following, I get a different number for each random() column:

postgres=# SELECT random() as sortOrder, random() as col2;
     sortorder     |       col2
-------------------+-------------------
 0.607938482426107 | 0.121234225574881
(1 row)

But if I add an ORDER BY clause, as shown below, then random() returns the same value for both columns within each row.

postgres=# SELECT random() as sortOrder, random() as col2 
               FROM generate_series(0,1) 
               ORDER BY sortOrder;
     sortorder     |       col2
-------------------+-------------------
 0.100375576410443 | 0.100375576410443
 0.170669795479625 | 0.170669795479625
(2 rows)

I'm assuming there is a technical reason for this, but is it possible in this last query to get a different random value for each column?

Edit: I may have oversimplified in my original question. The actual query I was attempting was more like the following:

SELECT column1, random(), column2, random()
FROM table1
ORDER BY random();

So, having an answer that can be combined with a table query is also important.

2 Answers


random() is volatile, and the documentation says this about volatility:

A query using a volatile function will re-evaluate the function at every row where its value is needed.

Based on this definition, the engine is within its rights to evaluate the function just once per row, rather than once per column within the same row.
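
The per-row part is easy to verify: a volatile function in the select list yields a fresh value on every row, even though a single evaluation may be shared between columns of the same row. As a quick check:

SELECT random() FROM generate_series(1, 3);  -- three rows, three distinct values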

To defeat this optimization, you might tweak the query so that the optimizer fails to notice that the two columns carry the same expression. For instance:

postgres=> SELECT random()+0 as sortOrder, random() as col2 
               FROM generate_series(0,1) 
               ORDER BY sortOrder;
     sortorder      |        col2         
--------------------+---------------------
 0.0412265718914568 | 0.00992022315040231
  0.841067179106176 |   0.818967831321061
(2 rows)

Now, that simple solution may seem a bit fragile. In some future version, the optimizer might theoretically become smart enough to recognize that these two expressions have a common sub-expression that it can reuse.

A more robust solution is to generate each column in a separate subquery to start with, so the optimizer is forced into distinct function calls no matter how hard it tries to avoid multiple evaluations. For instance:

SELECT v1, v2 FROM
  (SELECT random() AS v1, row_number() OVER () AS r1
   FROM generate_series(0,1)) AS s1
JOIN
  (SELECT random() AS v2, row_number() OVER () AS r2
   FROM generate_series(0,1)) AS s2
ON s1.r1 = s2.r2
ORDER BY 1;
        v1         |        v2         
-------------------+-------------------
 0.145532250870019 | 0.963712519966066
 0.277611976023763 | 0.252654927782714
(2 rows)
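
The same trick can be combined with the table query from the question's edit. Here is a sketch along the same lines, assuming table1 from the question and a primary key id (a hypothetical column here) so that both scans enumerate the rows in the same order:

-- Sketch: pair two independently generated random columns for a table.
-- "id" is assumed; row_number() OVER () without an ORDER BY would not
-- guarantee that the two scans of table1 line up.
SELECT s1.column1, s1.v1, s1.column2, s2.v2
FROM
  (SELECT column1, column2, random() AS v1,
          row_number() OVER (ORDER BY id) AS r1
   FROM table1) AS s1
JOIN
  (SELECT random() AS v2,
          row_number() OVER (ORDER BY id) AS r2
   FROM table1) AS s2
  ON s1.r1 = s2.r2
ORDER BY s1.v1;

The window function also keeps each subquery from being flattened into the outer query, which is what preserves the distinct random() calls.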
  • select random(), random() returns a different value for each column. Would it be fair to say that this means Postgres is breaking the contract for volatile? Commented Feb 1, 2022 at 22:51
  • @nicstella: I don't think Postgres makes any promise about reusing or not reusing the evaluation for columns of the same row. Commented Feb 2, 2022 at 14:01

I ended up using another approach, inspired by Daniel Vérité's answer, because I had trouble integrating it with a rather large query I had written. Instead, I created a PostgreSQL function called true_random() with the following definition:

CREATE FUNCTION true_random() RETURNS double precision AS $$
  SELECT random() AS v1 FROM generate_series(0, 1)
$$ LANGUAGE sql VOLATILE;  -- the FROM clause prevents inlining, so each call runs its own query

Then I replaced every instance of random() with true_random(). This seems to have solved the problem. (Although I still gave Daniel credit, because he answered the question I had originally asked).
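
For reference, applied to the query from the question's edit, the substitution looks like this:

SELECT column1, true_random(), column2, true_random()
FROM table1
ORDER BY true_random();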
