How can I generate random numbers that are unique in a column using PostgreSQL?

Question

I want to generate random numbers in PostgreSQL, just as I have done in MySQL below, and I want to do it in a Postgres function.

MySQL:

DROP PROCEDURE IF EXISTS Generate_random;
DELIMITER $$
CREATE PROCEDURE Generate_random()
BEGIN
    Drop table if exists aa_dev.`Agents`;
    CREATE TABLE aa_dev.`Agents`(AgentID int PRIMARY KEY);

    SET @first = 1;
    SET @last = 1000;

    WHILE(@first <= @last) Do
        INSERT INTO aa_dev.`Agents` VALUES(FLOOR(RAND()*(2900000-2800000+1)+2800000))
                                          ON DUPLICATE KEY UPDATE AgentID = FLOOR(RAND()*(2900000-2800000+1)+2800000);
        IF ROW_COUNT() = 1 THEN
            SET @first = @first + 1;
        END IF;
    END WHILE;
END$$


DELIMITER ;

CALL Generate_random();

So far I have generated random numbers in Postgres, but they are repeated in the column. How can I achieve the same result as the MySQL code above in PostgreSQL?

drop function if exists aa_dev.rand_cust(low INT, high INT, total INT);
CREATE OR REPLACE FUNCTION aa_dev.rand_cust(low INT ,high INT, total INT)
  RETURNS TABLE (Cust_id  int) AS
$$
declare

counter int := 0;
rand int := 0;


begin
------------------- Creating a customer table with Cust_id----------------------------
    DROP TABLE IF EXISTS aa_dev.Customer;

    CREATE TABLE IF NOT EXISTS aa_dev.Customer (
    Cust_id INT
    );
 --------------------- Loop to insert random -----------------------
    while counter < total loop
        rand = floor(random()* (high-low + 1) + low);
        Insert into aa_dev.Customer (Cust_id) values(rand);
        counter := counter + 1;
    end loop;

    return query
    select *
    from aa_dev.customer;
end
$$
LANGUAGE plpgsql;

select * from aa_dev.rand_cust(1, 50, 100);

The MySQL code is totally different from the PostgreSQL code. At least try to write the same logic. Furthermore, the random numbers in the PostgreSQL version can never all be unique, since you generate 100 numbers from a range of only 50.
– Paulo Pereira, Commented Jan 7, 2021 at 15:15
@PauloPereira They are different because I could not reproduce the exact MySQL code in PostgreSQL, and that's the point of posting the question. I tried an ON CONFLICT upsert, but it did not work and gave an error.
– Chloe, Commented Jan 7, 2021 at 16:52
I suggest you take a look at Migrate your mindset too. At a minimum your parameters should be (1, 100001, 100). Then you need to handle duplicates as a Postgres exception, not complain that it is not the same as MySQL. Hint: put your insert in a nested block.
– Belayer, Commented Jan 8, 2021 at 3:42
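
The hint about a nested block could be sketched roughly as follows. This is a minimal illustration rather than code from the thread: the names customer_demo and rand_cust_retry are hypothetical, and it assumes the target column carries a PRIMARY KEY (or UNIQUE) constraint so that a duplicate draw raises unique_violation.

```sql
-- Sketch only: customer_demo and rand_cust_retry are hypothetical names.
CREATE TABLE IF NOT EXISTS customer_demo (cust_id INT PRIMARY KEY);

CREATE OR REPLACE FUNCTION rand_cust_retry(low INT, high INT, total INT)
RETURNS VOID AS
$$
DECLARE
    inserted INT := 0;
BEGIN
    -- Without this guard the retry loop below could never finish.
    IF high - low + 1 < total THEN
        RAISE EXCEPTION 'range %..% holds fewer than % distinct values', low, high, total;
    END IF;

    TRUNCATE customer_demo;  -- start from an empty table on every call

    WHILE inserted < total LOOP
        BEGIN  -- nested block: a failure here rolls back only this one insert
            INSERT INTO customer_demo (cust_id)
            VALUES (floor(random() * (high - low + 1) + low)::INT);
            inserted := inserted + 1;
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- duplicate draw: ignore it and let the loop try again
        END;
    END LOOP;
END
$$ LANGUAGE plpgsql;

SELECT rand_cust_retry(1, 100001, 100);
```

Each BEGIN ... EXCEPTION block opens a subtransaction, so this approach is likely to be noticeably slower than a set-based query when many rows are needed.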

mhawke · Accepted Answer · 2021-01-07 16:54:27Z

1

For Postgres you've asked for 100 numbers between 1 and 50 - there will naturally be duplicates!

The MySQL code has a much wider range of possible values (100,001, from 2800000 to 2900000 inclusive) and only 1000 of them are sampled. Also, the MySQL code keeps generating random numbers until there is no key error, i.e. until there are no duplicates in the column.

So, for Postgres, you could try checking for duplicates and retrying if found. Making the column unique will prevent duplicate insertion, but you have to handle it.

Also, the range of possible values must be at least as large as the number of values you need. Be careful with the retries and don't replicate the MySQL example blindly: if the range contains fewer values than the required count, the loop will never terminate.
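
One possible shape for such a retry loop is sketched below. It is an illustration under assumptions, not the code from the question: agents_demo and fill_agents_demo are hypothetical names, the column is assumed to be a PRIMARY KEY, and ON CONFLICT requires PostgreSQL 9.5 or later.

```sql
-- Sketch only: agents_demo and fill_agents_demo are hypothetical names.
CREATE TABLE IF NOT EXISTS agents_demo (agent_id INT PRIMARY KEY);

CREATE OR REPLACE FUNCTION fill_agents_demo(low INT, high INT, total INT)
RETURNS INT AS
$$
DECLARE
    inserted INT := 0;  -- rows successfully inserted so far
    n        INT;       -- rows affected by the most recent INSERT (0 or 1)
BEGIN
    -- Refuse impossible requests so the loop is guaranteed to terminate.
    IF high - low + 1 < total THEN
        RAISE EXCEPTION 'range %..% holds fewer than % distinct values', low, high, total;
    END IF;

    TRUNCATE agents_demo;  -- start from an empty table on every call

    WHILE inserted < total LOOP
        INSERT INTO agents_demo (agent_id)
        VALUES (floor(random() * (high - low + 1) + low)::INT)
        ON CONFLICT (agent_id) DO NOTHING;  -- silently skip duplicates
        GET DIAGNOSTICS n = ROW_COUNT;      -- 1 if inserted, 0 if skipped
        inserted := inserted + n;
    END LOOP;

    RETURN inserted;
END
$$ LANGUAGE plpgsql;

SELECT fill_agents_demo(2800000, 2900000, 1000);
```

The counter only advances when a row was really inserted, which plays the same role as the ROW_COUNT() = 1 test in the MySQL procedure.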

Update

Here is a function that will generate unique random numbers within a range and populate a table with them:

DROP FUNCTION IF EXISTS rand_cust (low INT, high INT, total INT);
CREATE OR REPLACE FUNCTION rand_cust (low INT, high INT, total INT) 
RETURNS TABLE (Cust_id INT) 
AS 
$$ 
BEGIN
------------------- Creating a customer table with Cust_id----------------------------
    DROP TABLE IF EXISTS Customer;
    CREATE TABLE IF NOT EXISTS Customer(Cust_id INT);

    RETURN query
    INSERT INTO Customer(Cust_id)
    SELECT *
    FROM generate_series(low, high)
    ORDER BY random() LIMIT total
    RETURNING -- returns the id's you generated
        Customer.Cust_id;

END $$ 
LANGUAGE plpgsql;

SELECT *
FROM rand_cust(1000, 2000, 100);  -- 100 unique numbers between 1000 and 2000 inclusive

Note that this cannot generate more numbers than the range contains, e.g. you can't generate 100 numbers between 1 and 50, only a maximum of 50. That's a consequence of the uniqueness requirement. The LIMIT clause will not cause an error in that case (you simply get fewer rows), but you could add code to check that (high - low + 1) >= total before attempting the query.
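
The range check mentioned above might look something like this. It is only a sketch: sample_checked and the output column val are hypothetical names, and raising an exception is just one way to report the problem.

```sql
-- Sketch only: sample_checked and val are hypothetical names.
CREATE OR REPLACE FUNCTION sample_checked(low INT, high INT, total INT)
RETURNS TABLE (val INT) AS
$$
BEGIN
    -- The range low..high is inclusive, so it holds high - low + 1 values.
    IF high - low + 1 < total THEN
        RAISE EXCEPTION 'cannot draw % unique values from the % values in %..%',
            total, high - low + 1, low, high;
    END IF;

    RETURN QUERY
    SELECT g
    FROM generate_series(low, high) AS g
    ORDER BY random()  -- shuffle the whole range
    LIMIT total;       -- keep the first "total" values, all distinct
END
$$ LANGUAGE plpgsql;

SELECT * FROM sample_checked(1000, 2000, 100);
```

With this check in place, a call such as sample_checked(1, 50, 100) fails with an error instead of quietly returning only 50 rows.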

If you'd prefer a simple function to generate n random unique numbers:

DROP FUNCTION IF EXISTS sample(low INT, high INT, total INT);
CREATE OR REPLACE FUNCTION sample(low INT, high INT, total INT) 
RETURNS TABLE (Cust_id INT) 
AS 
$$ 
BEGIN
    RETURN query
    SELECT *
    FROM generate_series(low, high)
    ORDER BY random() LIMIT total;  
END $$ 
LANGUAGE plpgsql;

-- create a new table of unique random values (fails if Customer already exists)
SELECT * INTO Customer FROM sample(100, 200, 10);

edited Jan 7, 2021 at 16:54

answered Jan 7, 2021 at 14:49


3 Comments

Chloe Over a year ago

Yes, the range will produce duplicates; I will take care of the range now. Thanks. I changed the range to 1-150 with 100 rows, but it still gives duplicate numbers. Can you tell me how I can use ON CONFLICT with it?

mhawke Over a year ago

@Chloe: neither of the functions should produce duplicates, and all my testing confirms that they don't.

mhawke Over a year ago

@Chloe: demo on sqlfiddle: sqlfiddle.com/#!17/b36f5a/8

Frank Heikens · Accepted Answer · 2021-01-07 15:01:15Z

0

As said before, you have a range between 1 and 50 and you want to create 100 records. That will never be unique. And your query doesn't ask for unique values anyway, so even with a range of a million values you can still get duplicates.

But, your code can be much simpler as well, without a loop and just a single query:

DROP FUNCTION IF EXISTS aa_dev.rand_cust ( low INT, high INT, total INT );
CREATE OR REPLACE FUNCTION aa_dev.rand_cust ( low INT, high INT, total INT ) 
RETURNS TABLE ( Cust_id INT ) 
AS 
$$ 
BEGIN
------------------- Creating a customer table with Cust_id----------------------------
    DROP TABLE IF EXISTS aa_dev.Customer;
    CREATE TABLE IF NOT EXISTS aa_dev.Customer ( Cust_id INT );
--------------------- No Loop to insert random -----------------------

    RETURN query
    INSERT INTO aa_dev.Customer ( Cust_id )
    SELECT FLOOR ( random( ) * ( high - low + 1 ) + low ) -- no uniqueness!
    FROM    generate_series(1, total) -- no loop needed
    RETURNING -- returns the id's you generated
        Customer.Cust_id;
    

END $$ 
LANGUAGE plpgsql;

SELECT
    * 
FROM
    aa_dev.rand_cust ( 1, 50, 100 );

answered Jan 7, 2021 at 15:01


3 Comments

mhawke Over a year ago

How would you handle duplicate prevention?

Frank Heikens Over a year ago

At least a constraint on the table. But why do you create IDs like this? It just looks like some sequence, and sequences are standard in PostgreSQL.

Chloe Over a year ago

This is helpful as it has eliminated the loop, but the problem of non-unique values is still there even if I choose a wider range.
