-2

I would like to get all missing numbers in a sequence of numbers.
Just wondering if there is a better approach than below?

SELECT x
  FROM 
  (
    SELECT x,
           LAG(x,1) OVER ( ORDER BY x ) prev_x
      FROM 
       ( SELECT * FROM
        ( SELECT 1 AS x ),
        ( SELECT 2 AS x ),
        ( SELECT 3 AS x ),
        ( SELECT 4 AS x ),
        ( SELECT 5 AS x ),
        ( SELECT 6 AS x ),
        ( SELECT 8 AS x ),
        ( SELECT 10 AS x ),
        ( SELECT 11 AS x )
       )
  ) 
 WHERE x-prev_x > 1;
3
  • it really depends on which flavor of sql you are in with this! based on tags you added - i would assume this is for BigQuery? please confirm or clarify! Commented Sep 11, 2016 at 4:09
  • any solution is appreciated... Commented Sep 11, 2016 at 4:15
  • SO prefers specific questions that have a particular answer. Open questions (too many possible answers) are discouraged. Commented Sep 11, 2016 at 15:08

3 Answers 3

3

Let me be honest with you!
Any another working solution will be better then presented in question - for one simple reason - it is wrong! it does not return missing numbers at all! it rather shows number after next gap. that's all (hope you will appreciate that I opened your eyes on this)

Now, about better solution(s) - there are tons of options for you to pursue.
Note: below options for BigQuery ONLY!

Option 1

BigQuery Standard SQL - see How to Enable Standard SQL

WITH YourTable AS (
  SELECT 1 AS x UNION ALL
  SELECT 2 AS x UNION ALL
  SELECT 3 AS x UNION ALL
  SELECT 6 AS x UNION ALL
  SELECT 8 AS x UNION ALL
  SELECT 10 AS x UNION ALL
  SELECT 11 AS x
),
nums AS (
  SELECT num 
  FROM UNNEST(GENERATE_ARRAY((SELECT MIN(x) FROM YourTable), (SELECT MAX(x) FROM YourTable))) AS num
)
SELECT num FROM nums
LEFT JOIN YourTable ON num = x
WHERE x IS NULL
ORDER BY num

Option 2

BigQuery Legacy SQL you can try below (here you need to set start/min and end/max values inside select expression for nums table

SELECT num FROM (
  SELECT num FROM (
    SELECT ROW_NUMBER() OVER() AS num, * 
    FROM (FLATTEN((SELECT SPLIT(RPAD('', 11, '.'),'') AS h FROM (SELECT NULL)), h))
  ) WHERE num BETWEEN 1 AND 11
) AS nums
LEFT JOIN (
  SELECT x FROM
    (SELECT 1 AS x),
    (SELECT 2 AS x),
    (SELECT 3 AS x),
    (SELECT 6 AS x),
    (SELECT 8 AS x),
    (SELECT 10 AS x),
    (SELECT 11 AS x)
) AS YourTable
ON num = x
WHERE x IS NULL

Option 3

BigQuery Legacy SQL - if you want not to have dependency on min and max and have needs in setting those values - you can use below solution - it only requires to set high enough max to accommodate your expected growth (I put 1000 for example)

SELECT num FROM (
  SELECT num FROM (
    SELECT ROW_NUMBER() OVER() AS num, * 
    FROM (FLATTEN((SELECT SPLIT(RPAD('', 1000, '.'),'') AS h FROM (SELECT NULL)), h))
  ) WHERE num BETWEEN 1 AND 1000
) AS nums
LEFT JOIN YourTable
ON num = x
WHERE x IS NULL
AND num BETWEEN (SELECT MIN(x) FROM YourTable) AND (SELECT MAX(x) FROM YourTable) 

Option 4 (for some reason - my favorite so far)

BigQuery Standard SQL - without explicit joins

WITH YourTable AS (
  SELECT 1 AS x UNION ALL
  SELECT 2 AS x UNION ALL
  SELECT 3 AS x UNION ALL
  SELECT 6 AS x UNION ALL
  SELECT 8 AS x UNION ALL
  SELECT 10 AS x UNION ALL
  SELECT 11 AS x
)
SELECT num
FROM (SELECT x, LEAD(x) OVER(ORDER BY x) AS next_x FROM YourTable),  
     UNNEST(GENERATE_ARRAY(x + 1,next_x - 1)) AS num
WHERE next_x - x > 1
ORDER BY x
Sign up to request clarification or add additional context in comments.

9 Comments

Thanks for the solution. But what if the table(x) keeps increasing...?
use first solution in my answer (standard sql) - it takes care of this!
added for bq legacy sql so it takes care of it too
In the first solution I am trying to implement using postgresql but those functions do not work... Tried using sqlfiddle link as below. sqlfiddle.com/#!15/19495/4
Option 2 works fine and its limited ... In option 1 cte's do not work on Bigquery.. Just ran it now...
|
1

The shortest solution in Postgres is with standard SQL EXCEPT:

WITH tbl(x) AS (SELECT unnest ('{1,2,3,4,5,6,8,10,11}'::int[]))
-- the CTE provides a temp table - might be an actual table instead
SELECT generate_series(min(x), max(x)) FROM tbl
EXCEPT ALL
TABLE  tbl;

The set-returning function unnest() is Postgres specific and just the shortest syntax to provide your set of numbers as table.

Also works with duplicates or NULL values in the data.

TABLE tbl is (standard SQL!) short syntax for SELECT * FROM tbl:

Related (with more explanation):

1 Comment

I am fine with this solution for postgres. Is there any way to implement this without using the functions..?
1

Your query can be written much more succinctly like so:

SELECT x
FROM (
    SELECT x,
           lag(x, 1) OVER ( ORDER BY x ) prev_x
    FROM ( VALUES (1), (2), (3), (4), (5), (6), (8), (10), (11) ) v(x)
) sub
WHERE x-prev_x > 1;

This will return the next highest value after a miss (8, 10), not the missing values themselves (7, 9). But of course you do not have the values handy.

If you know the range of values in the sequence, then you can use this:

SELECT s.x
FROM generate_series(<<min>>, <<max>>) s(x)
LEFT JOIN my_table t ON s.x = t.x
WHERE t.x IS NULL;

This returns the actual missing values.

If you do not know the range of values, you need to add a sub-query:

SELECT s.x
FROM ( SELECT min(x), max(x) FROM my_table ) r
JOIN generate_series(r.min, r.max) s(x) ON true
LEFT JOIN my_table t ON s.x = t.x
WHERE t.x IS NULL;

Alternatively, instead of a LEFT JOIN:

SELECT x
FROM ( SELECT min(x), max(x) FROM my_table ) r,
     generate_series(r.min, r.max) s(x)
WHERE NOT EXISTS (SELECT 1 FROM my_table t WHERE t.x = s.x);

2 Comments

This is a solid answer. I don't see a reason for downvotes. Minor confusing detail: the OP's example omits 7 and 9 (not 8 and 9).
@ErwinBrandstetter A down-vote is only a click away (but I did mess up the parentheses in the VALUES clause in my initial answer). Thanks for the thumbs-up. List of values corrected.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.