I was trying to use the MERGE command to populate dimension tables in Snowflake. To implement surrogate keys, I created a column whose default value is the next value of a sequence, so it increments automatically whenever a new row is inserted. I have used a similar approach on other data warehousing platforms and it never caused any issues. However, whenever I use MERGE in Snowflake, the sequence is incremented for every single row the MERGE processes, regardless of whether that row results in an UPDATE, an INSERT, or no change at all.
The following is a simple example of what I'm referring to:
```sql
-- Sequence
CREATE OR REPLACE SEQUENCE seq1 START=1 INCREMENT=1;

-- Source table
CREATE OR REPLACE TABLE source_table
(
  row_key int,
  row_value string
);

-- Target table: column ID uses the sequence
CREATE OR REPLACE TABLE target_table
(
  id int DEFAULT seq1.nextval,
  row_key int,
  row_value string
);

-- Initial data
INSERT INTO source_table VALUES
  (1,'One'),
  (2,'Two'),
  (3,'Three');

MERGE INTO target_table D
USING source_table s
ON D.row_key=s.row_key
WHEN MATCHED AND D.row_value!=s.row_value THEN UPDATE SET row_value=s.row_value
WHEN NOT MATCHED THEN INSERT(row_key,row_value) VALUES (s.row_key,s.row_value);
```
After running these commands, the target table contains these rows:
```
ID,ROW_KEY,ROW_VALUE
1,1,One
2,2,Two
3,3,Three
```
Now, let's insert a new row into the source table and run the same MERGE command again:
```sql
INSERT INTO source_table VALUES
  (4,'Four');

MERGE INTO target_table D
USING source_table s
ON D.row_key=s.row_key
WHEN MATCHED AND D.row_value!=s.row_value THEN UPDATE SET row_value=s.row_value
WHEN NOT MATCHED THEN INSERT(row_key,row_value) VALUES (s.row_key,s.row_value);
```
This time, the target table looks like this:
```
ID,ROW_KEY,ROW_VALUE
1,1,One
2,2,Two
3,3,Three
7,4,Four
```
If I insert another row into the source table, the next MERGE inserts it with ID 12, and so on. It looks as if MERGE advances the sequence once for every row it reads from the source table, even for rows that never end up being inserted into the target table.
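To make the numbers concrete, here is a small Python model of the pattern described, under the assumption (inferred from the observed IDs, not from Snowflake documentation) that MERGE draws one sequence value for every source row it reads and only uses that value when the row is inserted. `simulated_merge` is a hypothetical helper, `itertools.count` stands in for `seq1`, and the fifth row `(5, "Five")` is a made-up value for the "another row" case. The model also assumes the new row is read last, which Snowflake does not promise.

```python
# Toy model of the observed behaviour. ASSUMPTION: every source row read by
# MERGE consumes one sequence value, but only NOT MATCHED rows keep it.
# This mirrors the IDs in the question; it is not Snowflake's implementation.
from itertools import count


def simulated_merge(target, source, seq):
    """Hypothetical upsert of source rows into target (row_key -> (id, value))."""
    for row_key, row_value in source:
        drawn_id = next(seq)  # consumed for every row read, matched or not
        if row_key in target:
            existing_id, existing_value = target[row_key]
            if existing_value != row_value:
                # WHEN MATCHED ... UPDATE: the row keeps its original id
                target[row_key] = (existing_id, row_value)
        else:
            # WHEN NOT MATCHED ... INSERT: the row gets the value just drawn
            target[row_key] = (drawn_id, row_value)
    return target


seq = count(1)  # stands in for seq1 (START=1 INCREMENT=1)
source = [(1, "One"), (2, "Two"), (3, "Three")]
target = simulated_merge({}, source, seq)

source.append((4, "Four"))
simulated_merge(target, source, seq)

source.append((5, "Five"))  # hypothetical "another row"
simulated_merge(target, source, seq)

ids = [target[k][0] for k in sorted(target)]
print(ids)  # [1, 2, 3, 7, 12]
```

Under these assumptions the model reproduces the IDs 1, 2, 3, 7 and 12: the second run reads 4 source rows and draws 4 to 7, and the third reads 5 rows and draws 8 to 12.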
Is this behaviour intentional? I also tried an IDENTITY column instead of the sequence, and the output was the same.
The workaround I came up with was to replace the MERGE with separate UPDATE and INSERT statements, but I'd still like to understand the reason behind this behaviour.
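The UPDATE-then-INSERT split can be sketched roughly as follows. This is a minimal sketch that uses Python's built-in `sqlite3` module only to check the logic: SQLite has no sequences, so an `INTEGER PRIMARY KEY` column stands in for the sequence-backed `id`, and the correlated-subquery UPDATE is written in SQLite-friendly syntax (in Snowflake, `UPDATE ... FROM` would be the more usual form). `upsert`, `UPDATE_CHANGED` and `INSERT_NEW` are hypothetical names, not the exact statements used in the question.

```python
import sqlite3

# Update existing rows whose value changed (the WHEN MATCHED branch).
UPDATE_CHANGED = """
UPDATE target_table
SET row_value = (SELECT s.row_value FROM source_table s
                 WHERE s.row_key = target_table.row_key)
WHERE EXISTS (SELECT 1 FROM source_table s
              WHERE s.row_key = target_table.row_key
                AND s.row_value != target_table.row_value)
"""

# Insert only keys not yet in the target (the WHEN NOT MATCHED branch).
INSERT_NEW = """
INSERT INTO target_table (row_key, row_value)
SELECT s.row_key, s.row_value
FROM source_table s
WHERE NOT EXISTS (SELECT 1 FROM target_table d WHERE d.row_key = s.row_key)
"""


def upsert(conn):
    # Running UPDATE first means it only scans rows that already existed.
    conn.execute(UPDATE_CHANGED)
    conn.execute(INSERT_NEW)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_table (row_key INTEGER, row_value TEXT);
-- INTEGER PRIMARY KEY (SQLite rowid alias) stands in for seq1 here.
CREATE TABLE target_table (id INTEGER PRIMARY KEY, row_key INTEGER, row_value TEXT);
INSERT INTO source_table VALUES (1, 'One'), (2, 'Two'), (3, 'Three');
""")
upsert(conn)

conn.execute("INSERT INTO source_table VALUES (4, 'Four')")
upsert(conn)

rows = conn.execute(
    "SELECT id, row_key, row_value FROM target_table ORDER BY id"
).fetchall()
print(rows)  # [(1, 1, 'One'), (2, 2, 'Two'), (3, 3, 'Three'), (4, 4, 'Four')]
```

Because the INSERT filters with `NOT EXISTS`, only genuinely new keys are inserted. Whether the sequence default in Snowflake is then evaluated only for those rows is what the question reports; this SQLite sketch does not verify it.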