0

In a PostgreSQL text array I want to remove consecutive identical values. A distinct isn't enough because I can have duplicate values but not consecutive, and I want to keep them. The order of values have importance.

For example

SELECT ARRAY['A', 'B', 'C', 'C', 'D', 'A'];

Should return {A,B,C,D,A}

-- Edit

AS @MkSpring Mk said, this post suggest an answer. I try to adapt it:

WITH q_array AS (
    SELECT ARRAY['A', 'B', 'C', 'C', 'D', 'A'] AS full_array
    ), q_unnest AS (
        SELECT  
        unnest(full_array) AS unnest_array
        FROM q_array
        ), q_id AS (
            SELECT
            row_number() OVER () AS id,
            unnest_array
            FROM q_unnest
        )
SELECT
array_agg(q_id.unnest_array) AS array_logical
FROM (SELECT q_id.*, lag(q_id.unnest_array) OVER (ORDER BY q_id.id) AS unnest_array_logical FROM q_id) q_id
WHERE unnest_array_logical IS DISTINCT FROM q_id.unnest_array

I find this syntax very verbose, maybe my approach is not efficient enough.

Is this syntax ok ? Is it a best practice to create a function or can I write it directly in a query ?

1
  • Partly because in my configuration I don't have a native ID column. And I'm also asking if it's good practice to write it in the query or in a dedicated function. Commented Dec 9, 2021 at 14:45

1 Answer 1

0

To improve query performance, I made this function:

CREATE OR REPLACE FUNCTION array_remove_consecutive_duplicates(
    array_vals anyarray)
    RETURNS TABLE(array_logical anyarray) 
    LANGUAGE 'plpgsql'
    COST 100
    VOLATILE PARALLEL UNSAFE
    ROWS 1000

AS $BODY$
    BEGIN
      RETURN QUERY 
      WITH q_array AS (
    SELECT array_vals AS full_array
    ), q_unnest AS (
        SELECT  
        unnest(full_array) AS unnest_array
        FROM q_array
        ), q_id AS (
            SELECT
            row_number() OVER () AS id,
            unnest_array
            FROM q_unnest
        )
SELECT
array_agg(q_id.unnest_array) AS array_logical
FROM (SELECT q_id.*, lag(q_id.unnest_array) OVER (ORDER BY q_id.id) AS unnest_array_logical FROM q_id) q_id
WHERE unnest_array_logical IS DISTINCT FROM q_id.unnest_array;
    END;
  
$BODY$;

My query will now write like this:

SELECT array_remove_consecutive_duplicates(ARRAY['A', 'B', 'C', 'C', 'D', 'A']);

It's a bit faster and easier to use in a query than my fist approach. Maybe it's not the good way or my syntax isn't efficient. If someone have a better way to suggest, I'll take it.

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.