PostgreSQL - How to select the first consecutive group having same value

Question

I have a table with pk and dept columns:

pk dept
-------
27  A
29  A
30  B
31  B
33  A

I need to select the first consecutive group, that is the first successive set of rows all having the same dept value when the table is ordered by pk, i.e. the expected result is:

pk dept
-------
27  A
29  A

In my example there are 3 consecutive groups (AA, BB and A). The size of a group is unlimited (can be more than 2).

You've made this new term "Consequential Group" (of which you want the first) and even put it in bold, but I don't see a definition. Is a "Consequential Group" a set of two records, or a set of at least two records before a change in PK, or something else entirely?
– JNevill, Commented Jan 9, 2017 at 17:05

Fabian Pijcke · Accepted Answer · 2017-01-10 07:36:23Z

4

The following query should do what you want (I named your table tx):

SELECT *
FROM tx t1
WHERE NOT EXISTS (
  SELECT *
  FROM tx t2
  WHERE t2.dept <> t1.dept
    AND t2.pk < t1.pk);

The idea is to look for tuples such that no tuple with a lesser pk and a different department exists.

The first two A tuples are kept;
The B tuples are dropped because of the first two A tuples;
The last A tuple is dropped because of the B tuples.
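
To check the reasoning above against the question's sample data, here is a minimal self-contained sketch. The inline tx CTE stands in for the real table and is only an assumption for illustration, as are the added ORDER BY and the use of SELECT 1 inside the subquery.

```sql
-- Sample rows from the question, supplied inline so the query runs on its own.
WITH tx(pk, dept) AS (
  VALUES (27, 'A'), (29, 'A'), (30, 'B'), (31, 'B'), (33, 'A')
)
SELECT t1.pk, t1.dept
FROM tx t1
WHERE NOT EXISTS (
  -- any earlier row with a different dept disqualifies t1
  SELECT 1
  FROM tx t2
  WHERE t2.dept <> t1.dept
    AND t2.pk < t1.pk
)
ORDER BY t1.pk;
-- Rows returned: (27, 'A') and (29, 'A')
```

Note that the <> comparison never matches NULL, so this sketch assumes dept is never NULL.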

Abelisto · Accepted Answer · 2017-01-09 17:25:32Z

0

Don't forget about stored functions. Unlike a window-function approach, a function can stop at the first change and so avoid reading the whole table:

--drop function if exists foo();
--drop table if exists t;
create table t(pk int, dep text);
insert into t values(27,'A'),(29,'A'),(30,'B'),(31,'B'),(33,'A');

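create function foo() returns setof t language plpgsql as $$
declare
  r t;  -- current row of the scan
  p t;  -- first row of the scan, kept as the reference for dep
begin
  for r in (select * from t order by pk) loop
    -- remember the first row so later rows can be compared against it
    if p is null then
      p := r;
    end if;
    -- stop at the first row whose dep differs from the first row's dep
    exit when p.dep is distinct from r.dep;
    return next r;
  end loop;
  return;
end $$;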

select * from foo();

cmoron · Accepted Answer · 2017-01-10 10:36:03Z

0

It's a little complex and the performance is probably poor, but you can achieve what you want with the code below. There are four operations:

The first operation obtains the base order and base group IDs used by the next operation.
The second operation does the trick of computing a unique group ID for each group.
The third operation spreads the unique group ID over the rows of each group.
Finally, we compute a consecutive group number for each group, so any group can be selected by filtering on the group number we want.

Hope this helps.

SELECT fourthOperation.pk,
       fourthOperation.dept 
 FROM (SELECT thirdOperation.pk,
              thirdOperation.dept,
              DENSE_RANK() OVER (ORDER BY thirdOperation.spreadedIdGroup) denseIdGroup
         FROM (SELECT secondOperation.*, 
                      NVL(idGroup, LAG(secondOperation.idGroup IGNORE NULLS) OVER (ORDER BY secondOperation.numRow)) spreadedIdGroup
              FROM (SELECT firstOperation.*,
                           CASE WHEN LAG(firstOperation.rankRow) OVER (ORDER BY firstOperation.numRow) = firstOperation.rankRow
                                THEN NULL
                                ELSE firstOperation.numRow
                                 END idGroup
                       FROM (SELECT yourTable.*, 
                                    ROW_NUMBER() OVER (ORDER BY PK)   AS numRow, 
                                    DENSE_RANK() OVER (ORDER BY DEPT) AS rankRow
                               FROM ABORRAR yourTable) firstOperation) secondOperation ) thirdOperation) fourthOperation
 WHERE fourthOperation.denseIdGroup = 1
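
As far as I can tell, NVL and LAG(... IGNORE NULLS) in the query above are Oracle constructs, and the PostgreSQL versions current at the time of the question did not accept them. A rough PostgreSQL sketch of the same "flag the changes, then number the groups" idea could look like the following. The CTE names (tx, flagged, numbered) and the column names changed and grp are made up for illustration, the inline VALUES list stands in for the real table, and dept is assumed to be NOT NULL.

```sql
-- Sample rows from the question, supplied inline so the query runs on its own.
WITH tx(pk, dept) AS (
  VALUES (27, 'A'), (29, 'A'), (30, 'B'), (31, 'B'), (33, 'A')
),
flagged AS (
  SELECT pk, dept,
         -- 1 when dept differs from the previous row's dept (or there is no previous row), else 0
         CASE WHEN dept IS DISTINCT FROM LAG(dept) OVER (ORDER BY pk)
              THEN 1 ELSE 0 END AS changed
  FROM tx
),
numbered AS (
  SELECT pk, dept,
         -- running total of changes: 1 for the first group, 2 for the second, and so on
         SUM(changed) OVER (ORDER BY pk) AS grp
  FROM flagged
)
SELECT pk, dept
FROM numbered
WHERE grp = 1
ORDER BY pk;
-- Rows returned: (27, 'A') and (29, 'A')
```

Changing the filter to grp = 2 or grp = 3 would select the later groups instead, which is the "discretionary selection" the fourth operation is after.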

edited Jan 10, 2017 at 10:36

answered Jan 10, 2017 at 7:28

6 Comments

Fabian Pijcke Over a year ago

The first SELECT clause should be SELECT pk, dept instead of SELECT * ;-) Other than that I agree on the "a little bit complex" part :D Nice SQL skills though!

Fabian Pijcke Over a year ago

I just found your solution wrong as it assumes that the table is clustered on the pk column. If I add a tuple (1, 'A'), only this tuple will be retrieved by your solution while op wants 1, 27 and 29.

cmoron Over a year ago

Thanks for your comment, but the OP says "when the table is ordered by pk" and wants only the first consecutive group, so my solution is correct. Moreover, he wants the entire row, not only the pk.

Fabian Pijcke Over a year ago

Well, I'm not sure he considers that the table is ordered on the disk (without the last WHERE clause I get (1, 'A', 1), (27, 'A', 2), (29, 'A', 2), (30, 'B', 3), (31, 'B', 3), (33, 'A', 4), while I would expect 1, 27 and 29 to be in the same denseidgroup) ... And what I meant was that you retrieve some denseIdGroup column in addition to the pk and dept columns.

cmoron Over a year ago

Thanks, Fabian, for your correction. I've made some changes to get the correct result set. Basically, I've included the IGNORE NULLS clause in the LAG call of the third operation and replaced the CASE evaluation with an NVL. Additionally, in the second operation I've changed the behaviour to assign a group ID only to those records where the DEPT value changes, based on the PK order.

wind39 · Accepted Answer · 2017-01-09 17:20:52Z

-2

I'm not sure if I understand your question, but for the first pk of each dept you can try this:

select min(pk) as pk,
       dept
from your_table
group by dept
