
I have a table with pk and dept columns:

pk dept
-------
27  A
29  A
30  B
31  B
33  A

I need to select the first consecutive group, that is the first successive set of rows all having the same dept value when the table is ordered by pk, i.e. the expected result is:

pk dept
-------
27  A
29  A

In my example there are 3 consecutive groups (AA, BB and A). The size of a group is unlimited (can be more than 2).
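The definition can be checked with a small reference sketch in plain Python. It is not SQL and is not taken from any answer. `itertools.groupby` splits a pk-ordered sequence into runs of equal `dept`, and the first run is the wanted result. The sample rows come from the table above, and the function name `first_consecutive_group` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def first_consecutive_group(rows):
    """Return the first run of rows sharing the same dept, in pk order.

    rows: iterable of (pk, dept) tuples, in any order.
    """
    ordered = sorted(rows, key=itemgetter(0))               # order by pk
    for _dept, run in groupby(ordered, key=itemgetter(1)):  # runs of equal dept
        return list(run)                                    # only the first run is needed
    return []                                               # empty table: no group


rows = [(27, "A"), (29, "A"), (30, "B"), (31, "B"), (33, "A")]
print(first_consecutive_group(rows))  # [(27, 'A'), (29, 'A')]
```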

  • You've made this new term "Consequential Group" (of which you want the first) and even put it in bold, but I don't see a definition. Is a "Consequential Group" a set of two records, or a set of at least two records before a change in PK, or something else entirely? Commented Jan 9, 2017 at 17:05

4 Answers


The following query should do what you want (I named your table tx):

SELECT *
FROM tx t1
WHERE NOT EXISTS (
  SELECT *
  FROM tx t2
  WHERE t2.dept <> t1.dept
    AND t2.pk < t1.pk);

The idea is to look for tuples such that no tuple with a lesser pk and a different department exists.

  • The first two A tuples are kept;
  • The B tuples are dropped because of the first two A tuples;
  • The last A tuple is dropped because of the B tuples.
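One way to try the query without a server is Python's built-in `sqlite3` module with an in-memory database. Using SQLite here is an assumption; the query itself is plain SQL and runs unchanged. The `ORDER BY` is added only so the printed order is deterministic. The second run adds the `(1, 'A')` row that is discussed in comments further down this page.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (pk INTEGER PRIMARY KEY, dept TEXT)")
conn.executemany("INSERT INTO tx VALUES (?, ?)",
                 [(27, "A"), (29, "A"), (30, "B"), (31, "B"), (33, "A")])

# The NOT EXISTS query from this answer, plus ORDER BY for stable output.
FIRST_GROUP_SQL = """
SELECT *
FROM tx t1
WHERE NOT EXISTS (
  SELECT *
  FROM tx t2
  WHERE t2.dept <> t1.dept
    AND t2.pk < t1.pk)
ORDER BY t1.pk
"""

before = conn.execute(FIRST_GROUP_SQL).fetchall()
print(before)  # [(27, 'A'), (29, 'A')]

# A row with a smaller pk and the same dept simply joins the first group.
conn.execute("INSERT INTO tx VALUES (1, 'A')")
after = conn.execute(FIRST_GROUP_SQL).fetchall()
print(after)   # [(1, 'A'), (27, 'A'), (29, 'A')]
```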

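Consider a stored function. Unlike the window-function approach, it can stop reading as soon as the dep value changes, so it does not have to process the whole table (provided the rows can be fetched cheaply in pk order, e.g. through an index on pk):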

--drop function if exists foo();
--drop table if exists t;
create table t(pk int, dep text);
insert into t values(27,'A'),(29,'A'),(30,'B'),(31,'B'),(33,'A');

create function foo() returns setof t language plpgsql as $$
declare
  r t;
  p t;
begin
  for r in (select * from t order by pk) loop
    if p is null then
      p := r;
    end if;
    exit when p.dep is distinct from r.dep;
    return next r;
  end loop;
  return;
end $$;

select * from foo();
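The same early-exit loop can be sketched on the client with Python's `sqlite3` module, whose cursor yields rows as they are iterated. The sketch walks `t` in pk order and stops at the first row whose `dep` differs from the first row's. It only illustrates the logic of `foo()` and is not the PL/pgSQL function itself. The helper name `first_group` and the use of SQLite are assumptions, and how much of the table the database actually reads still depends on its query plan.

```python
import sqlite3


def first_group(conn):
    """Hypothetical client-side mirror of foo(): stop when dep changes."""
    result = []
    first = None
    # The cursor hands back rows one at a time, so breaking early stops fetching.
    for row in conn.execute("SELECT pk, dep FROM t ORDER BY pk"):
        if first is None:
            first = row              # like: p := r
        if row[1] != first[1]:       # like: exit when p.dep is distinct from r.dep
            break
        result.append(row)           # like: return next r
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, dep TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(27, "A"), (29, "A"), (30, "B"), (31, "B"), (33, "A")])
print(first_group(conn))  # [(27, 'A'), (29, 'A')]
```

Python's `!=` treats `None` the way `is distinct from` treats NULL (`None != None` is false, `None != 'A'` is true), so the stop condition keeps the same semantics.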


It's a little complex and probably performs poorly, but you can get what you want with the query below. It works in four operations:

  1. The first obtains the base row order (by pk) and a rank per dept value for the next operation.
  2. The second is the trick: it assigns a unique group id only to the rows where the dept value changes in pk order.
  3. The third spreads that unique group id over the remaining rows of each group.
  4. Finally, the fourth computes a consecutive number for each group, so you can select any group you want by filtering on its number.

Hope this helps.

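-- Oracle syntax (NVL, LAG ... IGNORE NULLS); ABORRAR is this answer's table name, replace it with yours
SELECT fourthOperation.pk,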
       fourthOperation.dept 
 FROM (SELECT thirdOperation.pk,
              thirdOperation.dept,
              DENSE_RANK() OVER (ORDER BY thirdOperation.spreadedIdGroup) denseIdGroup
         FROM (SELECT secondOperation.*, 
                      NVL(idGroup, LAG(secondOperation.idGroup IGNORE NULLS) OVER (ORDER BY secondOperation.numRow)) spreadedIdGroup
              FROM (SELECT firstOperation.*,
                           CASE WHEN LAG(firstOperation.rankRow) OVER (ORDER BY firstOperation.numRow) = firstOperation.rankRow
                                THEN NULL
                                ELSE firstOperation.numRow
                                 END idGroup
                       FROM (SELECT yourTable.*, 
                                    ROW_NUMBER() OVER (ORDER BY PK)   AS numRow, 
                                    DENSE_RANK() OVER (ORDER BY DEPT) AS rankRow
                               FROM ABORRAR yourTable) firstOperation) secondOperation ) thirdOperation) fourthOperation
 WHERE fourthOperation.denseIdGroup = 1                                   
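The query above relies on Oracle features (`NVL`, `LAG ... IGNORE NULLS`). The sketch below is a portable variant of the same gaps-and-islands idea, and it swaps in a different technique. It flags each row whose dept differs from the previous row's (in pk order), then takes a running `SUM` of the flags as the group number. It runs on SQLite 3.25 or later through Python's `sqlite3` module. The table name `tx` and the group-number parameter are assumptions for illustration. As in step 4 above, any group can be selected by its number.

```python
import sqlite3

# Change flag + running sum (gaps and islands); needs SQLite >= 3.25 for
# window functions. A swapped-in variant, not the Oracle query above.
GROUP_SQL = """
SELECT pk, dept
FROM (
  SELECT pk, dept,
         SUM(is_change) OVER (ORDER BY pk) AS grp   -- consecutive group number 1, 2, 3, ...
  FROM (
    SELECT pk, dept,
           CASE WHEN LAG(dept) OVER (ORDER BY pk) = dept
                THEN 0 ELSE 1 END AS is_change      -- 1 on the first row of each group
    FROM tx
  ) flagged
) numbered
WHERE grp = ?
ORDER BY pk
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (pk INTEGER PRIMARY KEY, dept TEXT)")
conn.executemany("INSERT INTO tx VALUES (?, ?)",
                 [(27, "A"), (29, "A"), (30, "B"), (31, "B"), (33, "A")])

first = conn.execute(GROUP_SQL, (1,)).fetchall()
second = conn.execute(GROUP_SQL, (2,)).fetchall()
print(first)   # [(27, 'A'), (29, 'A')]
print(second)  # [(30, 'B'), (31, 'B')]

# The (1, 'A') case raised in the comments lands in the first group.
conn.execute("INSERT INTO tx VALUES (1, 'A')")
with_extra_row = conn.execute(GROUP_SQL, (1,)).fetchall()
print(with_extra_row)  # [(1, 'A'), (27, 'A'), (29, 'A')]
```

On the first row `LAG` returns NULL, so the comparison is not true and the `CASE` yields 1. That starts group 1 without any special handling.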

Comments

The first SELECT clause should be SELECT pk, dept instead of SELECT * ;-) Other than that I agree on the "a little bit complex" part :D Nice SQL skills though!
I just found that your solution is wrong, as it assumes that the table is clustered on the pk column. If I add a tuple (1, 'A'), only this tuple will be retrieved by your solution, while the OP wants 1, 27 and 29.
Thanks for your comment, but the OP says "when the table is ordered by pk" and wants only the first consecutive group, so my solution is correct. Moreover, he wants the entire row, not only the pk.
Well, I'm not sure he means that the table is ordered on disk. (Without the last WHERE clause I get (1, 'A', 1), (27, 'A', 2), (29, 'A', 2), (30, 'B', 3), (31, 'B', 3), (33, 'A', 4), while I would expect 1, 27 and 29 to be in the same denseIdGroup.) And what I meant was that you retrieve a denseIdGroup column in addition to the pk and dept columns.
Thanks Fabian for your correction, I've made some changes to get the correct result set. Basically, I've added the IGNORE NULLS clause to the LAG call in the third operation and replaced the CASE evaluation with an NVL. Additionally, in the second operation I've changed the behaviour to assign a group ID only to the records where the DEPT value changes in PK order.

I'm not sure if I understand your question, but for the first pk of each dept you can try this:

select min(pk) as pk,
       dept
from your_table
group by dept
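This query returns one row per department, not the first consecutive group. A quick check with Python's `sqlite3` module on the sample data (using SQLite is an assumption) gives pk 27 for A and pk 30 for B. So pk 29 from the first group is missing, and pk 30 from the second group appears. The `ORDER BY dept` is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (pk INTEGER PRIMARY KEY, dept TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?)",
                 [(27, "A"), (29, "A"), (30, "B"), (31, "B"), (33, "A")])

# The GROUP BY query from this answer; ORDER BY only fixes the output order.
per_dept = conn.execute(
    "SELECT min(pk) AS pk, dept FROM your_table GROUP BY dept ORDER BY dept"
).fetchall()
print(per_dept)  # [(27, 'A'), (30, 'B')]
```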
