4

I have a list of activities(A-start to H-end) for certain events. These can occur in any order, any number of times and can restart as well. I need to identify the blocks of activities within an event.

Eg: A B C D E F G H B C D H C D E F H E F G H

It starts only once (A) but ends multiple times

Need to number these activities to identify sets (How many times it ended)

output: 1 1 1 1 1 1 1 2 2 2 2 3 3 3 3 3 4 4 4 4 5

This helps me identify that the event ended (H) 5-1 = 4 times

5
  • What is the database your are working on (MSSQL, Oracle, MYSQL, ...)? Commented Mar 16, 2017 at 10:22
  • Input data you wrote as example are different records or is a single string column? Similarly, the output can be different records or a single string column? Commented Mar 16, 2017 at 10:24
  • You need to have a column that specifies the ordering. SQL tables represent unordered sets. There is no inherent ordering of the values. I also assume that the values are actually on separate rows. Commented Mar 16, 2017 at 10:24
  • Database : Vertica Commented Mar 16, 2017 at 10:31
  • The activities (A - H) are records and output needs to be corresponding to the records Commented Mar 16, 2017 at 10:32

2 Answers 2

2

If your events are series of events in time - try to play with the MATCH() clause and its dependent functions event_name(), pattern_id() and match_id() .

I just created a timeseries out of your input letters, spaced by one-hour intervals, and applied a MATCH() clause. If the PATTERN pat AS () clause uncannily reminds you of a grep expression, that's the way it works.

Just look at the query's output - and imagine how many interesting things you could do with the pattern_id-s and the match_id-s that you get- grouping by them, for example, in subsequent SELECT-s ...

WITH 
s(tm,event) AS (
          SELECT TIME '00:00:00','A'
UNION ALL SELECT TIME '01:00:00','B'
UNION ALL SELECT TIME '02:00:00','C'
UNION ALL SELECT TIME '03:00:00','D'
UNION ALL SELECT TIME '04:00:00','E'
UNION ALL SELECT TIME '05:00:00','F'
UNION ALL SELECT TIME '06:00:00','G'
UNION ALL SELECT TIME '07:00:00','H'
UNION ALL SELECT TIME '08:00:00','B'
UNION ALL SELECT TIME '09:00:00','C'
UNION ALL SELECT TIME '10:00:00','D'
UNION ALL SELECT TIME '11:00:00','H'
UNION ALL SELECT TIME '12:00:00','C'
UNION ALL SELECT TIME '13:00:00','D'
UNION ALL SELECT TIME '14:00:00','E'
UNION ALL SELECT TIME '15:00:00','F'
UNION ALL SELECT TIME '16:00:00','H'
UNION ALL SELECT TIME '17:00:00','E'
UNION ALL SELECT TIME '18:00:00','F'
UNION ALL SELECT TIME '19:00:00','G'
UNION ALL SELECT TIME '20:00:00','H'
)
SELECT
  *
, event_name()
, pattern_id()
, match_id()
FROM s
MATCH(
  PARTITION BY 1 -- nothing to partition by
  ORDER BY tm
  DEFINE  
    START_ev AS (event='A')
  , any_ev   AS (event NOT IN ('A','H'))
  , END_ev   AS (event='H')
  PATTERN pat AS (start_ev* any_ev+ end_ev)
  ROWS MATCH FIRST EVENT
);

tm      |event|event_name|pattern_id|match_id
00:00:00|A    |START_ev  |         1|       1
01:00:00|B    |any_ev    |         1|       2
02:00:00|C    |any_ev    |         1|       3
03:00:00|D    |any_ev    |         1|       4
04:00:00|E    |any_ev    |         1|       5
05:00:00|F    |any_ev    |         1|       6
06:00:00|G    |any_ev    |         1|       7
07:00:00|H    |END_ev    |         1|       8
08:00:00|B    |any_ev    |         2|       1
09:00:00|C    |any_ev    |         2|       2
10:00:00|D    |any_ev    |         2|       3
11:00:00|H    |END_ev    |         2|       4
12:00:00|C    |any_ev    |         3|       1
13:00:00|D    |any_ev    |         3|       2
14:00:00|E    |any_ev    |         3|       3
15:00:00|F    |any_ev    |         3|       4
16:00:00|H    |END_ev    |         3|       5
17:00:00|E    |any_ev    |         4|       1
18:00:00|F    |any_ev    |         4|       2
19:00:00|G    |any_ev    |         4|       3
20:00:00|H    |END_ev    |         4|       4
Sign up to request clarification or add additional context in comments.

3 Comments

I have to admit this is a better solution for Vertica, assuming the MATCH() clause is optimized appropriately.
Hi Gordon - I recently began to use the MATCH() clause in car sensor data to identify suspicious patterns in the measure curves of air mass, temperature and fuel throughput. The plan created is just like any plan including OLAP functions ( LEAD() , LAG(), SUM() OVER() and the like ), and is therefore absolutely within the parameters of an OLAP query.
. . That sounds like an interesting application.
2

It looks like you want to count the number of "H"s and "A"s before a given value. This requires having a column that specifies the ordering. Let me assume this column is called id.

Then, you can do this with window functions:

select t.*,
       sum(case when col = 'H' then 1 else 0 end) over (partition by grp order by id) + 1 as output
from (select t.*,
             sum(case when col = 'A' then 1 else 0 end) over (order by id) as grp
      from t
     ) t;

The subquery defines the "activity" groups, by doing a cumulative sum of "A"s. The outer query then defines the "event" groups by doing a cumulative sum of "E"s.

To be honest, I cannot tell if the "H" is part of the preceding value or the next value. If the next value then the query can use a window clause or a slight tweak to the logic:

       (sum(case when col = 'H' then 1 else 0 end) over (partition by grp order by id) +
        (case when col = 'H' then 0 else 1 end)
       ) as output

1 Comment

This helped. Thank you

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.