0

I am an Oracle SQL beginner and hope someone here can help me with this.

I have a pretty basic Select query which works so far. This is selecting click data, i.e. how often a user clicks on a certain class of buttons on a website.

My problem is that I need to exclude duplicate or multiple clicks from the same user but only those that are duplicates.

Each user has a unique ID (CUSTOMER_ID) and I can get either the HIT_DAY or the HIT_DATETIME to identify the time of the click. However, there are scenarios where a user needs to click multiple buttons of the same class the same day.

Is there something along the lines of "get me only records where the user (CUSTOMER_ID) is different OR where the click time (HIT_DATETIME) is unique for +/- 10 seconds ? Or anything else I could do here ? I don't have other unique identifiers.

My query:

SELECT
    owh.HIT_DAY
    , owh.HIT_DATETIME
    , COUNT(owh.CUSTOMER_ID) AS COUNT_CUSTOMER_ID
    , owh.IS_P_CUSTOMER AS P_CUSTOMER
    , owh.SHORTENED_URL AS URL_SHORTENED
    , owh.MP
FROM
    O_WIN_HITS owh
WHERE 1=1
    AND ...
    AND ...
    AND ...
    AND ...
GROUP BY
    owh.HIT_DAY
    , owh.HIT_DATETIME
    , owh.IS_P_CUSTOMER
    , owh.SHORTENED_URL
    , owh.MP
ORDER BY
    owh.HIT_DAY DESC

Note: This is part of a larger query.

Many thanks in advance for any help, Mike

2
  • 1
    it is unclear from the explanation. can you show some sample data and the expected output? Commented May 18, 2017 at 11:52
  • So when a user clicks once, then 9 secs later, then 9 secs later, then 9 secs later, I count the four as one click. When a user clicks once, then 9 secs later, then 11 secs later, then 9 secs later, I count the four as two clicks. Correct? Commented May 18, 2017 at 11:57

1 Answer 1

1

I think you want something like this:

select owh.*
from (select owh.*,
             lag(hit_datetime) over (partition by customer_id order by hit_datetime) as prev_hdt
      from O_WIN_HITS owh
     ) owh
where prev_hdt is null or
      hit_datetime > prev_hdt + 10 / (24 * 60 * 60);

This returns rows that are either the first row for the customer or appear for a customer more than 10 seconds after the previous hit.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks so much, Gordon - this is exactly what I was looking for here. Huge help ! :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.