How to use GROUP_CONCAT where only the sequential duplicates get dropped

Question

My trafficlog table looks like this:

intLogID	strSessionID	strPage
1	e3a8240b39	./
2	e3a8240b39	./about
3	e3a8240b39	./contact
4	5accab7da9	./
5	e3a8240b39	./contact
6	e3a8240b39	./about
7	71ee2ea4fe	./
8	287a7adb59	./
9	287a7adb59	./about
10	287a7adb59	./contact
11	287a7adb59	./about

My MySQL query looks like this:

SELECT COUNT(A.strUserPath) AS intUserPathCount, A.strUserPath FROM (
  SELECT SUBSTRING_INDEX(
    GROUP_CONCAT(
      strPage ORDER BY intLogID ASC SEPARATOR '>>'
    ), '>>' , 10
  ) AS strUserPath 
  FROM trafficlog 
  GROUP BY strSessionID 
) A 
GROUP BY A.strUserPath 
ORDER BY intUserPathCount DESC

This yields an output that looks like:

intUserPathCount	strUserPath
1	./>>./about>>./contact>>./contact>>./about
2	./
1	./>>./about>>./contact>>./about

What I want the query to do is exclude the duplicate sequential "./contact" entries, but keep the non-sequential "./about" entries that have other stuff between them. I know how to use DISTINCT to exclude all duplicates, but I only want to exclude the sequential duplicates.

My issue is further complicated in that the duplicates need to be treated as sequential within each session's GROUP_CONCAT, because a session's rows may be non-sequential by their key IDs.

The output should look like this:

intUserPathCount	strUserPath
2	./>>./about>>./contact>>./about
2	./

You should be able to accomplish this using the LEAD() or LAG() window functions to compare sequential values.
– Barmar, Commented Jul 16, 2024 at 17:12
@Barmar Thanks, that put me on the right track to figuring it out.
– Nosajimiki, Commented Jul 16, 2024 at 21:22

Nosajimiki · Accepted Answer · 2024-07-16 21:20:02Z

Barmar's comment about LEAD() or LAG() put me on the right track to solve this, but I ran into a lot of caveats when using them, so I figured I'd put it all out there for anyone else who encounters a similar problem.

First of all, these functions cannot be used directly inside GROUP_CONCAT() or a WHERE clause. So you have to start with a subquery that uses LEAD()/LAG() to filter out the duplicates, and then run GROUP_CONCAT() on the results of that subquery.
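As a minimal sketch of that two-step shape (assuming MySQL 8.0 or later, where window functions are available, and the trafficlog table from the question), the window function can be computed in an inner query so that the outer query sees its result as an ordinary column. The alias strPrevPage is made up for illustration, and this variant drops rows with LAG() in an outer WHERE rather than setting values to NULL as the final query further down does.

```sql
-- Sketch only: strPrevPage is an illustrative alias, not a real column.
SELECT T.intLogID, T.strSessionID, T.strPage
FROM (
    SELECT
        intLogID,
        strSessionID,
        strPage,
        -- Page on the previous row of the same session;
        -- NULL on the first row of each session.
        LAG(strPage) OVER (PARTITION BY strSessionID ORDER BY intLogID) AS strPrevPage
    FROM trafficlog
) T
-- strPrevPage is a plain column at this level, so WHERE may reference it.
-- <=> is MySQL's NULL-safe equality, so each session's first row is kept.
WHERE NOT (T.strPage <=> T.strPrevPage)
ORDER BY T.intLogID;
```

Putting the LAG() call straight into the WHERE clause of a single-level query is rejected by MySQL, which is why the extra nesting level is needed.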

Secondly, you need to know to use PARTITION BY. It was not mentioned in several of the guides I first found on LEAD()/LAG(), but it is important for handling groups of values that are non-sequential by their key IDs. In the example data in my question, session 5accab7da9 logged a row (intLogID 4) in the middle of session e3a8240b39, so without a PARTITION BY clause LEAD()/LAG() would not see rows 3 and 5 as sequential.
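To see the difference on the question's sample data, a sketch like the following (again assuming MySQL 8.0+; both column aliases are made up for illustration) puts the two forms of LAG() side by side:

```sql
SELECT
    intLogID,
    strSessionID,
    strPage,
    -- Previous row across the whole table, whatever session it belongs to
    LAG(strPage) OVER (ORDER BY intLogID) AS strPrevAnySession,
    -- Previous row within the same session only
    LAG(strPage) OVER (PARTITION BY strSessionID ORDER BY intLogID) AS strPrevSameSession
FROM trafficlog
ORDER BY intLogID;
-- For intLogID 5 (strPage './contact'):
--   strPrevAnySession  is './'         (row 4, a different session)
--   strPrevSameSession is './contact'  (row 3, the same session)
```

Only the partitioned version reports row 5 as a repeat of the page before it, which is what the duplicate check relies on.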

So, the query now looks like this, and generates the GROUP_CONCAT I needed without the sequential duplicates:

SELECT COUNT(A.strUserPath) AS intUserPathCount, A.strUserPath FROM (
    SELECT SUBSTRING_INDEX(
        GROUP_CONCAT(
            B.strPage ORDER BY B.intLogID ASC SEPARATOR '>>'
        ), '>>' , " . ($PathLimit + 1) . "
    ) AS strUserPath 
    FROM (
        SELECT 
            IF(LEAD (strPage) OVER (PARTITION BY strSessionID ORDER BY intLogID ASC) = strPage, NULL, strPage) AS strPage, 
            intLogID, 
            strSessionID, 
            dtCheckDate 
        FROM {$wpdb->prefix}eseo_trafficlog 
        WHERE strPage IS NOT NULL
    ) B
    GROUP BY B.strSessionID 
) A 
GROUP BY A.strUserPath 
ORDER BY intUserPathCount DESC
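The query above is shown as it sits inside a PHP string: `$PathLimit` and `{$wpdb->prefix}` are PHP values interpolated before the query runs. For anyone who wants to try the idea outside PHP, here is a sketch of the same query as plain SQL against the sample data from the question. It assumes MySQL 8.0 or later; the column types are guesses, the limit of 10 is borrowed from the question's query, and the unused dtCheckDate column is left out.

```sql
-- Sample data from the question (column types are assumptions).
CREATE TABLE trafficlog (
    intLogID     INT PRIMARY KEY,
    strSessionID VARCHAR(32) NOT NULL,
    strPage      VARCHAR(255)
);

INSERT INTO trafficlog (intLogID, strSessionID, strPage) VALUES
    (1,  'e3a8240b39', './'),
    (2,  'e3a8240b39', './about'),
    (3,  'e3a8240b39', './contact'),
    (4,  '5accab7da9', './'),
    (5,  'e3a8240b39', './contact'),
    (6,  'e3a8240b39', './about'),
    (7,  '71ee2ea4fe', './'),
    (8,  '287a7adb59', './'),
    (9,  '287a7adb59', './about'),
    (10, '287a7adb59', './contact'),
    (11, '287a7adb59', './about');

SELECT COUNT(A.strUserPath) AS intUserPathCount, A.strUserPath
FROM (
    SELECT SUBSTRING_INDEX(
        GROUP_CONCAT(B.strPage ORDER BY B.intLogID ASC SEPARATOR '>>'),
        '>>', 10
    ) AS strUserPath
    FROM (
        SELECT
            -- NULL out a page that repeats on the next row of the same session.
            -- GROUP_CONCAT skips NULLs, so only the last row of each run survives.
            IF(LEAD(strPage) OVER (PARTITION BY strSessionID ORDER BY intLogID ASC) = strPage,
               NULL, strPage) AS strPage,
            intLogID,
            strSessionID
        FROM trafficlog
        WHERE strPage IS NOT NULL
    ) B
    GROUP BY B.strSessionID
) A
GROUP BY A.strUserPath
ORDER BY intUserPathCount DESC;
```

On this data both sessions with longer paths collapse to ./>>./about>>./contact>>./about, so the result should be the two rows asked for in the question, each with a count of 2. Because the counts tie, the order of those two rows is not guaranteed.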
