My trafficlog table looks like this:

intLogID strSessionID strPage
1 e3a8240b39 ./
2 e3a8240b39 ./about
3 e3a8240b39 ./contact
4 5accab7da9 ./
5 e3a8240b39 ./contact
6 e3a8240b39 ./about
7 71ee2ea4fe ./
8 287a7adb59 ./
9 287a7adb59 ./about
10 287a7adb59 ./contact
11 287a7adb59 ./about

And my MySQL query looks like this:

SELECT COUNT(A.strUserPath) AS intUserPathCount, A.strUserPath FROM (
  SELECT SUBSTRING_INDEX(
    GROUP_CONCAT(
      strPage ORDER BY intLogID ASC SEPARATOR '>>'
    ), '>>' , 10
  ) AS strUserPath 
  FROM trafficlog 
  GROUP BY strSessionID 
) A 
GROUP BY A.strUserPath 
ORDER BY intUserPathCount DESC

This yields an output that looks like:

intUserPathCount strUserPath
1 ./>>./about>>./contact>>./contact>>./about
2 ./
1 ./>>./about>>./contact>>./about

What I want the query to do is exclude the duplicate sequential "./contact" entries, but keep the non-sequential "./about" entries that have other stuff between them. I know how to use DISTINCT to exclude all duplicates, but I only want to exclude the sequential duplicates.

My issue is further complicated because the duplicates need to be detected as sequential within each session's GROUP_CONCAT; a session's rows may not be consecutive by their key IDs.

The output should look like this:

intUserPathCount strUserPath
2 ./>>./about>>./contact>>./about
2 ./
  • You should be able to accomplish this using the LEAD() or LAG() window functions to compare sequential values. Commented Jul 16, 2024 at 17:12
  • @Barmar Thanks, that put me on the right track to figuring it out. Commented Jul 16, 2024 at 21:22

1 Answer

Barmar's comment about LEAD() or LAG() put me on the right track to solve this, but I ran into a lot of caveats when using them, so I figured I'd put it all out there for anyone else who encounters a similar problem.

First of all, these functions cannot be used directly inside GROUP_CONCAT() or in a WHERE clause. So the first consideration is that you have to start with a subquery that uses LEAD()/LAG() to filter out the duplicates, and then run GROUP_CONCAT() on the results of that subquery.
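To illustrate the subquery point, here is a minimal sketch against the trafficlog table from the question. It uses LAG() instead of LEAD() and filters in the outer WHERE rather than NULLing the value, which is a variation and not the query I ended up with; the alias strPrevPage is made up for the example. It assumes MySQL 8.0 or later, where window functions are available.

```sql
-- Not allowed: a window function cannot appear in a WHERE clause.
-- SELECT strPage
-- FROM trafficlog
-- WHERE strPage <> LAG(strPage) OVER (PARTITION BY strSessionID ORDER BY intLogID);

-- Allowed: compute the window value in a derived table, then filter on it.
SELECT B.intLogID, B.strSessionID, B.strPage
FROM (
    SELECT
        intLogID,
        strSessionID,
        strPage,
        -- Previous page within the same session (hypothetical alias).
        LAG(strPage) OVER (PARTITION BY strSessionID ORDER BY intLogID) AS strPrevPage
    FROM trafficlog
) B
-- Keep the first row of each session and every row that differs from its predecessor.
WHERE B.strPrevPage IS NULL OR B.strPrevPage <> B.strPage
ORDER BY B.intLogID;
```

On the sample data this should drop only row 5, the second of the two consecutive ./contact hits in session e3a8240b39.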

Second, you need to know to use PARTITION BY. Several of the guides on LEAD()/LAG() that I found at first don't mention it, but it is essential when a group's rows are not consecutive by their key IDs. In the example data in my question, session 5accab7da9 was generated in the middle of session e3a8240b39, so without a PARTITION BY clause LEAD()/LAG() would not treat rows 3 and 5 as adjacent.
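A small sketch of the difference, using only rows 3 to 5 of the sample data inlined in a CTE so it can be run on its own. The CTE name and the two output aliases are made up for the example, and it assumes MySQL 8.0 or later.

```sql
-- Rows 3 to 5 of the sample data: session 5accab7da9 interrupts e3a8240b39.
WITH sample (intLogID, strSessionID, strPage) AS (
    SELECT 3, 'e3a8240b39', './contact' UNION ALL
    SELECT 4, '5accab7da9', './'        UNION ALL
    SELECT 5, 'e3a8240b39', './contact'
)
SELECT
    intLogID,
    strSessionID,
    strPage,
    -- Next row by ID, regardless of which session it belongs to.
    LEAD(strPage) OVER (ORDER BY intLogID) AS strNextAnySession,
    -- Next row by ID within the same session only.
    LEAD(strPage) OVER (PARTITION BY strSessionID ORDER BY intLogID) AS strNextSameSession
FROM sample
ORDER BY intLogID;
```

For row 3, strNextAnySession is ./ (row 4, a different session), while strNextSameSession is ./contact (row 5), which is the comparison needed to catch the duplicate.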

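So the query now looks like this. The inner IF() turns a page into NULL whenever the next page in the same session is identical, and since GROUP_CONCAT() skips NULL values, it generates the path I needed without the sequential duplicates: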

SELECT COUNT(A.strUserPath) AS intUserPathCount, A.strUserPath FROM (
    SELECT SUBSTRING_INDEX(
        GROUP_CONCAT(
            B.strPage ORDER BY B.intLogID ASC SEPARATOR '>>'
        ), '>>' , " . ($PathLimit + 1) . "
    ) AS strUserPath 
    FROM (
        SELECT 
            IF(LEAD(strPage) OVER (PARTITION BY strSessionID ORDER BY intLogID ASC) = strPage, NULL, strPage) AS strPage, 
            intLogID, 
            strSessionID, 
            dtCheckDate 
        FROM {$wpdb->prefix}eseo_trafficlog 
        WHERE strPage IS NOT NULL
    ) B
    GROUP BY B.strSessionID 
) A 
GROUP BY A.strUserPath 
ORDER BY intUserPathCount DESC
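
The query above is embedded in a PHP string, which is where the $PathLimit and {$wpdb->prefix} fragments come from. For anyone who wants to try it outside of PHP, here is a plain-SQL sketch of the same query against the sample data from the question. The column types, the un-prefixed table name trafficlog, and the fixed limit of 10 (taken from the question's query) are assumptions, and dtCheckDate is left out because the sample table does not have it. It needs MySQL 8.0 or later for window functions.

```sql
-- Sample table from the question; the column types are assumed.
CREATE TABLE trafficlog (
    intLogID     INT PRIMARY KEY,
    strSessionID VARCHAR(32) NOT NULL,
    strPage      VARCHAR(255)
);

INSERT INTO trafficlog (intLogID, strSessionID, strPage) VALUES
    (1,  'e3a8240b39', './'),
    (2,  'e3a8240b39', './about'),
    (3,  'e3a8240b39', './contact'),
    (4,  '5accab7da9', './'),
    (5,  'e3a8240b39', './contact'),
    (6,  'e3a8240b39', './about'),
    (7,  '71ee2ea4fe', './'),
    (8,  '287a7adb59', './'),
    (9,  '287a7adb59', './about'),
    (10, '287a7adb59', './contact'),
    (11, '287a7adb59', './about');

SELECT COUNT(A.strUserPath) AS intUserPathCount, A.strUserPath
FROM (
    -- One path per session, capped at 10 steps (assumed limit).
    SELECT SUBSTRING_INDEX(
        GROUP_CONCAT(B.strPage ORDER BY B.intLogID ASC SEPARATOR '>>'),
        '>>', 10
    ) AS strUserPath
    FROM (
        SELECT
            -- NULL out a page when the next page in the same session is identical;
            -- GROUP_CONCAT() skips NULLs, so only the last page of each run survives.
            IF(LEAD(strPage) OVER (PARTITION BY strSessionID ORDER BY intLogID ASC) = strPage,
               NULL, strPage) AS strPage,
            intLogID,
            strSessionID
        FROM trafficlog
        -- This refers to the table column, not the alias defined above.
        WHERE strPage IS NOT NULL
    ) B
    GROUP BY B.strSessionID
) A
GROUP BY A.strUserPath
ORDER BY intUserPathCount DESC;
```

On the sample data this should return ./>>./about>>./contact>>./about and ./ with a count of 2 each. The two rows tie on the count, so their relative order is not guaranteed.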