My trafficlog table looks like this:
| intLogID | strSessionID | strPage |
|---|---|---|
| 1 | e3a8240b39 | ./ |
| 2 | e3a8240b39 | ./about |
| 3 | e3a8240b39 | ./contact |
| 4 | 5accab7da9 | ./ |
| 5 | e3a8240b39 | ./contact |
| 6 | e3a8240b39 | ./about |
| 7 | 71ee2ea4fe | ./ |
| 8 | 287a7adb59 | ./ |
| 9 | 287a7adb59 | ./about |
| 10 | 287a7adb59 | ./contact |
| 11 | 287a7adb59 | ./about |
And my MySQL query looks like this:
```sql
SELECT COUNT(A.strUserPath) AS intUserPathCount, A.strUserPath
FROM (
    SELECT SUBSTRING_INDEX(
               GROUP_CONCAT(strPage ORDER BY intLogID ASC SEPARATOR '>>'),
               '>>', 10
           ) AS strUserPath
    FROM trafficlog
    GROUP BY strSessionID
) A
GROUP BY A.strUserPath
ORDER BY intUserPathCount DESC;
```
This yields an output that looks like:
| intUserPathCount | strUserPath |
|---|---|
| 1 | ./>>./about>>./contact>>./contact>>./about |
| 2 | ./ |
| 1 | ./>>./about>>./contact>>./about |
What I want the query to do is collapse consecutive duplicate entries (the repeated "./contact") into one, but keep repeated entries that are not consecutive (the two "./about" entries, which have other pages between them). I know how to use DISTINCT to remove all duplicates, but I only want to remove the consecutive ones.
My issue is further complicated in that the duplicates need to be consecutive within the GROUP_CONCAT, even though their key IDs may not be. For example, the two "./contact" rows for session e3a8240b39 have IDs 3 and 5, because ID 4 belongs to another session.
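To pin down the intended semantics, here is a minimal Python sketch (not SQL) of the result described above: pages are ordered by `intLogID` within each session, runs of the same adjacent page are collapsed with `itertools.groupby`, each path is cut to 10 pages as `SUBSTRING_INDEX(..., 10)` does, and identical paths are counted. `ROWS`, `MAX_PAGES`, `user_paths` and `path_counts` are hypothetical names used only for illustration.

```python
from collections import Counter
from itertools import groupby

# Rows from the trafficlog table above: (intLogID, strSessionID, strPage)
ROWS = [
    (1, "e3a8240b39", "./"),
    (2, "e3a8240b39", "./about"),
    (3, "e3a8240b39", "./contact"),
    (4, "5accab7da9", "./"),
    (5, "e3a8240b39", "./contact"),
    (6, "e3a8240b39", "./about"),
    (7, "71ee2ea4fe", "./"),
    (8, "287a7adb59", "./"),
    (9, "287a7adb59", "./about"),
    (10, "287a7adb59", "./contact"),
    (11, "287a7adb59", "./about"),
]

MAX_PAGES = 10  # mirrors SUBSTRING_INDEX(..., '>>', 10) in the query


def user_paths(rows):
    """Return one '>>'-joined path per session, pages ordered by intLogID,
    with consecutive repeats of the same page collapsed into one entry."""
    pages_by_session = {}
    # Sorting the tuples orders them by intLogID (the first field), so pages
    # are appended in log order even when a session's IDs are not adjacent.
    for _log_id, session_id, page in sorted(rows):
        pages_by_session.setdefault(session_id, []).append(page)

    paths = []
    for pages in pages_by_session.values():
        # groupby() yields one key per run of equal *adjacent* values, so
        # "./contact, ./contact" collapses but "./about ... ./about" does not.
        collapsed = [page for page, _run in groupby(pages)]
        paths.append(">>".join(collapsed[:MAX_PAGES]))
    return paths


def path_counts(rows):
    """Count identical paths, most frequent first (like GROUP BY + ORDER BY DESC)."""
    return Counter(user_paths(rows)).most_common()


if __name__ == "__main__":
    for path, count in path_counts(ROWS):
        print(count, path)
```

On the sample rows this yields the two paths from the desired output, each with a count of 2.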
The output should look like this:
| intUserPathCount | strUserPath |
|---|---|
| 2 | ./>>./about>>./contact>>./about |
| 2 | ./ |
One possible approach: use the `LEAD()` or `LAG()` window functions to compare sequential values.
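A minimal sketch of that window-function idea, run against SQLite through Python's `sqlite3` module so it can be executed without a MySQL server (assumption: the bundled SQLite is 3.25 or newer, the first version with window functions). `LAG()` compares each row with the previous row of the same session, ordered by `intLogID`, and rows whose page matches the previous one are dropped. The per-session joining and counting are done in Python because SQLite's `group_concat` has no `SEPARATOR` keyword and versions before 3.44 cannot order its input. `DEDUP_SQL`, `path_counts` and `load_sample` are hypothetical names.

```python
import sqlite3
from collections import Counter

# Rows from the trafficlog table above: (intLogID, strSessionID, strPage)
ROWS = [
    (1, "e3a8240b39", "./"), (2, "e3a8240b39", "./about"),
    (3, "e3a8240b39", "./contact"), (4, "5accab7da9", "./"),
    (5, "e3a8240b39", "./contact"), (6, "e3a8240b39", "./about"),
    (7, "71ee2ea4fe", "./"), (8, "287a7adb59", "./"),
    (9, "287a7adb59", "./about"), (10, "287a7adb59", "./contact"),
    (11, "287a7adb59", "./about"),
]

# LAG() looks back one row within the same session, ordered by intLogID,
# so "previous page" means the session's previous hit even when other
# sessions' rows sit between them. A row is kept only if it is the first
# hit of its session or its page differs from the previous one.
DEDUP_SQL = """
    SELECT t.strSessionID, t.strPage
    FROM (
        SELECT intLogID,
               strSessionID,
               strPage,
               LAG(strPage) OVER (
                   PARTITION BY strSessionID ORDER BY intLogID
               ) AS strPrevPage
        FROM trafficlog
    ) AS t
    WHERE t.strPrevPage IS NULL OR t.strPage <> t.strPrevPage
    ORDER BY t.strSessionID, t.intLogID
"""


def path_counts(conn, max_pages=10):
    """Join each session's filtered pages with '>>' and count identical paths."""
    pages_by_session = {}
    for session_id, page in conn.execute(DEDUP_SQL):
        pages_by_session.setdefault(session_id, []).append(page)
    paths = (">>".join(pages[:max_pages]) for pages in pages_by_session.values())
    return Counter(paths).most_common()


def load_sample(rows):
    """Create an in-memory trafficlog table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trafficlog ("
        "intLogID INTEGER PRIMARY KEY, strSessionID TEXT, strPage TEXT)"
    )
    conn.executemany("INSERT INTO trafficlog VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = load_sample(ROWS)
    for path, count in path_counts(conn):
        print(count, path)
```

On MySQL 8.0 or later (the release that added window functions), wrapping the `LAG()` subquery and its `WHERE` filter in a derived table and using it in place of `trafficlog` in the original query should keep the `GROUP_CONCAT` ordering and `SUBSTRING_INDEX` truncation in SQL; that MySQL variant is not exercised by this sketch.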