0

Thanks for the help! I'm using MS SQL Server 17 and attempting to group by an ID and find common pairings in a second column based on the shared ID. Most other questions involved finding any combination between multiple columns.

Here is some sample data:

/* Create sample data */ 
 DROP TABLE IF EXISTS example
 CREATE TABLE example (
    PersonID int,
    Place varchar(50)
 )

 INSERT INTO example (PersonID, Place)
 VALUES (1, 'home'), (2, 'work'), (3, 'gym'), (1, 'grocery'), (1, 'home'), (2, 'gym'), (3, 'work'), 
        (4, 'school'), (2, 'gym'), (3, 'gym'), (4, 'home'), (4, 'school'), (4, 'work'), (5, 'bar')

 SELECT * FROM example
 Order by PersonID asc

Whenever a PersonID has more than one row, I'd like to see the common pairings of Place in the following format (for a Sankey chart).

from      | to       | count
____________________________
gym       | gym      | 2
gym       | work     | 2
school    | school   | 1
home      | home     | 1
school    | work     | 1
grocery   | home     | 1 

A pairing can be for the same place, e.g. PersonID == 1 went 'home' twice, but I only need pairings of two in the from-to format.

So far I've tried the STRING_AGG function, but I'm having difficulty limiting it to just two-way pairings. Really appreciate the help and I apologize if this is a simple answer that's been solved before.

Attempt:

/* Next, let's try to make our Sankey data (from, to, count) */
DROP TABLE IF EXISTS temp_example
SELECT t.combination, COUNT(*) AS value
INTO temp_example
FROM (SELECT STRING_AGG(Place, ',') within group (order by Place) combination 
           FROM example
           GROUP BY PersonID
           HAVING COUNT(*) >= 2
     ) t
GROUP BY t.combination
ORDER BY value desc
3
  • I can't seem to match your sample data and desired result. Are they really supposed to match? Commented Aug 29, 2020 at 17:42
  • Apologies - I will modify the desired output to match the synthetic data. Commented Aug 29, 2020 at 17:48
  • SQL tables have no implicit order. If you want to enforce an order (as implied by 'from and 'to') you need to include at least one other column which can be used to derive that ordering. Commented Aug 29, 2020 at 18:10

1 Answer 1

1

First, you need another column. One which can be used to identify which order the person visited the places. SQL tables are unordered, so the order you insert the data isn't enough. For example, add a timestamp column or something?

Then, use LAG() to find out the place that was previously visited for each row. After which it's a simple GROUP BY.

WITH
  lagged AS
(
  SELECT
    *, 
    LAG(place) OVER (PARTITION BY PersonID ORDER BY aTimestampOrSomething) AS prevPlace
  FROM
    example
)
SELECT
  prevPlace,
  place,
  COUNT(*)
FROM
  lagged

(Apologies for typos, etc, I'm on my phone)

Sign up to request clarification or add additional context in comments.

1 Comment

Thank you! Apologies that I can't up-vote since I just joined.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.