Aggregating SQL Data and concatenating strings / combining records

Question

I am trying to aggregate data from one table into another. I've inherited this project; I did not design this database nor will I be able to change its format.

The [RawData] table will have 1 record per account, per ChannelCodeID. This table (where I currently have data) has the following fields:

[Account] int
[ChannelCodeID] int
[ChannelCode] varchar(10)

The [AggregatedData] table will have 1 record per account. This table (into which I need to insert data) has the following fields:

[Account] int
[Count] int
[Channel1] int
[Channel2] int
[Channel3] int
[Names] varchar(250)

For example, I might have the following records in my [RawData] table:

Account          ChannelCodeID     ChannelCode
12345            2                 ABC
12345            4                 DEF
12345            6                 GHI
54321            2                 ABC
54321            6                 GHI
99999            2                 ABC

And, after aggregating them, I would need to produce the following records in my [AggregatedData] table:

Account     Count    Chanel1   Channel2   Channel3    Names
12345       3        2         4          6           ABC.DEF.GHI
54321       2        2         6          0           ABC.GHI    
99999       1        2         0          0           ABC

As you can see, the count is how many records exist in my [RawData] table, Channel1 is the first ChannelCodeID, Channel2 is the second, and Channel3 is the third. If there are not enough ChannelCodeIDs from my [RawData] table, extra Channel columns get a '0' value. Furthermore, I need to concatenate the 'ChannelCode' column and store it in the 'Names' column of the [AggregatedData] table, but (obviously) if there is only one record, I don't want to add the '.'

I can't figure out how to do this without using a cursor and a bunch of variables - but I'm guessing there HAS to be a better way. This doesn't have to be super-fast since it will only run once a month, but it will have to process at least 10-15,000 records each time.

Thanks in advance...

EDIT:

ChannelCodes and ChannelCodeIDs map directly to each other and are always the same. For example, ChannelCodeID 2 is ALWAYS 'ABC'

Also, in the [AggregatedData] table, Channel1 is ALWAYS the lowest value, although this is incidental.

Are channels codes consistent in real-world (for example is channel 2 always the same (ABC in your example)? Also, is Chane1l in the new table always the lowest value? — Sparky
– Sparky, Commented Oct 28, 2014 at 18:53
Yes That is correct. ChannelCodes and IDs map directly to each other. It is also correct that Channel1 in [AggregatedData] is always the lowest value. — Stan Shaw
– Stan Shaw, Commented Oct 28, 2014 at 18:54
Do you clear out the aggregate table and rebuild it each time? — Sparky
– Sparky, Commented Oct 28, 2014 at 19:00
Nope, I don't clear it out, but this will always create a new record. There are other columns, month and year (which I intentionally omitted as to not unnecessarily obfuscate my question) - and these extra columns ensure that I'm always writing a new record to the [AggregatedData] table. — Stan Shaw
– Stan Shaw, Commented Oct 28, 2014 at 19:03

M.Ali · Accepted Answer · 2014-10-28 19:12:21Z

3

Test Data

DECLARE @TABLE TABLE (Account INT, ChannelCodeID INT, ChannelCode VARCHAR(10))
INSERT INTO @TABLE VALUES 
(12345 ,2 ,'ABC'),
(12345 ,4 ,'DEF'),
(12345 ,6 ,'GHI'),
(54321 ,2 ,'ABC'),
(54321 ,6 ,'GHI'),
(99999 ,2 ,'ABC')

Query

SELECT Account
      ,[Count]
      ,ISNULL([Channel1], 0) AS [Channel1]
      ,ISNULL([Channel2], 0) AS [Channel2]
      ,ISNULL([Channel3], 0) AS [Channel3]
      ,Names
FROM 
  (
    SELECT t.Account, T.ChannelCodeID, C.[Count]
          ,'Channel' + CAST(ROW_NUMBER() OVER 
                      (PARTITION BY t.Account ORDER BY t.ChannelCodeID ASC) AS VARCHAR(10))Channels
          ,STUFF((SELECT '.' + ChannelCode
                  FROM @TABLE 
                  WHERE Account = t.Account
                  FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)'),1,1,'') AS Names
    FROM @TABLE t INNER JOIN (SELECT Account , COUNT(*) AS [Count]
                              FROM @TABLE 
                              GROUP BY Account) c
    ON T.Account = C.Account
  )A
PIVOT (MAX(ChannelCodeID)
       FOR Channels
       IN ([Channel1],[Channel2],[Channel3])
      )  p

Result

╔═════════╦═══════╦══════════╦══════════╦══════════╦═════════════╗
║ Account ║ Count ║ Channel1 ║ Channel2 ║ Channel3 ║    Names    ║
╠═════════╬═══════╬══════════╬══════════╬══════════╬═════════════╣
║   12345 ║     3 ║        2 ║        4 ║        6 ║ ABC.DEF.GHI ║
║   54321 ║     2 ║        2 ║        6 ║        0 ║ ABC.GHI     ║
║   99999 ║     1 ║        2 ║        0 ║        0 ║ ABC         ║
╚═════════╩═══════╩══════════╩══════════╩══════════╩═════════════╝

answered Oct 28, 2014 at 19:12

M.Ali

69.9k17 gold badges110 silver badges133 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Stan Shaw Over a year ago

I see it is comma and space delimiting the ChannelCodeList in your example. Is there a way to have it period-delimit without the spaces (e.g. 'ABC.DEF' instead of 'ABC, DEF')? If not, I can always run an update to replace ', ' with '.' on that column...

M.Ali Over a year ago

@Stan sorted ....... it looked a pretty simple query until I started to write it :)

Stan Shaw Over a year ago

Excellent - THANK YOU. Once I get this working in my environment, I'll accept the answer.

Sparky · Accepted Answer · 2014-10-28 19:16:57Z

1

-- Back up raw data into temp table

select * into #rawData FROM RawData

-- First, populate the lowest channel and base records

INSERT INTO AggregatedData (Account,Count,Channel1,Channel2,Channel3)
   SELECT AccountID,1,Min(ChannelCODEID),0,0
   FROM #RawData
   GROUP BY AccountID

-- Gives you something like this

 Account     Count    Chanel1   Channel2   Channel3    Names
    12345       1        2         0          0           NULL
    54321       1        2         6          0           NULL
    99999       1        2         0          0           NULL

--

DELETE FROM #rawData 
WHERE account + str(channelCodeID) in 
      (SELECT account + str(channelCodeID) FROM AggregatedData)

-- Now do an update

UPDATE AggregatedData SET channel2= xx.NextLowest,count= count+1
FROM
    (   SELECT AccountID,Min(ChannelCODEID) as NextLowest
       FROM #RawData
       GROUP BY AccountID ) xx
WHERE AggregatedData.account=xx.accountID

-- Repeat above for Channel3

You then need an update statement against the final aggregated table based on the channel id's. If not run often, I would suggest a UDF which takes 3 parameters and returns a string, some like

UPDATE AggregatedData SET [names] = dbo.BuildNameList(channel1,channel2,channel3)

Will run a bit slow, but still not bad overall

Hope this gives you some ideas

answered Oct 28, 2014 at 19:16

Sparky

15.1k2 gold badges34 silver badges45 bronze badges

3 Comments

Sparky Over a year ago

Looks like you've got some better options already, mine is based on generic SQL, but if you version of SQL is newer, the other solutions are better...

Stan Shaw Over a year ago

This instance is 2008 R2. Thanks, again! Which answer do you recommend?

Sparky Over a year ago

M.Ali's answers looks best for SQL 2008 R2. Good luck

Community · Accepted Answer · 2020-06-20 09:12:55Z

0

Query:

WITH CTE AS (SELECT Account, ChannelCodeID, ChannelCode, RANK() OVER (PARTITION BY Account ORDER BY ChannelCodeID) [ChRank] FROM RawData)
SELECT A.Account, COUNT(Account) [Count], ISNULL((SELECT TOP 1 ChannelCodeID FROM CTE WHERE A.Account=CTE.Account AND ChRank=1),0) [Channel1],
       ISNULL((SELECT TOP 1 ChannelCodeID FROM CTE WHERE A.Account=CTE.Account AND ChRank=2),0) [Channel2],
       ISNULL((SELECT TOP 1 ChannelCodeID FROM CTE WHERE A.Account=CTE.Account AND ChRank=3),0) [Channel3],
       STUFF((SELECT '.'+ChannelCode FROM CTE WHERE A.Account=CTE.Account FOR XML PATH('')),1,1,'') [Names]
FROM RawData A
GROUP BY A.Account

This uses a Common Table Expression to group and then display the data.

edited Jun 20, 2020 at 9:12

CommunityBot

11 silver badge

answered Oct 28, 2014 at 19:04

Mr. Mascaro

2,7431 gold badge13 silver badges18 bronze badges

7 Comments

Stan Shaw Over a year ago

This returns an error "Subquery returned more than 1 value. This is not permitted..."

Mr. Mascaro Over a year ago

Is it possible for each Account to have duplicates in the ChannelCodeID field?

Stan Shaw Over a year ago

If you're asking if it's possible (for example) for account 12345 to have multiple records in the [RawData] table for ChannelCode2, the answer is no. As I stated in OP, 'The [RawData] table will have 1 record per account, per ChannelCodeID'. Thanks again for help!

Mr. Mascaro Over a year ago

What version of SQL Server are you using?

Stan Shaw Over a year ago

Thanks for the help - Check out M.Ali's answer.

|

Collectives™ on Stack Overflow

Aggregating SQL Data and concatenating strings / combining records

3 Answers 3

Test Data

Query

Result

3 Comments

3 Comments

Query:

7 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Test Data

Query

Result

3 Comments

3 Comments

Query:

7 Comments

Your Answer

Sign up or log in

Post as a guest

Related