
I am having some trouble writing some SQL (for SQL Server 2008).

I have a table of tasks where each row holds a priority-ordered, comma-delimited list of tasks:

Id = 1, LongTaskName = "a,b,c"
Id = 2, LongTaskName = "a,c"
Id = 3, LongTaskName = "b,c"
Id = 4, LongTaskName = "a"
etc...

I am trying to build a new table that groups them by the first task, along with the id:

GroupName: "a", TaskId: 1
GroupName: "a", TaskId: 2
GroupName: "a", TaskId: 4
GroupName: "b", TaskId: 3

Here is the naive, slow LINQ code:

foreach(var t in Tasks)
{
    var gt = new GroupedTasks();
    gt.TaskId = t.Id;

    var firstWord = t.LongTaskName.Split(',');
    // Split always returns at least one element, so this condition is always true
    if(firstWord.Count() > 0)
    {
        gt.GroupName = firstWord.First();
    }
    else 
    {
        gt.GroupName = t.LongTaskName;
    }
    GroupedTasks.InsertOnSubmit(gt);
}

I wrote a SQL function to do the string split; it returns everything before the first delimiter:

create function fn_Split(
@String nvarchar (4000),
@Delimiter nvarchar (10)
)
returns nvarchar(4000)

begin
declare @FirstComma int

set @FirstComma = charindex(@Delimiter,@String)
if(@FirstComma = 0)
return @String

return left(@String, @FirstComma - 1) -- everything before the first delimiter
end
go

However, I am getting stuck on the real SQL to do the work. I can get the GROUP BY on its own:

SELECT dbo.fn_Split(LongTaskName, ',')
  FROM [dbo].[Tasks]
  GROUP BY dbo.fn_Split(LongTaskName, ',')

And I know I need to head down something like this:

DECLARE @RowSet TABLE (GroupName nvarchar(1024), Id nvarchar(5))
insert into @RowSet
select ???
FROM [dbo].Tasks as T
INNER JOIN
(
    SELECT dbo.fn_Split(LongTaskName, ',')
      FROM [dbo].[Tasks]
      GROUP BY dbo.fn_Split(LongTaskName, ',')
) G
ON T.??? = G.???
ORDER BY ???
INSERT INTO dbo.GroupedTasks(GroupName, Id)
select * from  @RowSet

But I am not quite grokking how to reference the grouped relationships, and I am confused about having to call the split function multiple times.

Any thoughts?

2 Answers


If you only care about the first item in the list, there's really no need for a function. I would recommend this approach instead. You also don't need the @RowSet table variable as a temporary holding area.

INSERT dbo.GroupedTasks(GroupName, Id)
SELECT 
    LEFT(LongTaskName, COALESCE(NULLIF(CHARINDEX(',', LongTaskName)-1, -1), 1024)),
    Id
FROM dbo.Tasks;
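To see how the expression behaves, here is a self-contained sketch that runs it against the sample rows from the question. The table variable `@Tasks` is a hypothetical stand-in for `dbo.Tasks`, and the column types are assumptions; the multi-row `VALUES` syntax needs SQL Server 2008 or later.

```sql
-- Hypothetical stand-in for dbo.Tasks, loaded with the sample rows from the question
DECLARE @Tasks TABLE (Id int PRIMARY KEY, LongTaskName nvarchar(1024));

INSERT @Tasks (Id, LongTaskName)
VALUES (1, N'a,b,c'), (2, N'a,c'), (3, N'b,c'), (4, N'a');

SELECT
    Id,
    CHARINDEX(',', LongTaskName) AS CommaPos,  -- 0 when there is no comma
    -- CommaPos - 1 is the length of the first task. NULLIF turns the
    -- "no comma" case (-1) into NULL, and COALESCE then falls back to 1024,
    -- so LEFT keeps the whole string.
    LEFT(LongTaskName,
         COALESCE(NULLIF(CHARINDEX(',', LongTaskName) - 1, -1), 1024)) AS GroupName
FROM @Tasks
ORDER BY GroupName, Id;
-- Expected rows (Id, CommaPos, GroupName): (1, 2, a), (2, 2, a), (4, 0, a), (3, 2, b)
```

The 1024 fallback only needs to be at least the declared length of LongTaskName; a smaller value would silently truncate names that contain no comma.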

It is even easier if the task names are one character long: then you can use LEFT(LongTaskName, 1) instead of the ugly LEFT/CHARINDEX mess. But I'm guessing your task names are not one character long (if so, you should include some sample data that varies in length so that others don't make assumptions about it).

Now, keep in mind that you'll have to do something like this to keep dbo.GroupedTasks up to date every time a dbo.Tasks row is inserted, updated or deleted. How are you going to keep these two tables in sync?

More to the point, you should consider storing the top-priority task separately in the first place, either by using a computed column or by separating it out before the insert. Munging data together is something you do with hash tables and arrays in application code, but it rarely has any benefit inside a database. You almost always spend more time and effort pulling the data apart than you ever saved by keeping it together in the first place. Storing the first task separately would also eliminate the need for a second table altogether.
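A minimal sketch of the computed-column idea, assuming LongTaskName is an nvarchar of at most 1024 characters. The temp table `#Tasks` stands in for `dbo.Tasks`, and the column name `FirstTask` is hypothetical. Because the column is derived from LongTaskName, later inserts and updates pick up the right value with no second table to keep in sync.

```sql
-- Hypothetical temp table standing in for dbo.Tasks
CREATE TABLE #Tasks
(
    Id           int            NOT NULL PRIMARY KEY,
    LongTaskName nvarchar(1024) NOT NULL,
    -- Computed column: the first (highest-priority) task in the list,
    -- derived from LongTaskName so it cannot drift out of sync
    FirstTask AS LEFT(LongTaskName,
                      COALESCE(NULLIF(CHARINDEX(',', LongTaskName) - 1, -1), 1024))
);

INSERT #Tasks (Id, LongTaskName)
VALUES (1, N'a,b,c'), (2, N'a,c'), (3, N'b,c'), (4, N'a');

-- Updated rows get a new FirstTask automatically
UPDATE #Tasks SET LongTaskName = N'c,a' WHERE Id = 4;

-- The "grouped tasks" are now just a query
SELECT FirstTask AS GroupName, Id AS TaskId
FROM #Tasks
ORDER BY GroupName, TaskId;
-- Expected rows (GroupName, TaskId): (a, 1), (a, 2), (b, 3), (c, 4)
```

On an existing table the same column could be added with ALTER TABLE ... ADD FirstTask AS <expression>. Marking it PERSISTED (allowed here because LEFT, CHARINDEX, NULLIF and COALESCE are deterministic) would store the value instead of computing it on every read.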


1 Comment

Wow, nice. The problem is actually made up: it is a simplified version of a one-time import problem I am working on, so I won't need to rerun it. But thank you so much for your excellent answer and for pointing out that potential issue!

SELECT Id, dbo.fn_Split(LongTaskName, ',') AS GroupName
INTO TasksWithGroupInfo
FROM dbo.Tasks;

Does this answer your question?
