0

I am looking to take a string of directory path and parse information out of it into existing columns on another table. This is for the purpose of creating a staging table for reporting. It will be parsing many directory paths if the ProjectName is applicable to the change in structure.

Data Example:

Table1_Column1
ProjectName\123456_ProjectShortName\Release_1\Iteration\etc


Expected Output:

Table2_Column1, Table2_Column2
123456          ProjectShortName

I've figured out how to parse some strings by character but it seems a bit clunky and inefficient. Is there a better structure to go about this? To add some more to it, this is just one column I need to manipulate before shifting it over there are three other columns that are being directly shifted to the staging table based on the ProjectName.

Is it better to just create a UDF to split then call it within the job that will move the data or is there another way?

3
  • What is the rule for column1 being 123456 and NOT ProjectName? Commented Sep 29, 2016 at 21:03
  • Column1 is an ID created by a business process, point being that they have to be separated into their own columns. They could go to any two columns on table2. Commented Sep 29, 2016 at 21:05
  • You asked for a performance splitter... check this out: sqlservercentral.com/articles/Tally+Table/72993 Commented Sep 29, 2016 at 21:06

2 Answers 2

1

Here's a method without a UDF.
It uses charindex and substring to get the parts from that path string.

An example using a table variable:

declare @T table (Table1_Column1 varchar(100));
insert into @T values 
('ProjectName\123456_ProjectShortName\Release_1\Iteration\etc'),
('OtherProjectName\789012_OtherProjectShortName\Release_2\Iteration\xxx');

select
case 
when FirstBackslashPos > 0 and FirstUnderscorePos > 0 
then substring(Col1,FirstBackslashPos+1,FirstUnderscorePos-FirstBackslashPos-1) 
end as Table1_Column1,
case 
when FirstUnderscorePos > 0 and SecondBackslashPos > 0 
then substring(Col1,FirstUnderscorePos+1,SecondBackslashPos-FirstUnderscorePos-1) 
end as Table1_Column2
from (
  select 
  Table1_Column1 as Col1, 
  charindex('\',Table1_Column1) as FirstBackslashPos,
  charindex('_',Table1_Column1) as FirstUnderscorePos,
  charindex('\',Table1_Column1,charindex('\',Table1_Column1)+1) as SecondBackslashPos
from @T
) q;

If you want to calculate only one into a variable

declare @ProjectPath varchar(100);
set @ProjectPath = 'ProjectName\123456_ProjectShortName\Release_1\Iteration\etc';

declare @FirstBackslashPos int  = charindex('\',@ProjectPath);
declare @FirstUnderscorePos int = charindex('_',@ProjectPath,@FirstBackslashPos);
declare @SecondBackslashPos int = charindex('\',@ProjectPath,@FirstBackslashPos+1);
declare @ProjectNumber varchar(30) = case when @FirstBackslashPos > 0 and @FirstUnderscorePos > 0 then substring(@ProjectPath,@FirstBackslashPos+1,@FirstUnderscorePos-@FirstBackslashPos-1)end;
declare @ProjectShortName varchar(30) = case when @FirstUnderscorePos > 0 and @SecondBackslashPos > 0 then substring(@ProjectPath,@FirstUnderscorePos+1,@SecondBackslashPos-@FirstUnderscorePos-1) end;

select @ProjectNumber as ProjectNumber, @ProjectShortName as ProjectShortName;

But i.m.h.o. it might be worth the effort to add some CLR that brings true regex matching to the SQL server. Since CHARINDEX and PATINDEX are not as flexible as regex.

Sign up to request clarification or add additional context in comments.

2 Comments

I cant Insert into. I need to dynamically parse all the directory paths in that column. No string literals like that.
This is definitely what I am looking for. Thank you so much for teaching me this method.
0

The following is a SUPER fast Parser but it is limited to 8K bytes. Notice the Returned Sequence Number... Perhaps you can key off of that because I am still not clear on the logic for why column1 is 123456 and not ProjectName

Declare @String varchar(max) = 'ProjectName\123456_ProjectShortName\Release_1\Iteration\etc'

Select * from [dbo].[udf-Str-Parse-8K](@String,'\')

Returns

RetSeq  RetVal
1       ProjectName
2       123456_ProjectShortName
3       Release_1
4       Iteration
5       etc

The UDF if needed

CREATE FUNCTION [dbo].[udf-Str-Parse-8K] (@String varchar(max),@Delimiter varchar(10))
Returns Table 
As
Return (  
    with   cte1(N)   As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
           cte2(N)   As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 a,cte1 b,cte1 c,cte1 d) A ),
           cte3(N)   As (Select 1 Union All Select t.N+DataLength(@Delimiter) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter)) = @Delimiter),
           cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter,@String,s.N),0)-S.N,8000) From cte3 S)

    Select RetSeq = Row_Number() over (Order By A.N)
          ,RetVal = Substring(@String, A.N, A.L) 
    From   cte4 A
);
--Much faster than str-Parse, but limited to 8K
--Select * from [dbo].[udf-Str-Parse-8K]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse-8K]('John||Cappelletti||was||here','||')

1 Comment

The order can be either, 123456 is an example of an ID included in that string that has to be separated.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.