
I have a table in SQL Server with 2 fields: ID (int) and RawData (varchar(max)).

The RawData field contains pipe-delimited data:

AB|1223445|01012017|abc|........

There are 75 fields in each record, and there are up to 1 million rows.

I need to parse out this data and place it in another table with 75 fields. The data contains dates, ints, varchars (the largest is varchar(70)), numeric(5,2) and money data types.
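
For reference, the source table is essentially this (the table name here is just illustrative; only the two fields described above are assumed):

CREATE TABLE dbo.RawDataSource
(
    ID      int          NOT NULL,
    RawData varchar(max) NULL
);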

Currently I am using a function to split the string, then pivoting the returned data and inserting it into the table, but this is taking 4 hours.

Is there a quicker way of doing this?

I am using SQL Server 2014.

If I haven't given enough info please ask me any questions you have.

thanks

FYI

Function being used to split each record

ALTER FUNCTION [dbo].[fn_SplitString]
(
    @string NVARCHAR(MAX),
    @delimiter CHAR(1)
)
RETURNS @output TABLE (ID INT, splitdata NVARCHAR(MAX))
BEGIN
    DECLARE @start INT, @end INT, @Count INT

    SET @Count = 1
    SELECT @start = 1, @end = CHARINDEX(@delimiter, @string)

    WHILE @start < LEN(@string) + 1
    BEGIN
        IF @end = 0
            SET @end = LEN(@string) + 1

        -- Emit the value between the current position and the next delimiter,
        -- keyed by its ordinal position in the string
        INSERT INTO @output (ID, splitdata)
        VALUES (@Count, SUBSTRING(@string, @start, @end - @start))

        SET @start = @end + 1
        SET @end = CHARINDEX(@delimiter, @string, @start)
        SET @Count = @Count + 1
    END

    RETURN
END
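
For example, running it against the sample record above returns one row per pipe-delimited value, keyed by its position:

SELECT ID, splitdata
FROM dbo.fn_SplitString('AB|1223445|01012017|abc', '|');

-- ID  splitdata
-- 1   AB
-- 2   1223445
-- 3   01012017
-- 4   abc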

PIVOT being used per record

 While @CountRec > 0
    BEGIN
        Select top(1) @RawData = RawData from @RawDataTable     

        Insert into RawData_Parsed(01, 02, 03,04,05,06,07,08,09,10,11,12,13,14,15,16,17,18,19,20,
        21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,
        41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,
        61,62,63,64,65,66,67,68,69,70,71,72,73,74,75)
        select  [1],[2],[3],substring([4],5,4) +'/'+ substring([4],3,2) +'/'+ substring([4],1,2),[5],[6],[7],[8],[9],[10],[11],[12],[13],
            case when len([14]) = 0 then NULL else substring([14],5,4) +'/'+ substring([14],3,2) +'/'+ substring([14],1,2) end,
            case when len([15]) = 0 then NULL else substring([15],5,4) +'/'+ substring([15],3,2) +'/'+ substring([15],1,2) end,
            case when len([16]) = 0 then NULL else substring([16],5,4) +'/'+ substring([16],3,2) +'/'+ substring([16],1,2) end,
            case when len([17]) = 0 then NULL else substring([17],5,4) +'/'+ substring([17],3,2) +'/'+ substring([17],1,2) end,
            case when len([18]) = 0 then NULL else substring([18],5,4) +'/'+ substring([18],3,2) +'/'+ substring([18],1,2) end,
            [19],[20],[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31],
            case when len([32]) = 0 then NULL else substring([32],5,4) +'/'+ substring([32],3,2) +'/'+ substring([32],1,2) end,
            case when len([33]) = 0 then NULL else substring([33],5,4) +'/'+ substring([33],3,2) +'/'+ substring([33],1,2) end,
            [34],[35],[36],[37],[38],[39],[40],[41],
            case when len([42]) = 0 then NULL else substring([42],5,4) +'/'+ substring([42],3,2) +'/'+ substring([42],1,2) end,
            case when len([43]) = 0 then NULL else substring([43],5,4) +'/'+ substring([43],3,2) +'/'+ substring([43],1,2) end,
            [44],
            case when len([45]) = 0 then NULL else [45] end,
            [46],[47],
            case when len([48]) = 0 then NULL else substring([48],5,4) +'/'+ substring([48],3,2) +'/'+ substring([48],1,2) end,
            case when len([49]) = 0 then NULL else substring([49],5,4) +'/'+ substring([49],3,2) +'/'+ substring([49],1,2) end,
            [50],
            case when len([51]) = 0 then NULL else [51] end,
            [52],[53],
            case when len([54]) = 0 then NULL else substring([54],5,4) +'/'+ substring([54],3,2) +'/'+ substring([54],1,2) end,
            case when len([55]) = 0 then NULL else substring([55],5,4) +'/'+ substring([55],3,2) +'/'+ substring([55],1,2) end,
            [56],
            case when len([57]) = 0 then NULL else [57] end,
            [58],[59],
            case when len([60]) = 0 then NULL else substring([60],5,4) +'/'+ substring([60],3,2) +'/'+ substring([60],1,2) end,
            case when len([61]) = 0 then NULL else substring([61],5,4) +'/'+ substring([61],3,2) +'/'+ substring([61],1,2) end,
            [62],
            case when len([63]) = 0 then NULL else [63] end,
            [64],[65],
            case when len([66]) = 0 then NULL else substring([66],5,4) +'/'+ substring([66],3,2) +'/'+ substring([66],1,2) end,
            case when len([67]) = 0 then NULL else substring([67],5,4) +'/'+ substring([67],3,2) +'/'+ substring([67],1,2) end,
            [68],
            case when len([69]) = 0 then NULL else [69] end,
            [70],[71],
            case when len([72]) = 0 then NULL else substring([72],5,4) +'/'+ substring([72],3,2) +'/'+ substring([72],1,2) end,
            case when len([73]) = 0 then NULL else substring([73],5,4) +'/'+ substring([73],3,2) +'/'+ substring([73],1,2) end,
            [74],
            case when len([75]) = 0 then NULL else [75] end,
            @JobTaskID
        from 
        (
            select ID,splitData from dbo.fn_splitstring(@RawData, '|')
        ) src
        pivot
        (
            MAX(splitData) for Id in ( [1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23],[24],[25]
                    ,[26],[27],[28],[29],[30],[31],[32],[33],[34],[35],[36],[37],[38],[39],[40],[41],[42],[43],[44],[45],[46],[47],[48]
                    ,[49],[50],[51],[52],[53],[54],[55],[56],[57],[58],[59],[60],[61],[62],[63],[64],[65],[66],[67],[68],[69],[70],[71]
                    ,[72],[73],[74],[75])
        ) piv;

        delete top(1) from @RawDataTable
        set @CountRec = @CountRec - 1
        set @RawData = null

    end
  • Forget about suboptimal split functions. Export src table to file and import to target table with BULK. Commented Oct 15, 2017 at 11:42
  • @lad2025 never thought of that, thanks Commented Oct 15, 2017 at 11:57
  • Add some sample data & the desired result, which could help to resolve your issue. Commented Oct 16, 2017 at 5:30
  • @YogeshSharma the desired result is to get the data split and into the correct format quickly. It's only slightly quicker to split the fields as varchars, so I think the data is almost irrelevant. The issue is the methods I'm using to split the data. Commented Oct 17, 2017 at 21:37

2 Answers


You can generate dynamic SQL text and execute it.

While @CountRec > 0
BEGIN
    Select top(1) @RawData = RawData from @RawDataTable     

    DECLARE @SqlText NVARCHAR(MAX)

    -- Double up any embedded single quotes first, then turn each pipe into ','
    SET @SqlText = 'INSERT INTO RawData_Parsed VALUES (''' + REPLACE(REPLACE(@RawData, '''', ''''''), '|', ''',''') + ''')'

    EXEC (@SqlText)

    delete top(1) from @RawDataTable
    set @CountRec = @CountRec - 1
    set @RawData = null

END

After the insert has completed, you can update the data for formatting.

UPDATE RawData_Parsed
SET
    [4]= case when len([4]) = 0 then NULL else substring([4],5,4) +'/'+ substring([4],3,2) +'/'+ substring([4],1,2) end,
    [14] = case when len([14]) = 0 then NULL else substring([14],5,4) +'/'+ substring([14],3,2) +'/'+ substring([14],1,2) end,
    [15] = case when len([15]) = 0 then NULL else substring([15],5,4) +'/'+ substring([15],3,2) +'/'+ substring([15],1,2) end,
    [16] = case when len([16]) = 0 then NULL else substring([16],5,4) +'/'+ substring([16],3,2) +'/'+ substring([16],1,2) end,
    [17] = case when len([17]) = 0 then NULL else substring([17],5,4) +'/'+ substring([17],3,2) +'/'+ substring([17],1,2) end,
    [18] = case when len([18]) = 0 then NULL else substring([18],5,4) +'/'+ substring([18],3,2) +'/'+ substring([18],1,2) end,
    [32] = case when len([32]) = 0 then NULL else substring([32],5,4) +'/'+ substring([32],3,2) +'/'+ substring([32],1,2) end,
    [33] = case when len([33]) = 0 then NULL else substring([33],5,4) +'/'+ substring([33],3,2) +'/'+ substring([33],1,2) end,
    [42] = case when len([42]) = 0 then NULL else substring([42],5,4) +'/'+ substring([42],3,2) +'/'+ substring([42],1,2) end,
    [43] = case when len([43]) = 0 then NULL else substring([43],5,4) +'/'+ substring([43],3,2) +'/'+ substring([43],1,2) end,
    [45] = case when len([45]) = 0 then NULL else [45] end,
    [48] = case when len([48]) = 0 then NULL else substring([48],5,4) +'/'+ substring([48],3,2) +'/'+ substring([48],1,2) end,
    [49] = case when len([49]) = 0 then NULL else substring([49],5,4) +'/'+ substring([49],3,2) +'/'+ substring([49],1,2) end,
    [51] = case when len([51]) = 0 then NULL else [51] end,
    [54] = case when len([54]) = 0 then NULL else substring([54],5,4) +'/'+ substring([54],3,2) +'/'+ substring([54],1,2) end,
    [55] = case when len([55]) = 0 then NULL else substring([55],5,4) +'/'+ substring([55],3,2) +'/'+ substring([55],1,2) end,
    [57] = case when len([57]) = 0 then NULL else [57] end,
    [60] = case when len([60]) = 0 then NULL else substring([60],5,4) +'/'+ substring([60],3,2) +'/'+ substring([60],1,2) end,
    [61] = case when len([61]) = 0 then NULL else substring([61],5,4) +'/'+ substring([61],3,2) +'/'+ substring([61],1,2) end,
    [63] = case when len([63]) = 0 then NULL else [63] end,
    [66] = case when len([66]) = 0 then NULL else substring([66],5,4) +'/'+ substring([66],3,2) +'/'+ substring([66],1,2) end,
    [67] = case when len([67]) = 0 then NULL else substring([67],5,4) +'/'+ substring([67],3,2) +'/'+ substring([67],1,2) end,
    [69] = case when len([69]) = 0 then NULL else [69] end,
    [72] = case when len([72]) = 0 then NULL else substring([72],5,4) +'/'+ substring([72],3,2) +'/'+ substring([72],1,2) end,
    [73] = case when len([73]) = 0 then NULL else substring([73],5,4) +'/'+ substring([73],3,2) +'/'+ substring([73],1,2) end,
    [75] = case when len([75]) = 0 then NULL else [75] end

4 Comments

Wouldn't that require the creation of a new table with 75 fields that are all varchars, parsing the data into that, and then inserting the data into the final table as the correct data types? Would that speed it up?
Using a temp table that has all varchar columns, and selecting and inserting from that table, probably works faster.
Are temp tables not slower than physical tables?
Actually, there is no difference between temp and permanent tables. Whether it is slow or fast depends on the indexes on the tables.

FYI

I followed lad2025's suggestion to export the data to text files and import it that way, using SSIS.

So the SSIS package I created extracts the data, exports it to a .txt file, and then reloads that pipe-delimited file into a table with the relevant number of columns.

I then carry out all required processing on the data.

The way I initially imported the data was taking 4+ hours; it's now running in less than 10 minutes.
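
For anyone who wants to do the reload step in plain T-SQL instead of an SSIS data flow, the equivalent bulk load looks roughly like this (the staging table name, column sizes and file path are placeholders, and the staging table would need all 75 columns):

-- Staging table with one varchar column per pipe-delimited field (only the first 3 of 75 shown)
CREATE TABLE dbo.RawData_Staging
(
    [01] varchar(70) NULL,
    [02] varchar(70) NULL,
    [03] varchar(70) NULL
    -- ... columns [04] to [75]
);

BULK INSERT dbo.RawData_Staging
FROM 'C:\Exports\RawData.txt'
WITH
(
    FIELDTERMINATOR = '|',   -- pipe-delimited fields
    ROWTERMINATOR   = '\n',  -- one record per line
    TABLOCK                  -- allows a minimally logged bulk load
);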

