
I have a table with a column that contains entries like the following:

Drug
Sertraline 100mg tablets
Phenobarbitol 20mg capsules 

I want to split this column into four columns:

Drugname    Strength   Units   Form
Sertraline  100        mg      tablets

Could someone please guide me on how to do this?

  • Have you tried anything? There are many resources on this online Commented Feb 1, 2018 at 12:00
  • Don't write such an entry in the first place. This is a bad idea in all databases; it breaks even first normal form. If you want to be able to query those values, store them in a separate table. Commented Feb 1, 2018 at 12:01
  • Yes, I tried LEFT and SUBSTRING, but I'm not quite getting what I want. Commented Feb 1, 2018 at 12:02
  • You can try STRING_SPLIT in SQL Server 2016, but it's a LOT easier to clean up the data when inserting it. SQL isn't the best language for string manipulation and definitely unsuitable for parsing. Commented Feb 1, 2018 at 12:04
  • I guess you have a lot of data, and the format could differ from row to row. We could write something for this sample data, but it won't work for other data. Commented Feb 1, 2018 at 12:04
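The comments suggest storing the parts in their own columns instead of one combined string. A minimal sketch of such a table; the table name dbo.DrugProduct and the column types are hypothetical choices, not something taken from the question:

```sql
-- Hypothetical normalized table: one column per attribute, so nothing has to be parsed at query time
CREATE TABLE dbo.DrugProduct
(
    DrugProductId INT IDENTITY(1, 1) PRIMARY KEY,
    Drugname      VARCHAR(100)   NOT NULL,
    Strength      DECIMAL(10, 3) NOT NULL,  -- numeric, so it can be compared and filtered
    Units         VARCHAR(20)    NOT NULL,  -- e.g. 'mg'
    Form          VARCHAR(50)    NOT NULL   -- e.g. 'tablets', 'capsules'
);

INSERT INTO dbo.DrugProduct (Drugname, Strength, Units, Form)
VALUES ('Sertraline', 100, 'mg', 'tablets'),
       ('Phenobarbitol', 20, 'mg', 'capsules');

-- Because Strength is numeric, a range query needs no string manipulation
SELECT Drugname, Strength, Units, Form
FROM dbo.DrugProduct
WHERE Units = 'mg' AND Strength >= 50;
```

With the two sample rows, only the Sertraline row matches the filter.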

3 Answers


With a little XML and a CROSS APPLY.

The pattern is clear and easy to expand or contract as needed.

Example

Select A.* 
      ,B.*
 From  YourTable A
 Cross Apply (
                Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
                      ,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
                      ,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
                      ,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
                      ,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
                From  (Select Cast('<x>' + replace((Select replace(A.[Drug],' ','§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml) as xDim) as B1
             ) B

Returns

Pos1            Pos2    Pos3        Pos4    Pos5
Sertraline      100mg   tablets     NULL    NULL
Phenobarbitol   20mg    capsules    NULL    NULL
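The query builds the XML through FOR XML PATH('') instead of concatenating the raw column. As far as I can tell, this step is there to escape XML-special characters: a value containing '&' cannot be cast to XML as-is (SQL Server raises an XML parsing error), but after FOR XML PATH('') turns '&' into '&amp;' the string parses, and .value() unescapes it again. A small illustration with a hypothetical value (it replaces the space directly, as the next answer does, rather than going through a placeholder):

```sql
-- Hypothetical input containing an XML-special character
DECLARE @s VARCHAR(100) = 'A&B 5mg tablets';

-- Step 1: FOR XML PATH('') escapes '&' to '&amp;' (spaces are left alone)
DECLARE @escaped NVARCHAR(MAX) = (SELECT @s AS [*] FOR XML PATH(''));

-- Step 2: wrap each space-separated token in <x>...</x> and parse it as XML
DECLARE @x XML = CAST('<x>' + REPLACE(@escaped, ' ', '</x><x>') + '</x>' AS XML);

SELECT @escaped                          AS Escaped,  -- 'A&amp;B 5mg tablets'
       @x.value('/x[1]', 'varchar(100)') AS Pos1,     -- 'A&B' (unescaped by .value())
       @x.value('/x[2]', 'varchar(100)') AS Pos2;     -- '5mg'
```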

And one more suggestion:

The first CTE transforms your space-separated string into XML, which allows each part to be addressed separately.
The second CTE retrieves the three parts.
The final SELECT uses some string methods to separate strength and unit.

DECLARE @tbl TABLE(Drug VARCHAR(100));
INSERT INTO @tbl VALUES('Sertraline 100mg tablets')
                      ,('Phenobarbitol 20mg capsules');
WITH Splitted AS
(
    SELECT CAST('<x>' + REPLACE((SELECT Drug AS [*] FOR XML PATH('')),' ','</x><x>') + '</x>' AS XML) AS Casted
    FROM @tbl 
)
,Parts AS
(
    SELECT Casted.value('/x[1]/text()[1]','nvarchar(100)') AS Drugname
          ,Casted.value('/x[2]/text()[1]','nvarchar(100)') AS CombinedStrengthUnit
          ,Casted.value('/x[3]/text()[1]','nvarchar(100)') AS Form
    FROM Splitted
)
SELECT *
      ,LEFT(CombinedStrengthUnit,PATINDEX('%[a-zA-Z]%',CombinedStrengthUnit)-1) AS Strength 
      ,SUBSTRING(CombinedStrengthUnit,PATINDEX('%[a-zA-Z]%',CombinedStrengthUnit),1000) AS Unit
FROM Parts;

The result

Drugname        S&U     Form        Strength    Unit
Sertraline      100mg   tablets     100         mg
Phenobarbitol   20mg    capsules    20          mg
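One caveat with the final SELECT: if the middle part contains no letter (for example a hypothetical row 'Paracetamol 500 tablets'), PATINDEX returns 0, LEFT receives a length of -1, and SQL Server raises "Invalid length parameter passed to the LEFT or SUBSTRING function". A sketch of a guarded variant of the same query; it also converts Strength to a number with TRY_CAST (SQL Server 2012 or later), and the DECIMAL(10, 3) type is an assumption:

```sql
DECLARE @tbl TABLE (Drug VARCHAR(100));
INSERT INTO @tbl VALUES ('Sertraline 100mg tablets')
                       ,('Phenobarbitol 20mg capsules')
                       ,('Paracetamol 500 tablets');  -- hypothetical row with no unit letters

WITH Splitted AS
(
    SELECT CAST('<x>' + REPLACE((SELECT Drug AS [*] FOR XML PATH('')),' ','</x><x>') + '</x>' AS XML) AS Casted
    FROM @tbl
)
,Parts AS
(
    SELECT Casted.value('/x[1]/text()[1]','nvarchar(100)') AS Drugname
          ,Casted.value('/x[2]/text()[1]','nvarchar(100)') AS CombinedStrengthUnit
          ,Casted.value('/x[3]/text()[1]','nvarchar(100)') AS Form
    FROM Splitted
)
,Positioned AS
(
    -- NULLIF turns "no letter found" (0) into NULL instead of a length of -1
    SELECT *, NULLIF(PATINDEX('%[a-zA-Z]%',CombinedStrengthUnit),0) AS FirstLetter
    FROM Parts
)
SELECT Drugname
       -- Without a letter, the whole part is the strength; TRY_CAST yields NULL for non-numbers
      ,TRY_CAST(LEFT(CombinedStrengthUnit,ISNULL(FirstLetter-1,LEN(CombinedStrengthUnit))) AS DECIMAL(10,3)) AS Strength
      ,SUBSTRING(CombinedStrengthUnit,FirstLetter,1000) AS Unit  -- NULL when FirstLetter is NULL
      ,Form
FROM Positioned;
```

For the Paracetamol row this returns Strength 500.000 and Unit NULL instead of failing.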


I used a user-defined split function to split the text into three pieces separated by the space character, as follows.

Of course, if you have SQL Server 2016 or later, you can use the built-in STRING_SPLIT function too.

with rawdata as (
    select rn = ROW_NUMBER() over (order by txt), * from drugs
), cte as (
    select
        rn,
        d.txt,
        s.id,
        s.val
    from rawdata d
    cross apply dbo.Split(rtrim(ltrim(d.txt)),' ') s
)
select * from cte

Please note that the rn column produced by ROW_NUMBER is required to identify each source row in the following script. If your source table has a primary key, you can use the primary key columns directly instead of the rn column.
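The answer does not show dbo.Split. The queries only rely on it returning one row per piece with an id column (1, 2, 3, ... in string order) and a val column. A hypothetical inline table-valued function with that shape, written as a recursive CTE; it is a sketch, not the answerer's actual function:

```sql
-- Hypothetical dbo.Split(@text, @delimiter): one row per piece, id numbered in string order
CREATE FUNCTION dbo.Split (@text VARCHAR(8000), @delimiter VARCHAR(10))
RETURNS TABLE
AS
RETURN
WITH Pieces AS
(
    -- Anchor: the first piece starts at position 1 and ends at the first delimiter (0 = none)
    SELECT 1 AS id,
           1 AS StartPos,
           CHARINDEX(@delimiter, @text) AS EndPos
    UNION ALL
    -- Recursive step: the next piece starts right after the previous delimiter.
    -- DATALENGTH is used because LEN ignores trailing spaces, so LEN(' ') is 0.
    SELECT id + 1,
           EndPos + DATALENGTH(@delimiter),
           CHARINDEX(@delimiter, @text, EndPos + DATALENGTH(@delimiter))
    FROM Pieces
    WHERE EndPos > 0
)
SELECT id,
       SUBSTRING(@text, StartPos,
                 CASE WHEN EndPos > 0 THEN EndPos - StartPos ELSE 8000 END) AS val
FROM Pieces;
GO

SELECT id, val FROM dbo.Split('Sertraline 100mg tablets', ' ');
```

The default recursion limit of 100 applies, which is ample for three pieces per row.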

To split the second piece (strength and units), I again used custom SQL functions, ClearNumericCharacters and ClearNonNumericCharacters. Of course, you could use inline functions or regular expressions instead of UDFs.
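Those two functions are not shown either. A hypothetical pair of scalar UDFs that behave as their names suggest; treating '.' as numeric (so that '0.5mg' keeps its decimal point) is my own assumption:

```sql
-- Hypothetical: removes every character that is not a digit or '.', e.g. '100mg' -> '100'
CREATE FUNCTION dbo.ClearNonNumericCharacters (@s VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
    -- PATINDEX finds the first character outside [0-9.]; STUFF deletes that one character
    WHILE PATINDEX('%[^0-9.]%', @s) > 0
        SET @s = STUFF(@s, PATINDEX('%[^0-9.]%', @s), 1, '');
    RETURN @s;
END;
GO

-- Hypothetical: removes every digit and '.', e.g. '100mg' -> 'mg'
CREATE FUNCTION dbo.ClearNumericCharacters (@s VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
    WHILE PATINDEX('%[0-9.]%', @s) > 0
        SET @s = STUFF(@s, PATINDEX('%[0-9.]%', @s), 1, '');
    RETURN @s;
END;
GO

SELECT dbo.ClearNonNumericCharacters('100mg') AS Strength,  -- '100'
       dbo.ClearNumericCharacters('100mg')    AS Units;     -- 'mg'
```

A NULL input makes PATINDEX return NULL, so the loop does not run and both functions return NULL.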

Here is the final SQL CTE expression

with rawdata as (
    select rn = ROW_NUMBER() over (order by txt), * from drugs
), cte as (
    select
        rn,
        d.txt,
        s.id,
        s.val
    from rawdata d
    cross apply dbo.Split(rtrim(ltrim(d.txt)),' ') s
), cte2 as (
    select
        rn,
        case when id = 1 then val end as Drugname,
        case when id = 2 then dbo.ClearNonNumericCharacters(val) end as Strength,
        case when id = 2 then dbo.ClearNumericCharacters(val) end as Units,
        case when id = 3 then val end as Form
    from cte
)
select
    max(Drugname) Drugname,
    max(Strength) Strength,
    max(Units) Units,
    max(Form) Form
from cte2
group by rn

And the output is:

[Screenshot of the query output: Drugname, Strength, Units, Form]
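The answer mentions STRING_SPLIT as an alternative. Without its optional third argument, the documentation does not guarantee that the output rows come back in the order of the substrings, so a plain STRING_SPLIT cannot reliably tell the drug name from the form. SQL Server 2022 (and Azure SQL Database) added the enable_ordinal argument, which adds an ordinal column. A sketch assuming that version (and database compatibility level 130 or higher), with a table variable standing in for the answer's drugs table:

```sql
DECLARE @tbl TABLE (Drug VARCHAR(100));
INSERT INTO @tbl VALUES ('Sertraline 100mg tablets'),
                        ('Phenobarbitol 20mg capsules');

SELECT t.Drug,
       MAX(CASE WHEN s.ordinal = 1 THEN s.value END) AS Drugname,
       MAX(CASE WHEN s.ordinal = 2 THEN s.value END) AS StrengthUnit,
       MAX(CASE WHEN s.ordinal = 3 THEN s.value END) AS Form
FROM @tbl AS t
CROSS APPLY STRING_SPLIT(t.Drug, ' ', 1) AS s  -- 1 = enable_ordinal: adds the "ordinal" column
GROUP BY t.Drug;
```

GROUP BY t.Drug collapses identical strings; with a real table you would group by its primary key instead, as suggested above for rn. The StrengthUnit column can then be split with either of the approaches shown earlier.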
