I want to group the rows by Id so that each participant ends up on a single row, with a column for each test value and a column for the date it was taken.

Some background: the dataset holds lab results for participants, and some tests cannot be taken on the same day because of fasting restrictions and so on. The database I am using is SQL Server.

Below are my dataset and the desired output.

Sample dataset:

create table Sample 
(
      Id int,
      LAB_DATE date,
      A_CRE_1 varchar(100),
      B_GLUH_1 varchar(100),
      C_LDL_1 varchar(100),
      D_TG_1 varchar(100),
      E_CHOL_1 varchar(100),
      F_HDL_1 varchar(100),
      G_CRPH_1 varchar(100),
      H_HBA1C_1 varchar(100),
      I_GLU120_1 varchar(100),
      J_GLUF_1 varchar(100),
      K_HCR_1 varchar(100)
)

insert into Sample(Id, LAB_DATE,A_CRE_1, B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1)
values (01, '2017-11-21', '74', '6.4', '2.04', '4.17', '1.64', '6.1', '2.54')

insert into sample (Id, LAB_DATE, I_GLU120_1) 
values (01, '2017-11-22','8.8')

insert into sample (Id, LAB_DATE, D_TG_1) 
values (01, '2017-11-23','0.56')

insert into sample (Id,LAB_DATE,A_CRE_1,B_GLUH_1,C_LDL_1,D_TG_1,E_CHOL_1,F_HDL_1,K_HCR_1)       
values (2,'2018-10-02','57','8.91','2.43','1.28','3.99','1.25','3.19')

insert into sample (Id,LAB_DATE,H_HBA1C_1)                              
values (2,'2018-10-03','8.6')                       

insert into sample (Id,LAB_DATE,J_GLUF_1)                               
values (2,'2018-10-04','7.8')

insert into sample (Id,LAB_DATE,A_CRE_1,B_GLUH_1,C_LDL_1,D_TG_1,E_CHOL_1,F_HDL_1,G_CRPH_1,H_HBA1C_1,K_HCR_1)
values (3,'2016-10-01','100','6.13','3.28','0.94','5.07','1.19','0.27','5.8','4.26')

Desired output:

ID|LAB_DATE|A_CRE_1|B_GLUH_1|C_LDL_1|Date_TG_1|D_TG_1|E_CHOL_1|F_HDL_1|G_CRPH_1|Date_HBA1C_1|H_HBA1C_1|Date_GLU120_1|I_GLU120_1|Date_GLUF_1|J_GLUF_1|K_HCR_1
1|2017-11-21|74|6.4|2.04|2017-11-23|0.56|4.17|1.64|||6.1|2017-11-22|8.8|||2.54
2|2018-10-02|57|8.91|2.43||1.28|3.99|1.25||2018-10-03|8.6|||2018-10-04|7.8|3.19
3|2016-10-01|100|6.13|3.28||0.94|5.07|1.19|0.27||5.8|||||4.26


  • I don't see a clean way of doing this. What would be wrong with just inserting a complete record in one go? Commented Jan 31, 2019 at 5:01
  • There are no duplicated values between the rows you want? You just want the value of each? You could do something like a group by ID getting the min lab_date and each of the values. Commented Jan 31, 2019 at 5:02
  • What will you do if there are two values for D_TG_1 values in sample for participant id 01? Take the latest one? Commented Jan 31, 2019 at 5:03
  • @TimBiegeleisen I think the records are inserted on a per-date basis, so Stepahnie, you need an update in that case. Commented Jan 31, 2019 at 5:05
  • You seem to be saying you've gone from A -> B and now need to go from B -> C. I'm proposing you go again from A -> C. Commented Jan 31, 2019 at 9:43

3 Answers

1

Here's a solution (that cannot cope with multiple rows of the same id/sample type - you haven't said what to do with those)

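select * from

  -- main panel; assumes A_CRE_1 is always present on the main-panel row
  (select Id, LAB_DATE,A_CRE_1, B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1
   from sample where A_CRE_1 is not null) s1
  INNER JOIN
  -- keep only the rows that actually hold a GLU120 result
  (select Id, LAB_DATE as glu120date, I_GLU120_1 from sample where I_GLU120_1 is not null) s2
  ON s1.id = s2.id
  INNER JOIN
  -- keep only the rows that actually hold a TG result
  (select Id, LAB_DATE as dtgdate, D_TG_1 from sample where D_TG_1 is not null) s3
  ON s1.id = s3.id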

Hopefully you get the idea of the pattern: if other sample types have their own dates, break them out of s1 into their own subquery in the same way (e.g. make an s4 for e_chol_1, an s5 for k_hcr_1, and so on). Note that with INNER JOIN, if any sample type is missing for an id, the whole row for that id disappears from the results. If that is not desired and you can accept NULL for missing samples, use LEFT JOIN instead of INNER JOIN.
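As a rough sketch of the LEFT JOIN variant, here is a cut-down version that can be run on its own. The table variable @Sample and its reduced column list are assumptions made to keep the example short, and it assumes A_CRE_1 is always recorded with the main panel. Participant 3 has no I_GLU120_1 result, so with LEFT JOIN that participant should still be returned, with NULL in the two GLU120 columns.

```sql
-- Hypothetical cut-down table: only three of the question's test columns are kept.
DECLARE @Sample TABLE (
  Id         int,
  LAB_DATE   date,
  A_CRE_1    varchar(100),
  D_TG_1     varchar(100),
  I_GLU120_1 varchar(100)
);

INSERT INTO @Sample (Id, LAB_DATE, A_CRE_1)    VALUES (1, '2017-11-21', '74');
INSERT INTO @Sample (Id, LAB_DATE, I_GLU120_1) VALUES (1, '2017-11-22', '8.8');
INSERT INTO @Sample (Id, LAB_DATE, D_TG_1)     VALUES (1, '2017-11-23', '0.56');
-- Participant 3: main panel and D_TG_1 on the same day, no I_GLU120_1 at all.
INSERT INTO @Sample (Id, LAB_DATE, A_CRE_1, D_TG_1) VALUES (3, '2016-10-01', '100', '0.94');

SELECT s1.Id,
       s1.LAB_DATE,
       s1.A_CRE_1,
       s3.LAB_DATE AS Date_TG_1,
       s3.D_TG_1,
       s2.LAB_DATE AS Date_GLU120_1,
       s2.I_GLU120_1
FROM (SELECT Id, LAB_DATE, A_CRE_1
      FROM @Sample WHERE A_CRE_1 IS NOT NULL) AS s1
-- LEFT JOIN keeps the s1 row even when no matching test row exists.
LEFT JOIN (SELECT Id, LAB_DATE, I_GLU120_1
           FROM @Sample WHERE I_GLU120_1 IS NOT NULL) AS s2
       ON s1.Id = s2.Id
LEFT JOIN (SELECT Id, LAB_DATE, D_TG_1
           FROM @Sample WHERE D_TG_1 IS NOT NULL) AS s3
       ON s1.Id = s3.Id
ORDER BY s1.Id;
```

Naming the output columns explicitly, rather than using select *, also avoids getting the Id column back once per subquery.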

If there will be multiple samples for patient 01 and you only want the latest, the pattern becomes:

select * from

  (select Id, LAB_DATE,A_CRE_1, B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1,
   row_number() over(partition by id order by lab_date desc) rn
   from sample
   where A_CRE_1 is not null) s1     -- rank only main-panel rows (assumes A_CRE_1 is always part of the panel)
  INNER JOIN
  (select Id, LAB_DATE as glu120date, I_GLU120_1,
   row_number() over(partition by id order by lab_date desc) rn
   from sample
   where I_GLU120_1 is not null) s2  -- rank only rows that hold a GLU120 result
  ON s1.id = s2.id and s1.rn = s2.rn
WHERE
  s1.rn = 1

Note the addition of row_number() over(partition by id order by lab_date desc) rn. This establishes an incrementing counter in descending date order (latest record = 1, older = 2, ...) that restarts from 1 for every different id. We join on it too, then say where rn = 1 to pick only the latest record for each sample type.
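To see what the counter looks like on its own, here is a small sketch. The table variable @Sample is a cut-down stand-in, and the second D_TG_1 row for participant 1 is made up for illustration (it is not in the question's data).

```sql
-- Hypothetical cut-down table with a single test column.
DECLARE @Sample TABLE (Id int, LAB_DATE date, D_TG_1 varchar(100));

INSERT INTO @Sample (Id, LAB_DATE, D_TG_1) VALUES
  (1, '2017-11-23', '0.56'),
  (1, '2017-12-01', '0.61'),  -- invented repeat test, to show the numbering
  (2, '2018-10-02', '1.28');

-- rn restarts at 1 for each Id; the most recent LAB_DATE gets rn = 1.
SELECT Id,
       LAB_DATE,
       D_TG_1,
       ROW_NUMBER() OVER (PARTITION BY Id ORDER BY LAB_DATE DESC) AS rn
FROM @Sample
WHERE D_TG_1 IS NOT NULL      -- number only the rows that hold this test
ORDER BY Id, rn;
```

For participant 1 the 2017-12-01 row should get rn = 1 and the 2017-11-23 row rn = 2, so filtering on rn = 1 in an outer query keeps only the latest result per participant.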


7 Comments

this isn't comprehensive enough because it is not necessary that D_TG_1 is done on the next day. perhaps i should add a bigger sample space?
I didn't understand your comment. This code doesn't care when sampling is done. I can only work with the materials you give; I know only what you tell me. I did say "hopefully you can see the pattern" and I outlined how to expand it: add more subqueries for s4, s5.. make EVERY sampling its own subquery if you need to. 10 samples, 10 days, 10 subqueries, 10 joins.. You can optimize things if you know that 5 tests will always be taken on the same day, but the other 5 tests might happen on a mix (sometimes simultaneously, sometimes down to 1 a day for 5 days); then you need 6 subqueries
..one subquery for the 5 that are done on the same day and another 5 for those that are potentially done individually
PS: I'm not a fan of questions where the original source data isn't given; you have made efforts to transform the data already and have then asked us to write solutions on top of that. This can occasionally lead to poor interactions between what you've already done and what we provide, giving lower overall performance than if we'd seen and worked with the original data. Please avoid giving part-baked data and presenting it as the original source; either state the query you used to generate it or (better) give a representative sample of the source, possibly even as a db-fiddle.com or similar
Thank you for pointing it out. i have updated the sample dataset which i am working on. Thank you for trying to help.
1

As @Ben suggested, you can group by Id and take the MIN of every column, as below.

DECLARE  @Sample as table (
  Id int,
  LAB_DATE date,
  A_CRE_1 varchar(100),
  B_GLUH_1 varchar(100),
  C_LDL_1 varchar(100),
  D_TG_1 varchar(100),
  E_CHOL_1 varchar(100),
  F_HDL_1 varchar(100),
  G_CRPH_1 varchar(100),
  H_HBA1C_1 varchar(100),
  I_GLU120_1 varchar(100),
  J_GLUF_1 varchar(100),
  K_HCR_1 varchar(100))

insert into @Sample(Id, LAB_DATE,A_CRE_1, 
B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1)
values (01,'2017-11-21','74','6.4','2.04','4.17','1.64','6.1','2.54')

insert into @Sample (Id, LAB_DATE, I_GLU120_1) 
values (01, '2017-11-22','8.8')

insert into @Sample (Id, LAB_DATE, D_TG_1) 
values (01, '2017-11-23','0.56')

SELECT s.Id
 , MIN(s.LAB_DATE) AS LAB_DATE
 , MIN(s.A_CRE_1) AS A_CRE_1
 , MIN(s.B_GLUH_1) AS B_GLUH_1
 , MIN(s.C_LDL_1) AS C_LDL_1
 , MIN(s.D_TG_1) AS D_TG_1
 , MIN(s.E_CHOL_1) AS E_CHOL_1
 , MIN(s.F_HDL_1) AS F_HDL_1
 , MIN(s.G_CRPH_1) AS G_CRPH_1
 , MIN(s.H_HBA1C_1) AS H_HBA1C_1
 , MIN(s.I_GLU120_1) AS I_GLU120_1
 , MIN(s.J_GLUF_1) AS J_GLUF_1
 , MIN(s.K_HCR_1) AS K_HCR_1
FROM @Sample AS s
GROUP BY s.Id

You can also look at the SQL Server STUFF function; the link below may help: https://www.mssqltips.com/sqlservertip/2914/rolling-up-multiple-rows-into-a-single-row-and-column-for-sql-server-data/
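The GROUP BY query above returns only one date per Id. One possible variant, offered as a sketch and not as part of the original answer, keeps a date per test by aggregating LAB_DATE only over the rows where that test has a value. The table variable is cut down to three test columns, the Date_TG_1 and Date_GLU120_1 names follow the question's desired output, and it assumes at most one result per test per Id.

```sql
-- Hypothetical cut-down table: three of the question's test columns.
DECLARE @Sample TABLE (
  Id         int,
  LAB_DATE   date,
  A_CRE_1    varchar(100),
  D_TG_1     varchar(100),
  I_GLU120_1 varchar(100)
);

INSERT INTO @Sample (Id, LAB_DATE, A_CRE_1)    VALUES (1, '2017-11-21', '74');
INSERT INTO @Sample (Id, LAB_DATE, I_GLU120_1) VALUES (1, '2017-11-22', '8.8');
INSERT INTO @Sample (Id, LAB_DATE, D_TG_1)     VALUES (1, '2017-11-23', '0.56');

SELECT s.Id
 -- CASE without ELSE yields NULL, and MIN ignores NULLs, so each date
 -- comes only from the row(s) where that particular test was recorded.
 , MIN(CASE WHEN s.A_CRE_1    IS NOT NULL THEN s.LAB_DATE END) AS LAB_DATE
 , MIN(s.A_CRE_1) AS A_CRE_1
 , MIN(CASE WHEN s.D_TG_1     IS NOT NULL THEN s.LAB_DATE END) AS Date_TG_1
 , MIN(s.D_TG_1) AS D_TG_1
 , MIN(CASE WHEN s.I_GLU120_1 IS NOT NULL THEN s.LAB_DATE END) AS Date_GLU120_1
 , MIN(s.I_GLU120_1) AS I_GLU120_1
FROM @Sample AS s
GROUP BY s.Id;
```

If a participant could have several results for the same test, MIN would pair the earliest date with the smallest value, which need not come from the same row, so this only fits the one-result-per-test case.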

2 Comments

I downvoted this because it does not preserve each of the three dates, which the OP's output seems to have present.
That STUFF blog is really old; maybe suggest newer techniques like STRING_AGG if the SQL Server version is modern enough (2017 or later)
0

Following on from my comments about presenting the original data, here's what I think you should do (taking the query you commented)

SELECT 
  ID, 
  MAX(CASE WHEN TestID='1' THEN Results END) [Test_1], 
  MAX(CASE WHEN TestID='2' THEN Results END) [Test_2], 
  MAX(CASE WHEN TestID='1' THEN Result_Date_Time END) Test12Date,
  MAX(CASE WHEN TestID='3' THEN Results END) [Test_3], 
  MAX(CASE WHEN TestID='3' THEN Result_Date_Time END) Test3Date

FROM [tbBloodSample] 
GROUP BY ID 
ORDER BY ID

Notes: If TestID is an int, don't use strings like '1' in your query; use ints. You don't need an ELSE NULL in a CASE, because NULL is the default when no WHEN branch matches.

Here is a query pattern. Tests 1 and 2 are always done on the same day, which is why I only pivot their date once. Test 3 might be done later or on the same day, so the dates in Test12Date and Test3Date might be the same or might be different.

Convert the strings to dates after you do the pivot, to reduce the number of conversions.
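A minimal sketch of converting after the pivot follows. The real tbBloodSample schema is not shown here, so the table variable, its column types, the sample rows, and the yyyy-mm-dd string format are all assumptions. The pivot runs in a derived table, and TRY_CONVERT is applied once per output row in the outer query.

```sql
-- Hypothetical stand-in for tbBloodSample; column types are assumed.
DECLARE @tbBloodSample TABLE (
  ID               int,
  TestID           int,
  Results          varchar(100),
  Result_Date_Time varchar(30)   -- assumed to hold yyyy-mm-dd strings
);

INSERT INTO @tbBloodSample (ID, TestID, Results, Result_Date_Time) VALUES
  (1, 1, '74',   '2017-11-21'),  -- invented rows for illustration
  (1, 2, '6.4',  '2017-11-21'),
  (1, 3, '0.56', '2017-11-23');

SELECT p.ID,
       p.Test_1,
       p.Test_2,
       TRY_CONVERT(date, p.Test12Date, 23) AS Test12Date,  -- style 23 = yyyy-mm-dd
       p.Test_3,
       TRY_CONVERT(date, p.Test3Date, 23)  AS Test3Date
FROM (
  -- Pivot first, on the raw strings; TestID is compared as an int.
  SELECT ID,
         MAX(CASE WHEN TestID = 1 THEN Results END)          AS Test_1,
         MAX(CASE WHEN TestID = 2 THEN Results END)          AS Test_2,
         MAX(CASE WHEN TestID = 1 THEN Result_Date_Time END) AS Test12Date,
         MAX(CASE WHEN TestID = 3 THEN Results END)          AS Test_3,
         MAX(CASE WHEN TestID = 3 THEN Result_Date_Time END) AS Test3Date
  FROM @tbBloodSample
  GROUP BY ID
) AS p
ORDER BY p.ID;
```

TRY_CONVERT returns NULL for a string it cannot parse instead of raising an error, which may or may not be what you want for bad data.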

5 Comments

thank you for pointing it out but this i am afraid is not possible as we have collected data and a system based on the requirements as well as sample space above. hence, i am seeking out help from this site if there is a clean way to convert multiple rows into one row.
This is the clean way to convert multiple rows into one - it's what a pivot operation does
thank you for sharing but this is what i did like 2 years ago with the first requirement. now i have the above dataset to work with.
Then use the other answer, extend the pattern to meet your needs using the information given in the body of the answer. We're here to help, but we don't give up time freely to write entire software systems for you.. There are of course, other sites where you can hire a freelancer to complete software to your exacting specification, but it's not what SO is for
yes yes. i've been testing out your other answer. it has been helpful. thank you! :)
