
I'm trying to merge columns based on the value in the first field. I've tried using awk, but to no avail. Please see example input and output:

Input:  
10013   97      1503384  
10013   196     1506234  
10013   61      1507385  
10013   1559    1508385  
10014   1726    1514507  
10014   960     1519162  
10015   1920    1545535  
10015   124     1548915  
10015   77      1550284  

Desired output:  
10013   97,196,61,1559  1503384,1506234,1507385,1508385  
10014   1726,960        1514507,1519162  
10015   1920,124,77     1545535,1548915,1550284  

Thanks in advance for any advice!


4 Answers


The shortest GNU datamash solution:

datamash -sW -g1 collapse 2 collapse 3 <file
  • -s - sort the input first (grouping requires the lines of each group to be contiguous)
  • -W - use whitespace (one or more spaces/tabs) as the field delimiter
  • -g1 - group by the 1st field
  • collapse N - produce a comma-separated list of all values of field N within each group

The output:

10013   97,196,61,1559  1503384,1506234,1507385,1508385
10014   1726,960    1514507,1519162
10015   1920,124,77 1545535,1548915,1550284
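
As a side note (my addition, not part of the original answer): the -s flag makes datamash sort the input first, because grouping requires the lines of each group to be contiguous. Since the sample input is already ordered by the first column, the sort could also be dropped or done explicitly:

sort -k1,1n file | datamash -W -g1 collapse 2 collapse 3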

3 Comments

This looks like a very useful tool for these kinds of tasks.
@karakfa, it's very convenient for simple grouping/aggregation and arithmetic operations. Recommended "stuff"
Just make sure that you have the most recent version - not all distribution repositories are up to date. See their download page.

An awk approach that prints each group as soon as the value in the 1st field changes (it relies on the input being grouped by that field, as in the sample):

$ cat tst.awk
$1 != f1 { if (NR>1) print f1, f2, f3; f1=f2=f3=s="" }   # new key: print the finished group, reset the accumulators
{ f1=$1; f2=f2 s $2; f3=f3 s $3; s="," }                 # append the 2nd/3rd fields, then set "," as the separator
END { print f1, f2, f3 }                                 # print the last group

$ awk -f tst.awk file | column -t
10013  97,196,61,1559  1503384,1506234,1507385,1508385
10014  1726,960        1514507,1519162
10015  1920,124,77     1545535,1548915,1550284

2 Comments

How do we tweak this for an unknown number of columns?
@Naveed Write a loop? Post a new question with sample input/output if you'd like more help.
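
Following up on the comment above, here is a minimal, untested sketch of such a loop (my own addition, not part of the answer). It assumes the input is still grouped by the 1st field and that every row has the same number of fields; deleting a whole array needs a reasonably modern awk (GNU awk, mawk, nawk), and tst_all.awk is just a placeholder name:

$ cat tst_all.awk
$1 != key { if (NR>1) prt(); key=$1; delete vals }   # new key: print the previous group, reset
{ nf=NF; for (i=2; i<=NF; i++) vals[i] = ((i in vals) ? vals[i] "," : "") $i }
END { prt() }
function prt(   i) { printf "%s", key; for (i=2; i<=nf; i++) printf "%s%s", OFS, vals[i]; print "" }

$ awk -f tst_all.awk file | column -t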

awk to the rescue!

$ awk '{f2[$1]=f2[$1] sep[$1] $2;                   # concatenate 2nd field 
        f3[$1]=f3[$1] sep[$1] $3;                   # concatenate 3rd field 
        sep[$1]=","}                                # lazy init separator to skip first
   END {for(k in f2) print k,f2[k],f3[k]}' file |   # iterate over keys and print
  column -t                                         # pretty print


10013  97,196,61,1559  1503384,1506234,1507385,1508385
10014  1726,960        1514507,1519162
10015  1920,124,77     1545535,1548915,1550284

Note that the output order is not guaranteed, because for(k in f2) visits the keys in an unspecified order; you can sort the result by the first field if needed.
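
For example (my addition, assuming GNU awk is available; the one-liner above works with any awk), gawk can be told to visit the array keys in ascending numeric order, which removes the need for an extra sort:

$ gawk 'BEGIN { PROCINFO["sorted_in"]="@ind_num_asc" }   # for(k in f2) now iterates keys in numeric order
        { f2[$1]=f2[$1] sep[$1] $2; f3[$1]=f3[$1] sep[$1] $3; sep[$1]="," }
    END { for(k in f2) print k,f2[k],f3[k] }' file | column -t

With any other awk, piping the output through sort -k1,1n before column -t gives the same ordering.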



Awk solution (assuming that the input lines are already sorted):

awk '!a[$1]++{ if ("f2" in b) { print f1, b["f2"], b["f3"]; delete b } }
     { 
         f1=$1; 
         b["f2"]=(b["f2"]!=""? b["f2"]",":"")$2; 
         b["f3"]=(b["f3"]!=""? b["f3"]",":"")$3 
     }
     END{ print f1, b["f2"], b["f3"] }' OFS='\t' file
  • delete b - clears the array b each time a new 1st-field value starts, so it never holds the values of all groups at once (saving memory)

The output:

10013   97,196,61,1559  1503384,1506234,1507385,1508385
10014   1726,960    1514507,1519162
10015   1920,124,77 1545535,1548915,1550284

