
I have a file like this:

       Date     Time   Level Gridsize    Miss      Temp    Parameter ID
 1988-05-16 12:00:00      -3        1       0     27.060    -1            
 1988-05-16 12:00:00      -3        1       0     9.0300    -2            
 1988-05-16 12:00:00      -3        1       0     1.2000    -3            
 1988-05-17 12:00:00      -3        1       0     27.100    -1            
 1988-05-17 12:00:00      -3        1       0     9.0200    -2            
 1988-05-17 12:00:00      -3        1       0     1.2300    -3            
 1988-05-18 12:00:00      -3        1       0     27.190    -1            
 1988-05-18 12:00:00      -3        1       0     9.0400    -2            
 1988-05-18 12:00:00      -3        1       0     1.2200    -3            

These are temperature data from sensors in different locations. The field Parameter ID indicates whether the data came from sensor -1, -2 or -3. I'm exporting this data to a CSV file to analyze it. The desired format is:

   Date     Time           -1       -2        -3    
1988-05-16 12:00:00      27.060    9.0300    1.2000                     
1988-05-17 12:00:00      27.100    9.0200    1.2300
1988-05-18 12:00:00      27.190    9.0400    1.2200   

It groups the data by date and time, and splits the values into columns according to the Parameter ID.

I'm not sure how doable this is with AWK, but since I mostly work over SSH to prepare this data, an AWK solution would be very interesting for me. I'm also open to other tools available in bash (sed, whatever), but it has to be native Linux commands because I'm not allowed to install anything on the server. :)


What I'm doing nowadays

Currently I run one awk command per sensor:

$ awk '$NF == "-1" {print $1";"$2";"$6}' file > netcdf_extract_1.csv
$ awk '$NF == "-2" {print $1";"$2";"$6}' file > netcdf_extract_2.csv
$ awk '$NF == "-3" {print $1";"$2";"$6}' file > netcdf_extract_3.csv

Then I import them in Python, group them by date and time, and get my table.
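The Python merge step looks roughly like this (a minimal sketch assuming pandas; the output file name merged.csv is just illustrative):

import pandas as pd

# Read the three per-sensor exports produced by the awk commands above
frames = []
for sensor in (1, 2, 3):
    df = pd.read_csv("netcdf_extract_%d.csv" % sensor, sep=";",
                     names=["Date", "Time", str(-sensor)])
    frames.append(df.set_index(["Date", "Time"]))

# Align the three frames on Date/Time so each sensor becomes its own column
merged = pd.concat(frames, axis=1).reset_index()
merged.to_csv("merged.csv", sep=";", index=False)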

  • Did you try anything at all? Commented Nov 10, 2017 at 11:58
  • Well, currently I'm getting 3 files, one for each sensor, and joining them in Python. I did try something; I'll post the code. Commented Nov 10, 2017 at 12:03

2 Answers


GNU awk solution:

awk 'BEGIN{ PROCINFO["sorted_in"]="@ind_str_asc"; }
     NR > 1{ a[$1"\t"$2][$7]=$6 }
     END{ 
         printf "Date\tTime\t-1\t-2\t-3\n"; 
         for (i in a) print i,a[i][-1],a[i][-2],a[i][-3] 
     }' file | column -t
  • a[$1"\t"$2][$7]=$6 :
    • a - a multidimensional (array-of-arrays) array
    • $1"\t"$2 - outer key built by concatenating the Date and Time fields
    • [$7] - inner array indexed by the Parameter ID field value
    • $6 - the stored value, taken from the Temp field

The output:

Date        Time      -1      -2      -3
1988-05-16  12:00:00  27.060  9.0300  1.2000
1988-05-17  12:00:00  27.100  9.0200  1.2300
1988-05-18  12:00:00  27.190  9.0400  1.2200

5 Comments

I got an error when I ran this on Ubuntu 17.04: awk: line 2: syntax error at or near [ awk: line 5: syntax error at or near [ Any clue?
@rvbarreto, check if you copy/pasted properly. It works fine, see the screenshot ibb.co/jKqVLw
I may be missing something. I just did the same and got the error: https://ibb.co/cuNS0w
@rvbarreto, My first sentence clearly says GNU awk. That's the main condition.
Ok, I found the problem. For people using awk or gawk before v4, true multidimensional arrays (arrays of arrays) are not supported. In this case, you have to use associative arrays with a compound key: awk 'BEGIN { PROCINFO["sorted_in"]="@ind_str_asc"; } NR > 1 {a[$1"\t"$2,$7]=$6; if($NF==-1){b[$1"\t"$2]=$6}} END{printf "Date\tTime\t-1\t-2\t-3\n"; for (i in b){print i, a[i,-1],a[i,-2],a[i,-3]}}'
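For readability, here is the same portable variant from the comment above, laid out over multiple lines (file stands in for the input file name):

awk 'BEGIN{ PROCINFO["sorted_in"] = "@ind_str_asc" }   # ignored by awks that do not support sorted_in
     NR > 1{
         a[$1"\t"$2,$7] = $6                # compound key: Date/Time plus Parameter ID, joined with SUBSEP
         if ($NF == -1) b[$1"\t"$2] = $6    # remember each Date/Time key once
     }
     END{
         printf "Date\tTime\t-1\t-2\t-3\n"
         for (i in b) print i, a[i,-1], a[i,-2], a[i,-3]
     }' file | column -t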

Here is my attempt:

awk 'BEGIN{ print "Date\t","Time\t","-1\t","-2\t","-3\t"
     PROCINFO["sorted_in"] = "@ind_str_asc" }                 # keep the dates in ascending order (GNU awk)
     NR > 1{ a[$1] = a[$1] " " $6; b[$1] = $2; next }         # append each Temp to its date; remember the time
     END{ for ( i in a ) print i, b[i], a[i] }' file1 | column -t

Output:

Date        Time      -1      -2      -3
1988-05-16  12:00:00  27.060  9.0300  1.2000
1988-05-17  12:00:00  27.100  9.0200  1.2300
1988-05-18  12:00:00  27.190  9.0400  1.2200

As Roman put everything into one array, I split the task across two arrays.

1 Comment

Nice! It's similar to the code I posted on Roman's solution, but much more elegant! Thanks!
