
I have a file like this:

       Date     Time   Level Gridsize    Miss      Temp    Parameter ID
 1988-05-16 12:00:00      -3        1       0     27.060    -1            
 1988-05-16 12:00:00      -3        1       0     9.0300    -2            
 1988-05-16 12:00:00      -3        1       0     1.2000    -3            
 1988-05-17 12:00:00      -3        1       0     27.100    -1            
 1988-05-17 12:00:00      -3        1       0     9.0200    -2            
 1988-05-17 12:00:00      -3        1       0     1.2300    -3            
 1988-05-18 12:00:00      -3        1       0     27.190    -1            
 1988-05-18 12:00:00      -3        1       0     9.0400    -2            
 1988-05-18 12:00:00      -3        1       0     1.2200    -3            

These are temperature data from sensors in different locations. The field Parameter ID indicates whether the data came from sensor -1, -2 or -3. I'm exporting this data to a CSV file to analyze it. The desired format is:

   Date     Time           -1       -2        -3    
1988-05-16 12:00:00      27.060    9.0300    1.2000                     
1988-05-17 12:00:00      27.100    9.0200    1.2300
1988-05-18 12:00:00      27.190    9.0400    1.2200   

It groups the data by date and time, and splits the values into columns according to the Parameter ID.

I'm not sure how doable this is with AWK, but since I mostly work over SSH to prepare this data, an AWK solution would be very interesting for me. I'm also open to other tools available in bash (sed, whatever), but it has to be native Linux commands because I'm not allowed to install anything on the server. :)


What I'm doing nowadays

Currently I run one awk command per sensor:

$ awk '$NF == "-1" {print $1";"$2";"$6}' file > netcdf_extract_1.csv
$ awk '$NF == "-2" {print $1";"$2";"$6}' file > netcdf_extract_2.csv
$ awk '$NF == "-3" {print $1";"$2";"$6}' file > netcdf_extract_3.csv

Then I import them in Python, group them by date and time, and get my table.
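The Python merge step looks roughly like this (a minimal sketch assuming pandas; the output file name merged.csv is just illustrative):

import pandas as pd

# Read the three per-sensor exports produced by the awk commands above
frames = []
for sensor in (1, 2, 3):
    df = pd.read_csv("netcdf_extract_%d.csv" % sensor, sep=";",
                     names=["Date", "Time", str(-sensor)])
    frames.append(df.set_index(["Date", "Time"]))

# Align the three frames on Date/Time so each sensor becomes its own column
merged = pd.concat(frames, axis=1).reset_index()
merged.to_csv("merged.csv", sep=";", index=False)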

  • Did you try anything at all? Commented Nov 10, 2017 at 11:58
  • Well, currently I'm getting 3 files, one for each sensor, and joining them in Python. I did try something; I'll post the code. Commented Nov 10, 2017 at 12:03

2 Answers


GNU awk solution:

awk 'BEGIN{ PROCINFO["sorted_in"]="@ind_str_asc"; }
     NR > 1{ a[$1"\t"$2][$7]=$6 }
     END{ 
         printf "Date\tTime\t-1\t-2\t-3\n"; 
         for (i in a) print i,a[i][-1],a[i][-2],a[i][-3] 
     }' file | column -t
  • a[$1"\t"$2][$7]=$6 :
    • a - a multidimensional (array-of-arrays) array
    • $1"\t"$2 - outer key built by concatenating the Date and Time fields
    • [$7] - inner array indexed by the Parameter ID field value
    • $6 - the stored value, taken from the Temp field

The output:

Date        Time      -1      -2      -3
1988-05-16  12:00:00  27.060  9.0300  1.2000
1988-05-17  12:00:00  27.100  9.0200  1.2300
1988-05-18  12:00:00  27.190  9.0400  1.2200

5 Comments

I got an error when I ran this on Ubuntu 17.04: awk: line 2: syntax error at or near [ awk: line 5: syntax error at or near [ Any clue?
@rvbarreto, check if you copy/pasted properly. It works fine, see the screenshot ibb.co/jKqVLw
I may be missing something. I just did the same and got the error: https://ibb.co/cuNS0w
@rvbarreto, My first sentence clearly says GNU awk. That's the main condition.
Ok, I found the problem. For people using awk or gawk before v4, true multidimensional arrays (arrays of arrays) are not supported. In this case, you have to use associative arrays with a compound key: awk 'BEGIN { PROCINFO["sorted_in"]="@ind_str_asc"; } NR > 1 {a[$1"\t"$2,$7]=$6; if($NF==-1){b[$1"\t"$2]=$6}} END{printf "Date\tTime\t-1\t-2\t-3\n"; for (i in b){print i, a[i,-1],a[i,-2],a[i,-3]}}'
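For readability, here is the same portable variant from the comment above, laid out over multiple lines (file stands in for the input file name):

awk 'BEGIN{ PROCINFO["sorted_in"] = "@ind_str_asc" }   # ignored by awks that do not support sorted_in
     NR > 1{
         a[$1"\t"$2,$7] = $6                # compound key: Date/Time plus Parameter ID, joined with SUBSEP
         if ($NF == -1) b[$1"\t"$2] = $6    # remember each Date/Time key once
     }
     END{
         printf "Date\tTime\t-1\t-2\t-3\n"
         for (i in b) print i, a[i,-1], a[i,-2], a[i,-3]
     }' file | column -t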

Here is my attempt:

awk 'BEGIN{ print "Date\t","Time\t","-1\t","-2\t","-3\t"
     PROCINFO["sorted_in"] = "@ind_str_asc" }                 # keep the dates in ascending order (GNU awk)
     NR > 1{ a[$1] = a[$1] " " $6; b[$1] = $2; next }         # append each Temp to its date; remember the time
     END{ for ( i in a ) print i, b[i], a[i] }' file1 | column -t

Output:

Date        Time      -1      -2      -3
1988-05-16  12:00:00  27.060  9.0300  1.2000
1988-05-17  12:00:00  27.100  9.0200  1.2300
1988-05-18  12:00:00  27.190  9.0400  1.2200

As Roman put everything into one array, I split the task across two arrays.

1 Comment

Nice! It's similar to the code I posted on Roman's solution, but much more elegant! Thanks!
