awk: creating columns for different values

Question

This is my input text file:

2013121612,HCDC,0
2013121613,HCDC,84
2013121614,HCDC,100
2013121615,HCDC,98
2013121612,MSLP,1023.83
2013121613,MSLP,1023.02
2013121614,MSLP,1022.08
2013121615,MSLP,1021.61
2013121612,MAXT,12.723
2013121613,MAXT,13.412
2013121614,MAXT,13.41
2013121615,MAXT,12.482

This is my bad (or insufficient) code:

awk -F"," '/MAXT|HCDC|MSLP/ {print $1,"\t",$3,"\t",$3,"\t",$3}' input.txt >> ouput.txt

and this is the output file:

DATE    MAXT    HCDC    MSLP    
2013121612   0   0   0
2013121613   84      84      84
2013121614   100     100     100
2013121615   98      98      98
2013121612   1023.03     1023.03     1023.03
2013121613   1023.02     1023.02     1023.02
2013121614   1022.08     1022.08     1022.08
2013121615   1020.84     1020.84     1020.84
2013121612   12.723      12.723      12.723
2013121613   13.412      13.412      13.412
2013121614   13.41           13.41       13.41
2013121615   12.482      12.482      12.482

What I need is this output format…

DATE        MAXT    HCDC  MSLP
2013121612  12.723  0     1023.03
2013121613  13.412  84    1023.02
2013121614  13.41   100   1022.08
2013121615  12.482  98    1020.84

I have to ask for help because my knowledge of Unix is very limited.

Thank you very much.

glenn jackman · Accepted Answer · 2013-12-20 18:53:17Z

2

Here's awk:

awk -F, '
    {
        key[$1] = 1
        data[$1,$2] = $3
    } 
    END {
        print "DATE","MAXT","HCDC","MSLP"
        for (k in key)
            print k, data[k,"MAXT"], data[k,"HCDC"], data[k,"MSLP"]
    }
' input.txt | column -t

DATE        MAXT    HCDC  MSLP
2013121612  12.723  0     1023.83
2013121613  13.412  84    1023.02
2013121614  13.41   100   1022.08
2013121615  12.482  98    1021.61

Because I'm using associative arrays, the order of the keys is not guaranteed. If you need to sort the output, do something like this in bash:

{
    echo DATE MAXT HCDC MSLP
    awk -F, '
        { key[$1] = 1; data[$1,$2] = $3 }
        END { for (k in key) print k, data[k,"MAXT"], data[k,"HCDC"], data[k,"MSLP"] }
    ' input.txt | sort
} | column -t

answered Dec 20, 2013 at 18:53


6 Comments

raposu Over a year ago

MSLP comes out right now, but the dates are not working properly. Why? I don't know; perhaps because in my data MSLP has two values for the same date, as follows: 2013121905,MSLP,1017.14 and 2013121905,MSLP,1016.29. Sorry... each date now carries its values, but the rows are out of order.

glenn jackman Over a year ago

And do you want to output two rows for the same date when the data differ?

raposu Over a year ago

The script from @1_CR works perfectly: MSLP has two values for the same date and it keeps only one, which is exactly what I need. Have you modified your code? Initially it looked like it worked, but now the dates come out in random order.

glenn jackman Over a year ago

I explained that the order is not guaranteed to be the same, so I showed you how to sort it.

BMW Over a year ago

key[$1] = 1 is not required, key[$1] is enough.


iruvar · Accepted Answer · 2013-12-20 18:55:26Z

1

awk -F, '!($1 in seen){dr[++i]=$1};{d=$1; v=$3; $0=$2; seen[d]++};
    /HCDC/{HCDC[d]=v}; /MSLP/{MSLP[d]=v};/MAXT/{MAXT[d]=v};
    END{print "DATE", "MAXT", "HCDC", "MSLP"; 
        for (j=1; j<=i; ++j) {print dr[j], (dr[j] in MAXT)? MAXT[dr[j]]: 0,
                                 (dr[j] in HCDC)? HCDC[dr[j]]: 0,
                                 (dr[j] in MSLP)? MSLP[dr[j]]: 0}}' input.txt

DATE MAXT HCDC MSLP
2013121612 12.723 0 1023.83
2013121613 13.412 84 1023.02
2013121614 13.41 100 1022.08
2013121615 12.482 98 1021.61

answered Dec 20, 2013 at 18:55


3 Comments

raposu Over a year ago

The awk works perfectly in all cases; I am grateful to everyone for the quick response. If I may push my luck: could this same script keep only the odd or only the even MSLP entries? (I have two MSLP values per hour and need to drop one; I think I can do that myself with a separate script.) Thanks, everyone.

raposu Over a year ago

This works perfectly: MSLP has two values for the same date and the script keeps only one, which is what I need.

BMW Over a year ago

(dr[j] in HCDC)? HCDC[dr[j]]: 0 can be replaced by HCDC[dr[j]]+0 directly.

Alexander L. Belikoff · Accepted Answer · 2013-12-20 19:05:08Z

1

You are basically trying to pivot the table, reshaping it using two columns. You can use a specialized language for that (R is very good at such tasks). awk is not the best language for such jobs (although it is surely possible to do using it). I'd recommend rewriting it in Python, which might be a bit easier. The outline (no error checking and such) of the code is below:

tbl = {}       # map date to a dict of colname -> value

# ingest the data (Python 3; fields are comma-separated)

with open('input.txt') as myfile:
    for line in myfile:
        rec = line.strip().split(',')

        if rec[0] not in tbl:
            tbl[rec[0]] = {}

        tbl[rec[0]][rec[1]] = float(rec[2])

# output the table

for date in tbl:
    print(date, tbl[date]['MAXT'], tbl[date]['HCDC'], tbl[date]['MSLP'])

Note that it might be even easier (practically a two-liner) using NumPy but I'm not sure it is worth making this a dependency for such a small task.
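For comparison, here is a minimal standard-library sketch of the same pivot that also sorts the dates and tolerates missing values. The function name pivot, its parameters, and the empty-string placeholder for missing cells are assumptions made for illustration, not part of the original answer. Values are kept as strings so they print exactly as they appear in the input, and when a date has two values for the same column (as mentioned in the comments for MSLP) the last one read wins.

```python
import csv
import io

COLUMNS = ["MAXT", "HCDC", "MSLP"]  # column order requested in the question


def pivot(lines, columns=COLUMNS, missing=""):
    """Pivot date,name,value records into one row per date.

    `lines` is any iterable of text lines (an open file works).
    If a (date, name) pair occurs more than once, the last value wins.
    """
    table = {}  # date -> {name: value}
    for rec in csv.reader(lines):
        if len(rec) != 3:
            continue  # skip blank or malformed lines
        date, name, value = (field.strip() for field in rec)
        table.setdefault(date, {})[name] = value

    rows = [["DATE"] + list(columns)]
    for date in sorted(table):  # fixed-width dates sort chronologically
        rows.append([date] + [table[date].get(c, missing) for c in columns])
    return rows


# Small in-memory sample (deliberately out of order, with one gap)
sample = io.StringIO(
    "2013121613,HCDC,84\n"
    "2013121612,HCDC,0\n"
    "2013121612,MSLP,1023.83\n"
    "2013121612,MAXT,12.723\n"
    "2013121613,MAXT,13.412\n"
)
for row in pivot(sample):
    print("\t".join(row))
```

To run it on the real file, pass an open file handle instead of the sample, for example pivot(open("input.txt", newline="")). To keep the first value for a duplicated date instead of the last, replace the assignment with table.setdefault(date, {}).setdefault(name, value).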

edited Dec 20, 2013 at 19:05

answered Dec 20, 2013 at 18:23


2 Comments

raposu Over a year ago

Sorry, I get an error: File "output.py", line 10: if rec[0] is not in tbl:

Alexander L. Belikoff Over a year ago

Sorry, fixed the typo
