2

This is my imput txt file

2013121612,HCDC,0
2013121613,HCDC,84
2013121614,HCDC,100
2013121615,HCDC,98
2013121612,MSLP,1023.83
2013121613,MSLP,1023.02
2013121614,MSLP,1022.08
2013121615,MSLP,1021.61
2013121612,MAXT,12.723
2013121613,MAXT,13.412
2013121614,MAXT,13.41
2013121615,MAXT,12.482

this is my BAD or INSUFFICIENT code

awk -F"," '/MAXT|HCDC|MSLP/ {print $1,"\t",$3,"\t",$3,"\t",$3}' input.txt >> ouput.txt

and this is de output file

DATE    MAXT    HCDC    MSLP    
2013121612   0   0   0
2013121613   84      84      84
2013121614   100     100     100
2013121615   98      98      98
2013121612   1023.03     1023.03     1023.03
2013121613   1023.02     1023.02     1023.02
2013121614   1022.08     1022.08     1022.08
2013121615   1020.84     1020.84     1020.84
2013121612   12.723      12.723      12.723
2013121613   13.412      13.412      13.412
2013121614   13.41           13.41       13.41
2013121615   12.482      12.482      12.482

What I need is this output format…

DATE    MAXT    HCDC    MSLP  
2013121612   12.723       0      1023.03
2013121613   13.412          84      1023.02
2013121614   13.41          100      1022.08
2013121615   12.482      98      1020.84

I am forced to ask for help because my knowledge of unix are very few

thank you very much

3 Answers 3

2

Here's awk:

awk -F, '
    {
        key[$1] = 1
        data[$1,$2] = $3
    } 
    END {
        print "DATE","MAXT","HCDC","MSLP"
        for (k in key)
            print k, data[k,"MAXT"], data[k,"HCDC"], data[k,"MSLP"]
    }
' input.txt | column -t
DATE        MAXT    HCDC  MSLP
2013121612  12.723  0     1023.83
2013121613  13.412  84    1023.02
2013121614  13.41   100   1022.08
2013121615  12.482  98    1021.61

Because I'm using associative arrays, the order of the keys is not guaranteed. If you need to sort the output, so something like this bash code:

{
    echo DATE MAXT HCDC MSLP
    awk -F, '
        { key[$1] = 1; data[$1,$2] = $3 }
        END { for (k in key) print k, data[k,"MAXT"], data[k,"HCDC"], data[k,"MSLP"] }
    ' input.txt | sort
} | column -t
Sign up to request clarification or add additional context in comments.

6 Comments

MSLP go right now but the date is not working properly, why? unknown because the bash that MSLP has two values ​​for the same date as follows: 2013121905, MSLP, 1017.14 2013121905, MSLP, 1016.29 sorry …. date now carries with it its value, but messily
and you want to output two rows for the different data but the same date?
the script of @1_CR working perfeft .. distinguishes pairs of odd for MSLP, MSLP memory has two values ​​for the same date and it only takes one it is perfect. You have modified the code? I initially looked like it worked, now sets the date randomly
I explained that the order is not guaranteed to be the same, so I showd you how to sort it.
key[$1] = 1 is not required, key[$1] is enough.
|
1
awk -F, '!($1 in seen){dr[++i]=$1};{d=$1; v=$3; $0=$2; seen[d]++};
    /HCDC/{HCDC[d]=v}; /MSLP/{MSLP[d]=v};/MAXT/{MAXT[d]=v};
    END{print "DATE", "MAXT", "HCDC", "MSLP"; 
        for (j=1; j<=i; ++j) {print dr[j], (dr[j] in MAXT)? MAXT[dr[j]]: 0,
                                 (dr[j] in HCDC)? HCDC[dr[j]]: 0,
                                 (dr[j] in MSLP)? MSLP[dr[j]]: 0}}' input.txt

DATE MAXT HCDC MSLP
2013121612 12.723 0 1023.83
2013121613 13.412 84 1023.02
2013121614 13.41 100 1022.08
2013121615 12.482 98 1021.61

3 Comments

Works perfectly in all cases awk, I am grateful to all for the quick response Abusing a bit: It is this same single bash have only odd or even for MSLP (only one, because I have two MSLP hourly values​​, and I need to delete one, I think he do myself with a different bash) thanks everyone
this works perfectly .. distinguishes pairs of odd for MSLP MSLP memory has two values ​​for the same date and it only takes one it is perfect
dr[j] in HCDC)? HCDC[dr[j]]: 0 can be replaced by: dr[j]+0 directly
1

You are basically trying to pivot the table, reshaping it using two columns. You can use a specialized language for that (R is very good at such tasks). awk is not the best language for such jobs (although it is surely possible to do using it). I'd recommend rewriting it in Python, which might be a bit easier. The outline (no error checking and such) of the code is below:

tbl = {}       # map date to a dict of colname->values

# ingest the data

for line in myfile:
    rec = line.split()

    if rec[0] not in tbl:
        tbl[rec[0]] = {}

    tbl[rec[0]][rec[1]] = double(rec[2])

# output the table

for date in tbl:
    print date, tbl[date]['MAXT'], tbl[date]['HCDC'], tbl[date]['MSLP']

Note that it might be even easier (practically a two-liner) using NumPy but I'm not sure it is worth making this a dependency for such a small task.

2 Comments

sorry File "output.py", line 10 if rec[0] is not in tbl:
Sorry, fixed the typo

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.