1

How can we use awk, sed, or another Unix command to massage this data? I have data like the following:

/abc/def1.0/Acc101 500 50
/abc/def1.0/Acc101 401 27
/abc/def1.0/Acc101 200 101
/abc/def1.0/Acc201 200 4
/abc/def1.0/Acc301 304 2
/abc/def1.0/Acc401 200 204

For each unique string in the first column ($1), how can we merge the counts, split by whether the row is a success or a failure? Column $2 is the status code: 200 means success, and anything else means failure. Column $3 is the number of occurrences.

Below is the desired output: one line per distinct $1, with the $3 counts summed separately for the rows where $2 is 200 and the rows where $2 is not 200:

/abc/def1.0/Acc101 101 77
/abc/def1.0/Acc201 4 0
/abc/def1.0/Acc301 0 2
/abc/def1.0/Acc401 204 0

Explanation of the line /abc/def1.0/Acc101 101 77:

101 = the $3 count of the row where $2 == 200
77 = 50 + 27, the sum of $3 over the rows where $2 != 200

Many thanks for the help.

2 Answers

1

Something like this:

awk '{ groups[$1] = 1; if ($2 == 200) succ[$1] += $3; else fail[$1] += $3 }
     END { PROCINFO["sorted_in"] = "@ind_str_asc"
           for (g in groups) print g, succ[g]+0, fail[g]+0 }' input.txt
/abc/def1.0/Acc101 101 77
/abc/def1.0/Acc201 4 0
/abc/def1.0/Acc301 0 2
/abc/def1.0/Acc401 204 0

If you are using GNU awk, the PROCINFO line produces sorted output. With other awk implementations the iteration order is arbitrary, so if you want the output sorted, pipe it to sort.
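For awk implementations without PROCINFO, a minimal sketch of the "pipe to sort" variant might look like the following. The function name summarize is only an illustrative wrapper, not part of the original answer, and LC_ALL=C is an assumption made to get byte-order sorting regardless of locale.

```shell
# Hypothetical wrapper: reads the three-column data from the named files or stdin.
summarize() {
  awk '{ groups[$1] = 1                      # remember every key seen in column 1
         if ($2 == 200) succ[$1] += $3       # count for successful rows
         else           fail[$1] += $3 }     # count for every other status code
       END { for (g in groups) print g, succ[g]+0, fail[g]+0 }' "$@" |
    LC_ALL=C sort                            # awk order is arbitrary, so sort afterwards
}

summarize <<'EOF'
/abc/def1.0/Acc101 500 50
/abc/def1.0/Acc101 401 27
/abc/def1.0/Acc101 200 101
/abc/def1.0/Acc201 200 4
/abc/def1.0/Acc301 304 2
/abc/def1.0/Acc401 200 204
EOF
```

The +0 forces a numeric 0 for keys that never had a success (or never had a failure), since an unset array element would otherwise print as an empty string.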

0

For simplicity, you could read Input_file twice. Try the following:

awk '
FNR==NR{
  mainarray[$1]
  if($2!=200){
    sum[$1]+=$NF
  }
  if($2==200){
    Found200[$1]+=$NF
  }
  next
}
($1 in mainarray) && !($1 in Found200){
  print $1,0,sum[$1]!=""?sum[$1]:0
  next
}
$2==200{
  print $1,Found200[$1]!=""?Found200[$1]:0,sum[$1]!=""?sum[$1]:0
}
'  Input_file  Input_file

Explanation: the same program with a detailed comment on each line.

awk '                                                           ##Start of the awk program.
FNR==NR{                                                        ##FNR==NR is TRUE only while Input_file is being read the first time.
  mainarray[$1]                                                 ##Create an entry in mainarray with index $1.
  if($2!=200){                                                  ##If the 2nd field is not 200 then do the following.
    sum[$1]+=$NF                                                ##Add the last field to the entry of the sum array with index $1.
  }
  if($2==200){                                                  ##If the 2nd field is equal to 200 then do the following.
    Found200[$1]+=$NF                                           ##Add the last field to the entry of the Found200 array with index $1.
  }
  next                                                          ##next skips all further statements for this line.
}
($1 in mainarray) && !($1 in Found200){                         ##Second read: if $1 is present in mainarray and NOT present in Found200.
  print $1,0,sum[$1]!=""?sum[$1]:0                              ##Print the first field, zero, and the sum for $1 (0 if empty).
  next                                                          ##next skips all further statements for this line.
}
$2==200{                                                        ##If the 2nd field is 200 then do the following.
  print $1,Found200[$1]!=""?Found200[$1]:0,sum[$1]!=""?sum[$1]:0 ##Print the first field, the Found200 value, and the sum value (0 if empty).
}
' Input_file  Input_file                                        ##Input_file is named twice so that it is read twice.
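One caveat with the two-pass program: on the sample data every key prints exactly once, but a key with several non-200 rows and no 200 row (or with several 200 rows) would be printed once per matching row. A minimal sketch of a variant that prints each key once, in the order it first appears, is shown below. The names summarize_in_order, ok, bad, and seen are illustrative and not from the original answer.

```shell
# Hypothetical helper. $1 is the path to the input file; the file is read twice,
# so it must be a regular file rather than a pipe.
summarize_in_order() {
  awk '
    FNR == NR {                        # first pass: accumulate totals per key
      if ($2 == 200) ok[$1]  += $NF    # successes
      else           bad[$1] += $NF    # everything else
      next
    }
    !seen[$1]++ {                      # second pass: true only the first time a key is met
      print $1, ok[$1] + 0, bad[$1] + 0
    }
  ' "$1" "$1"
}

# Usage: summarize_in_order Input_file
```

Compared with sorting the output, this keeps the keys in input order, at the cost of reading the file twice.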
