I have a file like this (tab-separated):

Name  Data1  Data2  Extra  A   B   C   D  
Test1   A     C       40   23  10  12  5  
Test2   B     C       20   13  3   32  5  
Test3   C     D       44   43  0   1   5  
Test4   A     D       43   2   7   0   5  

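I need to add a column called Freq based on Data1 and Data2: Freq = Data2 value / (Data2 value + Data1 value), where "Data1 value" means the number in the column whose name is given in Data1 (and likewise for Data2). For example, for Test1, Freq = 12/(12+23).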
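Written out, with v_X standing for the number in column X of the same row (notation introduced here only for illustration), the requested value and the Test1 example are:

```latex
\mathrm{Freq} = \frac{v_{\mathrm{Data2}}}{v_{\mathrm{Data2}} + v_{\mathrm{Data1}}},
\qquad
\mathrm{Freq}_{\mathrm{Test1}} = \frac{v_C}{v_C + v_A} = \frac{12}{12 + 23} = \frac{12}{35} \approx 0.34
```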

It would be easy to calculate and add the values like this for a row where Data1="A" and Data2="C":

  awk '{print $7/($5+$7)}'

But how can I select the column based on the value in the row?

Expected output:

Name  Data1  Data2  Extra  A   B   C   D  Freq
Test1   A     C       40   23  10  12  5  0.34
Test2   B     C       20   13  3   32  5  0.91
Test3   C     D       44   43  0   1   5  0.83
Test4   A     D       43   2   7   0   5  0.71
  • "But I how can Select the column based on the row value ?" ??? Commented Mar 4, 2022 at 13:40
  • I mean, if I could refer to the column by name (column name == "A") based on the value in the row, it would be easier to get the value. Commented Mar 4, 2022 at 13:43
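The idea in the second comment (referring to a column by its header name) can be done in awk by recording, while reading the header line, which field number each name occupies. A minimal sketch, assuming a tab-separated file; the file names data.tsv and lookup.awk and the array name col are hypothetical, chosen only for illustration:

```shell
# Hypothetical sample file holding the first rows of the question's data
printf 'Name\tData1\tData2\tExtra\tA\tB\tC\tD\n' >  data.tsv
printf 'Test1\tA\tC\t40\t23\t10\t12\t5\n'       >> data.tsv
printf 'Test2\tB\tC\t20\t13\t3\t32\t5\n'        >> data.tsv

cat > lookup.awk <<'EOF'
BEGIN { FS = "\t" }
NR == 1 {
    # Header line: remember which field number each column name occupies
    for (i = 1; i <= NF; i++) col[$i] = i
    next
}
{
    # $2 holds a column name (A..D); col[$2] turns it into a field number
    print $1, $2, $(col[$2])
}
EOF

awk -f lookup.awk data.tsv
```

This prints "Test1 A 23" and "Test2 B 3": the value found in the column named by Data1 on each row.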

3 Answers

$ cat tst.awk
BEGIN { FS=OFS="\t" }
NR==1 {
    $(NF+1) = "Freq"
    for (i=1; i<=NF; i++) {
        f[$i] = i
    }
    print
    next
}
{
    d1 = $(f["Data1"])
    d2 = $(f["Data2"])
    numer = $(f[d2])
    denom = numer + $(f[d1])
    $(f["Freq"]) = sprintf( "%.02f", (denom ? numer / denom : 0) )
    print
}

$ awk -f tst.awk file
Name    Data1   Data2   Extra   A       B       C       D       Freq
Test1   A       C       40      23      10      12      5       0.34
Test2   B       C       20      13      3       32      5       0.91
Test3   C       D       44      43      0       1       5       0.83
Test4   A       D       43      2       7       0       5       0.71

To get the value from the column named by Data1:

gawk '{ 
   s=($2=="A"?5:0)+($2=="B"?6:0)+($2=="C"?7:0)+($2=="D"?8:0); 
   print $0,s,(s!=0?$s:"") }'   inputfile

With your sample input this gives:

Name  Data1  Data2  Extra  A   B   C   D 0
Test1   A     C       40   23  10  12  5 5 23
Test2   B     C       20   13  3   32  5 6 3
Test3   C     D       44   43  0   1   5 7 1
Test4   A     D       43   2   7   0   5 5 2

The value of s refers to the column, so $s gives the value for that column.
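Because awk's $ operator accepts any numeric expression, the chain of ternaries can be swapped for a lookup array keyed by letter. A hedged sketch of that alternative: the array name col is hypothetical, and the field positions 5 to 8 are hard-coded exactly as in the answer above:

```shell
# col[] maps each letter to the field that holds its value (A=5 ... D=8);
# $s then reads whatever field number s holds at run time.
echo 'Test1 A C 40 23 10 12 5' |
awk 'BEGIN { col["A"] = 5; col["B"] = 6; col["C"] = 7; col["D"] = 8 }
     { s = col[$2]; print "Data1 =", $2, "-> field", s, "-> value", $s }'
```

This prints "Data1 = A -> field 5 -> value 23". A letter missing from col gives an empty s, which awk treats as field 0 (the whole line), so a header line would need to be skipped or checked as the answer does with s!=0.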

BTW: I am using gawk, but this should work in awk too.

Something like this might work for you. I have written it a bit verbosely to emphasize what is going on:

$ cat freq_from_col.awk 
function indirect(val) {
        if (val == "A")
                return $col_a
        if (val == "B")
                return $col_b
        if (val == "C")
                return $col_c
        if (val == "D")
                return $col_d

        return 0
}
BEGIN {
        col_name = 1
        col_data1 = 2
        col_data2 = 3
        col_extra = 4
        col_a = 5
        col_b = 6
        col_c = 7
        col_d = 8
}
NR == 1 {
        print $0, "Freq"
        next;
}
{
        n = indirect($col_data1);
        m = indirect($col_data2);

        print $0, sprintf("%.2f", m/(n+m));
}
$ awk -f freq_from_col.awk data.txt
Name  Data1  Data2  Extra  A   B   C   D Freq
Test1   A     C       40   23  10  12  5 0.34
Test2   B     C       20   13  3   32  5 0.91
Test3   C     D       44   43  0   1   5 0.83
Test4   A     D       43   2   7   0   5 0.71

1 Comment

That would fail with a divide-by-zero error if the values in the columns named by Data1 and Data2 were both 0.
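One way to avoid that is to test the denominator before dividing and report 0 when it is zero, the same choice the first answer makes. The sketch below assumes that behavior is acceptable. To stay short it also replaces freq_from_col.awk's indirect() function with a lookup array. The file names and the extra Test5 row are hypothetical:

```shell
# Hypothetical input: Test5 has 0 in both of the columns it names (A and B)
printf 'Name\tData1\tData2\tExtra\tA\tB\tC\tD\n' >  zero.tsv
printf 'Test1\tA\tC\t40\t23\t10\t12\t5\n'       >> zero.tsv
printf 'Test5\tA\tB\t10\t0\t0\t7\t5\n'          >> zero.tsv

cat > freq_guarded.awk <<'EOF'
BEGIN {
    FS = OFS = "\t"
    # letter -> field number, hard-coded as in freq_from_col.awk
    col["A"] = 5; col["B"] = 6; col["C"] = 7; col["D"] = 8
}
NR == 1 { print $0, "Freq"; next }
{
    n = $(col[$2])    # value of the column named in Data1
    m = $(col[$3])    # value of the column named in Data2
    # The ternary only evaluates m / (n + m) when the sum is non-zero
    print $0, sprintf("%.2f", (n + m) != 0 ? m / (n + m) : 0)
}
EOF

awk -f freq_guarded.awk zero.tsv
```

Test1 still gets 0.34, while Test5 gets 0.00 instead of aborting.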
