I have a reference file:
Reference File
Dpse\GA30012 FBgn0000447 chr2 26607738 26607962 -1
Dpse\GA19764 FBgn0085819 chrX 28571020 28571736 -1
Dpse\ttk FBgn0000100 chr2 16553824 16561652 -1
Dpse\GA30195 FBgn0085742 chr3 22629640 22630440 -1
and an input file:
file
FBgn0000447 1 11 HWI-ST1083:68:C0YYUACXX:8:1111:20915:34957 255 -
FBgn0000100 1 11 HWI-ST1083:68:C0YYUACXX:8:1113:9591:98803 255 -
FBgn0085819 1 11 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108 255 -
FBgn0085742 1 11 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108 255 -
FBgn0037963 47752 47802 HWI-ST1083:68:C0YYUACXX:8:1215:21263:59372 255 -
FBgn0001257 11527 11577 HWI-ST1083:68:C0YYUACXX:8:1311:2957:12154 255 -
FBgn0034315 158 208 HWI-ST1083:68:C0YYUACXX:8:2113:4139:83177 255 -
FBgn0000559 3316 3365 HWI-ST484:183:C167BACXX:7:1101:1926:2031 255 +
FBgn0262975 39033 39082 HWI-ST484:183:C167BACXX:7:1101:1726:2030 255 +
FBgn0032505 1 50 HWI-ST484:183:C167BACXX:7:1101:5095:2042 255 +
FBgn0005593 403 452 HWI-ST484:183:C167BACXX:7:1101:3906:2209 255 +
FBgn0013686 692 741 HWI-ST484:183:C167BACXX:7:1101:3218:2247 255 -
FBgn0000556 3793 3842 HWI-ST484:183:C167BACXX:7:1101:5288:2041 255 +
FBgn0015521 438 487 HWI-ST484:183:C167BACXX:7:1101:5731:2170 255 -
FBgn0033912 1121 1170 HWI-ST484:183:C167BACXX:7:1101:8602:2063 255 -
I created an empty column between the 1st and 2nd columns, and the file became this output2:
Output2
FBgn0000447 435 485 HWI-ST1083:68:C0YYUACXX:8:1111:20915:34957 255 -
FBgn0000100 704 754 HWI-ST1083:68:C0YYUACXX:8:1113:9591:98803 255 -
FBgn0085819 154 204 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108 255 -
FBgn0085742 389 439 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108 255 -
FBgn0037963 47752 47802 HWI-ST1083:68:C0YYUACXX:8:1215:21263:59372 255 -
FBgn0001257 11527 11577 HWI-ST1083:68:C0YYUACXX:8:1311:2957:12154 255 -
FBgn0034315 158 208 HWI-ST1083:68:C0YYUACXX:8:2113:4139:83177 255 -
FBgn0000559 3316 3365 HWI-ST484:183:C167BACXX:7:1101:1926:2031 255 +
FBgn0262975 39033 39082 HWI-ST484:183:C167BACXX:7:1101:1726:2030 255 +
FBgn0032505 1 50 HWI-ST484:183:C167BACXX:7:1101:5095:2042 255 +
FBgn0005593 403 452 HWI-ST484:183:C167BACXX:7:1101:3906:2209 255 +
FBgn0013686 692 741 HWI-ST484:183:C167BACXX:7:1101:3218:2247 255 -
FBgn0000556 3793 3842 HWI-ST484:183:C167BACXX:7:1101:5288:2041 255 +
FBgn0015521 438 487 HWI-ST484:183:C167BACXX:7:1101:5731:2170 255 -
FBgn0033912 1121 1170 HWI-ST484:183:C167BACXX:7:1101:8602:2063 255 -
Here is how the ideal output should be built:
For each id in column 1 of the output2 file, find the matching id in column 2 of the reference file. Fill column 2 of output2 with the value of reference column 3. Then set column 3 of output2 to (column 3 + reference column 4 - 1) and column 4 of output2 to (column 4 + reference column 4 - 1).
This is my current code, which does not produce my ideal output file:
Current code
awk -v OFS="\t" '
NR==FNR {a[$2]=$3; b[$2]=$4; next};
{if ($1 in a) $2=a[$1]; print};
{if ($1 in b) $3=b[$1]+$3-1; $4=b[$1]+$4-1; print}
' $ref $output2 > $output3
The ideal output should look like this (for the first 4 rows):
Output (Desired)
FBgn0000447 chr2 26607738 26607748 HWI-ST1083:68:C0YYUACXX:8:1111:20915:34957 255 -
FBgn0000100 chr2 16553824 16553834 HWI-ST1083:68:C0YYUACXX:8:1113:9591:98803 255 -
FBgn0085819 chrX 28571020 28571030 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108 255 -
FBgn0085742 chr3 22629640 22629650 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108 255 -
I am not sure whether this is due to some restriction on numeric values in awk arrays or whether something else is wrong. Thanks a lot for your help!
P.S. I remembered one problem: in the reference file, not all the ids in column 2 have corresponding values in columns 3/4. Is this why I cannot get values in output2? How should I solve this, and what is the best thing to fill the empty space with?
Thanks again
ifs on the other side of the braces?
Your awk code does not match your English description. According to your English description of your desired output, you should have: if ($1 in a) $2 = a[$1]; print and, for the second part: if ($1 in b) $3 = b[$1] + $3 - 1; $4 = b[$1] + $4 - 1; print.
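Building on the comment above, here is a minimal sketch of one way the merge could be written. It is not a definitive fix and rests on a few assumptions. The file names ref.txt, input.txt and output3.txt are placeholders. Both files are taken to be whitespace-separated. The script starts from the original six-column input file rather than output2, because with awk's default whitespace splitting an empty column is not seen as a field, so the start and end coordinates would still sit in $2 and $3. The "NA" filler for ids without a reference entry is only one hypothetical choice. Three things in the posted code look like likely culprits: the two separate action blocks each print, so every line comes out twice; an if without braces only governs the first assignment; and the field numbers are off by one when the empty column is not parsed. The size of the numbers should not be the problem, since the common awk implementations store numbers as double-precision values and coordinates of this size are represented exactly.

```shell
# Sample data copied from the question (whitespace-separated).
cat > ref.txt <<'EOF'
Dpse\GA30012 FBgn0000447 chr2 26607738 26607962 -1
Dpse\GA19764 FBgn0085819 chrX 28571020 28571736 -1
Dpse\ttk FBgn0000100 chr2 16553824 16561652 -1
Dpse\GA30195 FBgn0085742 chr3 22629640 22630440 -1
EOF

cat > input.txt <<'EOF'
FBgn0000447 1 11 HWI-ST1083:68:C0YYUACXX:8:1111:20915:34957 255 -
FBgn0000100 1 11 HWI-ST1083:68:C0YYUACXX:8:1113:9591:98803 255 -
FBgn0085819 1 11 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108 255 -
FBgn0085742 1 11 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108 255 -
FBgn0037963 47752 47802 HWI-ST1083:68:C0YYUACXX:8:1215:21263:59372 255 -
EOF

awk -v OFS='\t' '
# First file (reference): remember chromosome and start, keyed by FBgn id.
NR == FNR { chr[$2] = $3; start[$2] = $4; next }

# Second file (input): a single action block, so each line is printed once.
{
    if ($1 in chr) {
        # Braces keep the chromosome and both shifted coordinates
        # under the same condition.
        print $1, chr[$1], $2 + start[$1] - 1, $3 + start[$1] - 1, $4, $5, $6
    } else {
        # Id not found in the reference: "NA" is an assumed placeholder,
        # and the coordinates are left unchanged.
        print $1, "NA", $2, $3, $4, $5, $6
    }
}' ref.txt input.txt > output3.txt

cat output3.txt
```

Regarding the P.S.: if the reference file is actually tab-delimited and some rows have empty cells in columns 3 or 4, whitespace splitting would shift the remaining fields to the left. In that case it may help to pass -F'\t' (with a tab-delimited input file as well) and to skip reference rows where $3 or $4 is empty, so that those ids fall through to the "NA" branch.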