I have a set of tab-separated files with gene identifiers in the first column; each subsequent column holds one sample's values for the gene in that row. Here is a truncated example of one of my files with only a few samples:
DDR1 8.55578403700418 8.65526857898327 8.71701700266541
MIR4640 8.55578403700418 8.65526857898327 8.71701700266541
RFC2 5.47524925570941 5.88644077981836 5.77277342309348
HSPA6 4.12035662689116 4.01089068869244 3.82366440713502
PAX8
GUCA1A
Some genes, such as PAX8 and GUCA1A, have no values at all, and I want to fill their missing columns with "?". I got some ideas from "Awk adding constant values", "Bash Script Awk if statements", and "AWK if length statement append". Since I have several thousand rows and possibly hundreds of columns depending on the input file, I tried writing my script like this:
cd ../path/to/file
inputFile=inputFile.in
outputFile=outputFile.out
columnCount= $(awk -F"\t" 'NR==1 {print NF}' $inputFile)
awk '{ for (i = 1; i <= $columnCount; i++)
if (i<$columnCount) {print $0"\t?"}' $inputFile > $outputFile
}'
but I keep getting syntax errors when I run it:
$ awk -f missingValueAdder.awk
awk: missingValueAdder.awk:3: cd ../path/to/file
awk: missingValueAdder.awk:3: ^ syntax error
awk: missingValueAdder.awk:5: inputFile=inputFile.in
awk: missingValueAdder.awk:5: ^ syntax error
awk: missingValueAdder.awk:6: outputFile=outputFile.out
awk: missingValueAdder.awk:6: ^ syntax error
awk: missingValueAdder.awk:8: columnCount= $(awk -F"\t" 'NR==1 {print NF}' $inputFile)
awk: missingValueAdder.awk:8: ^ invalid char ''' in expression
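These messages are consistent with a file that mixes shell commands (`cd`, variable assignments, `$(...)`) being passed to `awk -f`, which expects awk code only. Below is a minimal sketch, assuming the file is instead saved and run as a shell script (for example `sh missingValueAdder.sh`). The helper names `count_columns` and `short_rows` are hypothetical and exist only for illustration. The sketch shows the assignment written without a space after `=`, and the shell value handed to awk with `-v` rather than written as `$columnCount` inside the single-quoted awk program, where awk would read it as a field reference.

```shell
#!/bin/sh

# count_columns FILE (hypothetical helper):
# print the number of tab-separated fields on the first line of FILE.
count_columns() {
    awk -F'\t' 'NR == 1 { print NF; exit }' "$1"
}

# short_rows FILE (hypothetical helper):
# print every row that has fewer fields than the first line.
short_rows() {
    # A shell assignment takes no space after "=".
    columnCount=$(count_columns "$1")
    # -v copies the shell value into the awk variable n.
    awk -F'\t' -v n="$columnCount" 'NF < n' "$1"
}
```

With the sample data above, `short_rows inputFile.in` would be expected to list the PAX8 and GUCA1A rows, provided the first row is complete.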
So I tried this one-liner:
awk 'for (i=1;i<=NF;i++) BEGIN{FS=OFS="\t"} I<NF{print$0"\t?"}' inputFile.in > outputFile.out
but I got another syntax error, starting at my for loop. In any case, my output file should look like this:
DDR1 8.55578403700418 8.65526857898327 8.71701700266541
MIR4640 8.55578403700418 8.65526857898327 8.71701700266541
RFC2 5.47524925570941 5.88644077981836 5.77277342309348
HSPA6 4.12035662689116 4.01089068869244 3.82366440713502
PAX8 ? ? ?
GUCA1A ? ? ?
I want to print as many "?" as there are sample columns, i.e. NF - 1 for a complete row (in this case 3, but it could be as many as 100). Any help would be most appreciated! Thanks
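One possible way to get that output is sketched below. It is a rough example rather than a definitive solution, and it assumes that the first row of each file is complete, so its field count can serve as the target width. The function name `pad_rows` is hypothetical. Assigning to a field beyond the current `NF` makes awk extend the record and rebuild it with `OFS`, so a row containing only the gene name is padded out with tab-separated "?" entries. Fields that exist but are empty are filled the same way.

```shell
#!/bin/sh

# pad_rows FILE (hypothetical helper):
# fill missing or empty sample columns with "?" and print the result.
pad_rows() {
    awk 'BEGIN { FS = OFS = "\t" }
    # Assumption: the first row is complete and defines the target width.
    NR == 1 { n = NF }
    {
        # Column 1 is the gene name, so start at column 2.
        for (i = 2; i <= n; i++)
            if ($i == "") $i = "?"
        print
    }' "$1"
}
```

It could then be called as `pad_rows inputFile.in > outputFile.out`. Rows that already have every value are printed unchanged, since no field is assigned for them.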
Do PAX8 and GUCA1A also have the required number of tabs, e.g. in the example three tabs after the gene name?
Rows such as PAX8 have no additional tabs after the first column.