I'm using Bash, and I have a directory of .tsv files containing behavioral data (RT and accuracy) for multiple subjects, with multiple sessions per subject. My goal is to concatenate the RT field (field 3 of each .tsv file) and the accuracy field (field 9) across all these files into a single .tsv file. Each time I append a file, I also want to add the subject and session (taken from the directory names) as new variables, so that every RT and accuracy row stays paired with its subject and session.
To illustrate, each .tsv file starts with the following header row:
V1 V2 RT V4 V5 V6 V7 V8 ACC
I want to loop through many of these files, extracting just the RT and ACC fields and appending them to a new file called "summary.tsv", with SUB and SES as new variables:
SUB SES RT ACC
Here's the code I have so far:
subdir=~/path/to/subdir
# outdir must also be set to the output directory before running this
for subs in "${subdir}"/subject-*; do
    sub=$(basename "${subs}")
    for sess in "${subs}"/session-*; do
        ses=$(basename "${sess}")
        for files in "${sess}"/*.tsv; do
            # Later files: skip the header row and append
            if [[ -e $files ]] && [[ -e ${outdir}/summary.tsv ]] ; then
                awk 'NR > 1 {print $3,$9}' "${files}" >> "${outdir}/summary.tsv"
            fi
            # First file: keep the header row and create summary.tsv
            if [[ -e $files ]] && [[ ! -e ${outdir}/summary.tsv ]] ; then
                awk '{print $3,$9}' "${files}" > "${outdir}/summary.tsv"
            fi
        done
    done
done
This works fine for concatenating the files into summary.tsv without repeating each file's header. What I can't figure out is how to add two new columns, the same length as the appended output, in the awk 'NR > 1 {print $3,$9}' line, holding the corresponding ${sub} and ${ses} values in the 1st and 2nd fields.
Any suggestions? Thank you so much in advance.
awk in front of the Awk scripts! The if is unnecessary: you can append to a file which doesn't exist, and then the shell will simply create it.
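One possible way to get the SUB and SES columns is to pass the shell values into awk with -v and print them in front of fields 3 and 9. The following is only a sketch under some assumptions: the function name build_summary and its two arguments are made up for illustration, the fields are assumed to be tab-separated, and the header is written once up front so the loop only ever appends. The awk variables are named subj and sess because sub is the name of a built-in awk function and cannot be used as a variable.

```shell
#!/usr/bin/env bash
# Sketch: build summary.tsv with SUB, SES, RT and ACC columns.
# build_summary is a hypothetical helper; $1 is the directory holding
# subject-*/session-*/*.tsv and $2 is the (existing) output directory.
build_summary() {
    local subdir=$1 outdir=$2
    local out="${outdir}/summary.tsv"
    local subs sess file sub ses

    # Write the header once; everything after this is appended.
    printf 'SUB\tSES\tRT\tACC\n' > "$out"

    for subs in "$subdir"/subject-*; do
        sub=$(basename "$subs")
        for sess in "$subs"/session-*; do
            ses=$(basename "$sess")
            for file in "$sess"/*.tsv; do
                [[ -e $file ]] || continue   # glob matched nothing
                # Assumes tab-separated input; -v hands the shell
                # values to awk as the variables subj and sess.
                awk -F'\t' -v OFS='\t' -v subj="$sub" -v sess="$ses" \
                    'NR > 1 {print subj, sess, $3, $9}' "$file" >> "$out"
            done
        done
    done
}
```

It would be called as build_summary ~/path/to/subdir "$outdir", with outdir pointing at whatever output directory is in use. Because the header is written before the loop, the two if tests on summary.tsv are no longer needed.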