I want to concatenate two or more files depending on whether their names contain elements from an array.
I am reading this kind of file line by line (proteome.pisa):
2PJY_p chain=(B C) hresname=() hresnumber=() hatom=() model=() altconf=()
2Q7N_p chain=(A E F G H I J K L) hresname=(FUC MAN NAG) hresnumber=() hatom=() model=() altconf=()
For each line, the script extracts the string in the first column, strips the _p suffix, and assigns the result to the variable pdbid. It then takes the second column and defines it as an array (chain, looped over as $c). For each element it checks whether a file called ${pdbid}_${c}_p.pdb exists and, if it does, appends its content to the file ${pdbid}_p_${chains}.pdb
This is the script:
while read line ; do
    echo "$line" > pdb.line
    cut -f1 pdb.line > pdb.list
    sed -i 's/.*/\"&\"/' pdb.list
    sed -i 's/_p//g' pdb.list
    awk '{ printf "pdbid="; print }' pdb.list > pdbid.list
    cut -f2 pdb.line > chain.list
    source pdbid.list
    source chain.list
    chains=`printf "%s" "${chain[@]}"`
    for c in ${chain[@]} ; do
        if [ ${#chain[@]} -gt 1 ] && \
           [ -f ${pdbid}_${c}_p.pdb ] ; then
            cat ${pdbid}_${chain[$c]}_p.pdb >> ${pdbid}_p_${chains}.pdb
        fi
    done
done < proteome.pisa
The expected behaviour, for the first row for instance, was to merge 2PJY_B_p.pdb and 2PJY_C_p.pdb into a file called 2PJY_p_BC.pdb. However, what it actually does is merge the first file twice. I cannot understand why...
Have you tried set -vx to help debug the values of your variables? Good luck.
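One likely cause, as far as I can tell: c already holds the chain letter, not an index, so ${chain[$c]} is evaluated as arithmetic on an indexed array. The unset variables B and C both count as 0, and chain[0] comes back on every pass. Below is a minimal sketch that uses "$c" directly and parses each line without temporary files. It assumes proteome.pisa is tab-separated (as the cut -f calls imply), and the function name merge_chains is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: append each existing <pdbid>_<chain>_p.pdb to <pdbid>_p_<chains>.pdb.
# Assumption: columns in the input file are separated by tabs.
# merge_chains is a hypothetical helper name, not part of the original script.
merge_chains() {
    local input=$1 id spec rest pdbid chains c
    local -a chain
    while IFS=$'\t' read -r id spec rest; do
        pdbid=${id%_p}                        # 2PJY_p -> 2PJY
        spec=${spec#'chain=('}                # drop the leading "chain=("
        spec=${spec%')'}                      # drop the trailing ")"
        read -ra chain <<< "$spec"            # "B C" -> array (B C)
        chains=$(printf '%s' "${chain[@]}")   # join without separator: BC
        # Only merge when there is more than one chain, as in the question.
        (( ${#chain[@]} > 1 )) || continue
        for c in "${chain[@]}"; do
            # "$c" is the chain letter itself; no array lookup is needed.
            if [[ -f ${pdbid}_${c}_p.pdb ]]; then
                cat "${pdbid}_${c}_p.pdb" >> "${pdbid}_p_${chains}.pdb"
            fi
        done
    done < "$input"
}
```

It would be called as merge_chains proteome.pisa from the directory holding the .pdb files. Note that it appends with >>, so rerunning it without deleting the merged files first would duplicate their content.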