I have a script that connects to several servers over ssh. Occasionally an unexpected problem causes ssh to hang indefinitely, so I want to kill ssh if it runs too long.
I'm also using a wrapper function for output redirection. I need to force tty allocation with the -t flag because a process on the server requires one.
function _redirect {
    if [ "$DEBUG" -eq 0 ]; then
        $* 1> /dev/null 2>&1
    else
        $*
    fi
    return $?
    exit
}
SSH_CMD="ssh -t -o BatchMode=yes -l robot"
SERVER="192.168.1.2"
ssh_script=$(cat <<EOF
sudo flock -w 60 -n /path/to/lock -c /path/to/some_golang_binary
EOF
)
_redirect timeout 1m $SSH_CMD $SERVER "($ssh_script)"
The result is a timeout, with only this message printed:
tcsetattr: Interrupted system call
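The expected result is either the output of the remote shell command or, if it really does run too long, a timeout with the proper exit code (GNU timeout exits with 124 in that case).
When I run the equivalent command directly from the shell prompt: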
timeout 1m ssh -t -o BatchMode=yes -l robot 192.168.1.2 \
"(sudo flock -w 60 -n /path/to/lock -c /path/to/some_golang_binary)" \
1> /dev/null
I get the expected result.
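One way to probe the first suspicion listed below is to compare the process group a command gets when run directly and when run under timeout. The sketch below is meant to be saved as a script and run from a terminal, the same way the failing script is run. It assumes Linux (it reads /proc/self/stat, documented in proc(5)) and GNU coreutils timeout, whose documented --foreground option is for when timeout is not run directly from a shell prompt and the command needs the TTY (with the caveat that children of the command are then not timed out). The variable name probe is only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare the process group of a command run directly and under GNU
# timeout. Linux-only: reads /proc/self/stat (see proc(5)), where field 5 is
# the process group ID and field 8 is the terminal's foreground process group
# (-1 if there is no controlling terminal). The naive awk field split is safe
# here only because the process name ("awk", "gawk", "mawk") has no spaces.
probe=(awk '{ print "pgid=" $5, "tpgid=" $8 }' /proc/self/stat)

# Each probe runs as a child of the wrapper, so it reports the group it inherited.
echo "direct:               $("${probe[@]}")"
echo "timeout:              $(timeout 5 "${probe[@]}")"
echo "timeout --foreground: $(timeout --foreground 5 "${probe[@]}")"
```

If the plain timeout line shows a pgid that differs from its tpgid while the other two lines match, the command under timeout is running in a background process group. POSIX specifies that calling tcsetattr() on the controlling terminal from a background process group sends SIGTTOU to that group (unless the signal is blocked or ignored), which would be consistent with ssh -t stalling and then reporting an interrupted tcsetattr when timeout signals it.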
I suspect one of two things:
1) The interaction between GNU timeout and ssh causes the tcsetattr system call to take a very long time (or hang); timeout then sends SIGTERM, which interrupts the call, and ssh prints that message. There is no other output because this call is one of the first things ssh does. I wonder whether timeout launches ssh in a child process that cannot use the terminal, while the parent process counts time and kills the child.
I looked here for the reasons this call can fail.
2) _redirect needs a different form of argument expansion ($@, $*, "$@", "$*", etc.). Some bad quoting or parameter munging mangles the arguments passed to timeout, which causes the tcsetattr error. Trying various combinations has not solved the problem yet.
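To see what each expansion in the second suspicion actually forwards, here is a minimal bash sketch. The helpers show_args, via_star, via_at, via_quoted_star and via_quoted_at are hypothetical names for illustration; the _redirect variant at the end is one possible rewrite (forwarding with "$@", treating an unset DEBUG as 0, and holding the ssh options in an array), not a confirmed fix for the tcsetattr error.

```shell
#!/usr/bin/env bash
# Sketch (bash): how each expansion forwards arguments through a wrapper.

# Print every argument on its own line, in brackets, so word splitting is visible.
show_args() {
    printf '[%s]\n' "$@"
}

via_star()        { show_args $*; }    # unquoted: every word re-split (and globbed)
via_at()          { show_args $@; }    # unquoted: same re-splitting as $*
via_quoted_star() { show_args "$*"; }  # all arguments joined into one word
via_quoted_at()   { show_args "$@"; }  # each original argument kept intact

remote='(sudo flock -w 60 -n /path/to/lock -c /path/to/some_golang_binary)'
for f in via_star via_at via_quoted_star via_quoted_at; do
    echo "== $f"
    "$f" timeout 1m ssh -t -l robot 192.168.1.2 "$remote"
done

# A variant of _redirect that forwards its arguments with "$@" and treats an
# unset DEBUG as 0. The function's exit status is that of the wrapped command,
# so no explicit return is needed.
_redirect() {
    if [ "${DEBUG:-0}" -eq 0 ]; then
        "$@" >/dev/null 2>&1
    else
        "$@"
    fi
}

# Keeping the ssh options in an array avoids relying on word splitting of a string.
SSH_CMD=(ssh -t -o BatchMode=yes -l robot)
DEBUG=1
_redirect show_args timeout 1m "${SSH_CMD[@]}" 192.168.1.2 "$remote"
```

In the real script the last call would drop show_args. Note also that ssh(1) appends any extra arguments to the command, separated by spaces, before sending it to the server, so for this particular remote command the re-splitting done by $* may not change what the remote shell ends up running.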