10

I have two scripts:

#!/bin/bash

netcat -lk -p 12345 | while read line
do
    match=$(echo $line | grep -c 'Keep-Alive')
    if [ $match -eq 1 ]; then
        [start a command]
    fi
done

and

#!/bin/bash

netcat -lk -p 12346 | while read line
do
    match=$(echo $line | grep -c 'Keep-Alive')
    if [ $match -eq 1 ]; then
        [start a command]
    fi
done

I've put the two scripts in '/etc/init.d/'.

When I restart my Linux machine (a Raspberry Pi), both scripts work fine.

I've tried them about 20 times, and they keep working fine.

But after around 12 hours, the whole system stops working. I've added some logging, but it seems that the scripts no longer react. Yet when I run:

ps aux

I can see that the scripts are still running:

root      1686  0.0  0.2   2740  1184 ?        S    Aug12   0:00 /bin/bash /etc/init.d/script1.sh start
root      1689  0.0  0.1   2268   512 ?        S    Aug12   0:00 netcat -lk 12345
root      1690  0.0  0.1   2744   784 ?        S    Aug12   0:00 /bin/bash /etc/init.d/script1.sh start
root      1691  0.0  0.2   2740  1184 ?        S    Aug12   0:00 /bin/bash /etc/init.d/script2.sh start
root      1694  0.0  0.1   2268   512 ?        S    Aug12   0:00 netcat -lk 12346
root      1695  0.0  0.1   2744   784 ?        S    Aug12   0:00 /bin/bash /etc/init.d/script2.sh start

After a reboot they start working again... But that's a sin, rebooting a Linux machine periodically...

I've inserted some logging; here's the outcome:

Listening on [0.0.0.0] (family 0, port 12345)
[2013-08-14 11:55:00] Starting loop.
[2013-08-14 11:55:00] Starting netcat.
netcat: Address already in use
[2013-08-14 11:55:00] Netcat has stopped or crashed.
[2013-08-14 11:49:52] Starting loop.
[2013-08-14 11:49:52] Starting netcat.
Listening on [0.0.0.0] (family 0, port 12345)
Connection from [16.8.94.19] port 12345 [tcp/*] accepted (family 2, sport 6333)
Connection closed, listening again.
Connection from [16.8.94.19] port 12345 [tcp/*] accepted (family 2, sport 6334)
[2013-08-14 12:40:02] Starting loop.
[2013-08-14 12:40:02] Starting netcat.
netcat: Address already in use
[2013-08-14 12:40:02] Netcat has stopped or crashed.
[2013-08-14 12:17:16] Starting loop.
[2013-08-14 12:17:16] Starting netcat.
Listening on [0.0.0.0] (family 0, port 12345)
Connection from [16.8.94.19] port 12345 [tcp/*] accepted (family 2, sport 6387)
Connection closed, listening again.
Connection from [16.8.94.19] port 12345 [tcp/*] accepted (family 2, sport 6388)
[2013-08-14 13:10:08] Starting loop.
[2013-08-14 13:10:08] Starting netcat.
netcat: Address already in use
[2013-08-14 13:10:08] Netcat has stopped or crashed.
[2013-08-14 12:17:16] Starting loop.
[2013-08-14 12:17:16] Starting netcat.
Listening on [0.0.0.0] (family 0, port 12345)
Connection from [16.8.94.19] port 12345 [tcp/*] accepted (family 2, sport 6167)
Connection closed, listening again.
Connection from [16.8.94.19] port 12345 [tcp/*] accepted (family 2, sport 6168)

Thanks

12
  • I can't see an issue, but I don't know much about netcat. But you can reduce the number of processes you're creating by replacing match=...fi with do ; if grep -q 'Keep-Alive' ; then start cmd; fi. Good luck. Commented Aug 7, 2013 at 14:05
  • I've just tried that, but that stops everything from working... Commented Aug 7, 2013 at 14:20
  • 3
    +1 "But that's a sin". I suspect, especially in light of the -k keep-alive flag on netcat, that the IP layer is bouncing after many hours either through DHCP lease expiration or "self-healing (i.e. reboot daily because it's easier than fixing bugs)" features of your etherswitch. Does /var/log/syslog give you any clue? Commented Aug 7, 2013 at 14:25
  • Sorry that didn't work for you. As tech support is so fond of saying, "It works for me" ;-> If you want to add an edit to your post showing the exact code, I'll be happy to look at that. I agree with msw, especially about looking in /var/log/syslog (as I don't know that much about netcat). Good luck! Commented Aug 7, 2013 at 14:43
  • 1
    I wonder if placing it in a loop that sleeps about 4s before restarting netcat would be a good workaround. But of course it's still important that you find the real cause. It's probably not related to netcat itself but to the interface or to outside connections. Commented Aug 7, 2013 at 14:57

6 Answers

6

If none of your commands, including netcat, reads input from stdin, you can make the script run completely independently of the terminal. Background processes that still depend on the terminal sometimes pause (S) when they try to read input from it while in the background. Since you're running a daemon, you should make sure that none of your commands reads input from the terminal.

#!/bin/bash

set +o monitor # Make sure job control is disabled.

(
    : # Make sure the shell runs a subshell.
    exec netcat -lk -p 12345 | while read line  ## Use exec to overwrite the subshell.
    do
        match=$(echo "$line" | grep -c 'Keep-Alive')
        if [ "$match" -eq 1 ]; then
            [start a command]
        fi
    done
) <&- >&- 2>&- </dev/null &>/dev/null &

TASKPID=$!
sleep 1s ## Let the task initialize a bit before we disown it.
disown "$TASKPID"
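
To illustrate why redirecting stdin away from the terminal matters, here is a minimal sketch. The helper name start_detached and the log path are assumptions made for this example, not part of the script above. A command started this way sees end-of-file as soon as it tries to read input, instead of waiting on a terminal.

```shell
# start_detached LOGFILE CMD [ARGS...]   (hypothetical helper, for illustration only)
# Starts CMD in the background with stdin taken from /dev/null and with
# stdout and stderr appended to LOGFILE, then prints the PID of the new process.
start_detached() {
    sd_log=$1
    shift
    "$@" </dev/null >>"$sd_log" 2>&1 &
    echo "$!"
}
```

For example, start_detached /tmp/demo.log sh -c 'cat; echo finished' writes "finished" to the log almost immediately, because cat gets EOF from /dev/null rather than blocking.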

And I think we could try the logging thing again:

set +o monitor

(
    echo "[$(date "+%F %T")] Starting loop with PID $BASHPID."

    for (( ;; ))
    do
        echo "[$(date "+%F %T")] Starting netcat."

        netcat -vv -lk -p 12345 | while read line
        do
            match=$(echo "$line" | grep -c 'Keep-Alive')
            if [ "$match" -eq 1 ]; then
                [start a command]
            fi
        done

        echo "[$(date "+%F %T")] Netcat has stopped or crashed."

        sleep 4s
    done
) <&- >&- 2>&- </dev/null >> "/var/log/something.log" 2>&1 &

TASKPID=$!
sleep 1s
disown "$TASKPID"

7 Comments

So I should try the second one, with the logging? But shouldn't that also include the 'exec' to run in a subshell?
Honestly, I'm not sure whether logging would again cause the scripts to stop, so I'd suggest trying the one with logging first and the one without it afterwards. As for exec, don't worry, it won't be a problem: the ( ) block is already separated from its parent shell as a whole, and hopefully from the attributes of the terminal as well. If it still doesn't work, I would suggest trying a different netcat, such as the original netcat or gnu-netcat.
Okay, I'll try the second one, with logging. Results tomorrow :)
Results: it stops working after approximately one hour. Last line in the log: Connection from x.x.x.x port 12345 [tcp/*] accepted (family 2, sport 6386)
It didn't say "Netcat has stopped or crashed."?
5
+25

As for the loop, it could look like this:

#!/bin/bash

for (( ;; ))
do
    netcat -lk -p 12345 | while read line
    do
        match=$(echo "$line" | grep -c 'Keep-Alive')
        if [ "$match" -eq 1 ]; then
            [start a command]
        fi
    done
    sleep 4s
done

with added double quotes to keep it safer.
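
The restart-after-a-delay idea can also be factored into a small reusable function. This is only a sketch under assumptions: the name supervise and its MAX_RUNS argument are made up for illustration (MAX_RUNS exists so the loop can be bounded while experimenting; 0 means restart forever, as in the loop above).

```shell
# supervise MAX_RUNS DELAY CMD [ARGS...]   (hypothetical helper)
# Runs CMD, waits DELAY seconds after it exits, then starts it again.
# Stops after MAX_RUNS runs; a MAX_RUNS of 0 means never stop.
supervise() {
    sv_max=$1
    sv_delay=$2
    shift 2
    sv_runs=0
    while [ "$sv_max" -eq 0 ] || [ "$sv_runs" -lt "$sv_max" ]; do
        sv_runs=$((sv_runs + 1))
        "$@"
        # Report every exit on stderr so restarts are visible in a log.
        echo "run $sv_runs exited with status $?" >&2
        sleep "$sv_delay"
    done
}
```

A call such as supervise 0 4 some_listener would then behave like the endless loop with sleep 4s, where some_listener stands for a function wrapping the netcat pipeline.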

And you could try capturing errors and add some logging with this format:

#!/bin/bash

{
    echo "[$(date "+%F %T")] Starting loop."

    for (( ;; ))
    do
        echo "[$(date "+%F %T")] Starting netcat."

        netcat -lk -p 12345 | while read line
        do
            match=$(echo "$line" | grep -c 'Keep-Alive')
            if [ "$match" -eq 1 ]; then
                [start a command]
            fi
        done

        echo "[$(date "+%F %T")] Netcat has stopped or crashed."

        sleep 4s
    done
} >> "/var/log/something.log" 2>&1
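
If the timestamp prefix gets repeated a lot, it could be wrapped in a helper. The function name log_msg is an assumption for this sketch; it simply produces the same "[date time] message" format as the echo lines above.

```shell
# log_msg MESSAGE...   (hypothetical helper)
# Prints MESSAGE prefixed with a timestamp in the same format used above,
# e.g. "[2013-08-14 11:55:00] Starting netcat."
log_msg() {
    printf '[%s] %s\n' "$(date '+%F %T')" "$*"
}
```

Inside the { ... } >> "/var/log/something.log" 2>&1 block, log_msg "Starting netcat." would then replace the corresponding echo line.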

Your read command could also be better in this format since it would read lines unmodified:

... | while IFS= read -r line
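
A small sketch of what "unmodified" means here. The two function names are made up for the demonstration; each reads one line from stdin and prints it between brackets so the whitespace is visible.

```shell
# Plain read: strips leading/trailing whitespace and treats backslash as an escape.
read_plain() {
    read line
    printf '[%s]\n' "$line"
}

# IFS= read -r: keeps the line exactly as it arrived.
read_raw() {
    IFS= read -r line
    printf '[%s]\n' "$line"
}

# The input line below is: two spaces, a, backslash, b, two spaces.
printf '  a\\b  \n' | read_plain   # prints [ab]
printf '  a\\b  \n' | read_raw     # prints [  a\b  ]
```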

Some might also suggest process substitution, but I don't recommend it this time: with the | while ... method, the while loop runs in a subshell, which keeps the outer for loop safe in case the inner loop crashes. Besides, no variable from the while loop is needed outside of it.

I now suspect that the issue is related to the input and how the while read line; do ...; done block handles it, rather than to netcat itself. Your variables not being properly quoted with "" could be part of it, or could even be the actual reason why your netcat is crashing.
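
One way to sidestep the quoting question altogether is to match the line with the shell's own case statement instead of piping it through echo and grep. This is a sketch, not a tested fix for the hang; the function names are assumptions, and the echo stands in for [start a command].

```shell
# Succeeds if the argument contains the literal text Keep-Alive.
is_keep_alive() {
    case $1 in
        *Keep-Alive*) return 0 ;;
        *) return 1 ;;
    esac
}

# Reads lines from stdin unmodified and reacts to each matching line.
process_stream() {
    while IFS= read -r line; do
        if is_keep_alive "$line"; then
            echo "matched"    # placeholder for [start a command]
        fi
    done
}
```

It would be used as netcat -lk -p 12345 | process_stream. No extra process is started per line, and characters such as * or ? in the input are never subject to word splitting or globbing.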

6 Comments

Good stuff! I've added some logging and an extra loop in case netcat stops... I'm going to try this right now; if this works, I'll give you the points! Thanks!
If you're not running it as root and if your user has no write permission on the write directory, perhaps you could just use the home directory: } >> ~/something.log 2>&1, or create the file with write permission for the user as root: touch /var/log/something.log; chown youruser:yourusersgroup /var/log/something.log; chmod 644 /var/log/something.log # or 600 at your preference
I'm running as root, so it's all fine. I've also added a log line for when the event is triggered. I really think we are heading in the right direction here. We can now pinpoint in which phase it's crashing. I've also found out that during boot the script gives an error 3 times and restarts 3 times. Thanks for helping me with the syntax; if it were PowerShell, it would be no problem for me, but I'm not that good with Linux...
I remember. You could also add more messages with netcat's -v option. It only sends verbose messages to stderr (fd 2) and not to the pipe so it won't affect the process. netcat -vv -lk -p 12345 | while IFS= read -r line
3

You mentioned that "after around 12 hours, the whole system stops working". It is likely that whatever the scripts execute in [start a command] is bloating the memory. Are you sure that [start a command] is not forking many processes very frequently without releasing memory?

4 Comments

Good point. To rule this out, I'll have to remove the command and echo to a log file instead, to see if it keeps working without my starting script.
So you are saying you have removed the [start a command] part and your script still doesn't respond after 12 hours?
Yup, I've also tried it with logging: a log entry before starting the command, and another one when it comes back.
What I was saying is to remove the command you are using in [start a command] entirely. That way you will know whether the command is bloating your system.
3

I have often experienced strange behaviour with nc or netcat. You should have a look at ncat: it's almost the same tool, but it behaves the same on all platforms (nc and netcat behave differently depending on the distribution and on whether you are on Linux, BSD, or Mac).

1 Comment

Got some strange behavior there myself. Thanks for recommending a substitute to try.
2

Periodically netcat will print, not a line, but a block of binary data. The read builtin will likely fail as a result.

I think you're using this program to verify that a remote host is still connected to port 12345 and 12346 and hasn't been rebooted.

My solution for you is to pipe the output of netcat to sed, then pipe that (much reduced) line to the read builtin...

#!/bin/bash

{
    echo "[$(date "+%F %T")] Starting loop."

    for (( ;; ))
    do
        echo "[$(date "+%F %T")] Starting netcat."

        netcat -lk -p 12345 | sed 's/.*Keep-Alive.*/Keep-Alive/g' |
        while read line
        do
            match=$(echo "$line" | grep -c 'Keep-Alive')
            if [ "$match" -eq 1 ]; then
                [start a command]
            fi
        done

        echo "[$(date "+%F %T")] Netcat has stopped or crashed."

        sleep 4s
    done
} >> "/var/log/something.log" 2>&1
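
If binary data really is the trigger, another option would be to drop every byte that is not printable ASCII, a tab, or a line ending before the lines reach read. This is a sketch under that assumption; the function name is made up, and some filters buffer their output when writing to a pipe (whether tr does depends on the implementation), which would delay lines in a live stream.

```shell
# strip_unprintable   (hypothetical helper)
# Copies stdin to stdout, deleting every byte except tab (\11), newline (\12),
# carriage return (\15) and printable ASCII (\40 through \176).
strip_unprintable() {
    LC_ALL=C tr -cd '\11\12\15\40-\176'
}
```

It would sit in the pipeline in place of, or in front of, the sed stage: netcat -lk -p 12345 | strip_unprintable | while read line ...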

Also, you'll need to review some of the other startup scripts in /etc/init.d to make sure your scripts are compatible with whatever version of rc the system uses. It would be much easier, though, to call your script2.sh from a copy of some simple file in init.d. As it stands, script2 is the startup script itself but doesn't conform to the init package you use.

That sounds more complicated than I mean... Let me explain better:

/etc/init.d/syslogd        ## a standard init script that calls syslogd
/etc/init.d/start-monitor   ## a copy of a standard init script that calls script2.sh

As an additional note, I think you could bind netcat to the specific IP that you are monitoring, instead of binding it to the wildcard address 0.0.0.0.

1 Comment

ksh has a read -r (raw), and maybe there is a -b binary too. Not sure about bash. Good luck to all.
1

You should not use the -p option when waiting for an incoming connection request (see the nc man page). The hostname and port are the last two arguments on the command line.

Maybe it connects to its own port and, after some hours, runs out of some resource?

2 Comments

Good point, I also noticed that when reading the netcat man pages. I've removed the -p option, but that's not helping. What do you mean by your second remark?
That's only a sneaking suspicion. I've never used netcat in this way. The -p option is used for outgoing connection requests. My vague suspicion is that netcat could try to initiate some request because of that parameter. But you say that it makes no difference.
