50

I am trying to find the pathname with the most characters in it. There might be better ways to do this. But I would like to know why this problem occurs.

LONGEST_CNT=0
find samples/ | while read line
do
    line_length=$(echo $line | wc -m)
    if [[ $line_length -gt $LONGEST_CNT ]] 
    then
        LONGEST_CNT=$line_length
        LONGEST_STR=$line
    fi
done

echo $LONGEST_CNT : $LONGEST_STR

It somehow always returns:

0 :

If I print the results for debugging inside the while loop the values are correct. So why bash does not make these variables global?

1
  • This is (and was when asked) BashFAQ #24 Commented Mar 11 at 16:12

4 Answers 4

89

When you pipe into a while loop in Bash, it creates a subshell. When the subshell exits, all variables return to their previous values (which may be null or unset). This can be prevented by using process substitution.

LONGEST_CNT=0
while read -r line
do
    line_length=${#line}
    if (( line_length > LONGEST_CNT ))
    then
        LONGEST_CNT=$line_length
        LONGEST_STR=$line
    fi
done < <(find samples/ )    # process substitution

echo $LONGEST_CNT : $LONGEST_STR
Sign up to request clarification or add additional context in comments.

5 Comments

The solution Dennis provided works, but be aware, that it violates POSIX norm. Try to 'set -o posix' and the script will not work!
@Robert: The question is tagged [bash]. It is true that process substitution is not specified by POSIX. However, unless portability to POSIX-only shells is a concern, there's no reason not to use Bash-specific features. By the way, process substitution is supported by ksh and zsh. However, they don't create a subshell that causes the variable to be lost. Also note that Bash 4.2 has an option, shopt -s lastpipe which "runs the last command of a pipeline in the current shell context" (not a subshell) - unless job control is in affect.
I use this, but get syntax error: syntax error near unexpected token done'`
@AswinMurugesh: Most likely, you are missing the do.
return to their previous values ... well, IMHO better formulation would be that the changes happen in a temporary copy of environment (even different PID I think) that gets destroyed as the while loop is over. The original global variable is never touched at all; there's no reason to "return".
21

The "correct" reply is given by Dennis. However, I find the process substitution trick extremely unreadable if the loop contains more than a few lines. When reading a script, I want to see what goes into the pipe before I see how it is processed.

So I usually prefer this trick of encapsulating the while loop in "{}".

LONGEST_CNT=0
find /usr/share/zoneinfo | \
{ while read -r line
    do
        line_length=${#line}
        if (( line_length > LONGEST_CNT ))
        then
            LONGEST_CNT=$line_length
            LONGEST_STR=$line
        fi
    done
    echo $LONGEST_CNT : $LONGEST_STR
}

This makes the variable available after the loop, but of course only while still inside the command grouping, so before the closing brace.

4 Comments

Totally agree about the readability point. +1. Still, it's the way to go.
except that this example does not modify the global variable. if you check LONGEST_CNT after the loop, you will see it's still zero
@Deims0s : yes. That is why the echo statement is inside the "{}". If the variables are needed again later, the process substitution way might be better suited.
This doesn't solve the problem. Readability counts for nothing if the code doesn't achieve its goal.
2

Do what you always (should) do:

  • separate concerns,
  • avoid globals,
  • document your code,
  • be readable,
  • maybe be POSIXy.

(Yes, I added a bit more "good practice" ingredients to the soup than absolutely necessary ;))

So my favorite "knee-jerk reaction" to problems with invisible subshell is to use function:

#!/bin/sh

longest() {
    #
    # Print length and body of the longest line in STDIN
    #
    local cur_ln    # current line
    local cur_sz    # current size (line length)
    local max_sz    # greatest size so far
    local winner    # longest string so far
    max_sz=0
    while read -r cur_ln
    do
        cur_sz=${#cur_ln}
        if test "$cur_sz" -gt "$max_sz";
        then
            max_sz=$cur_sz
            winner=$cur_ln
        fi
    done
    echo "$max_sz" : "$winner"
}

find /usr/share/zoneinfo | longest

# ok, if you really wish to use globals, here you go ;)
LONGEST_CNT=0
LONGEST_CNT=$(
    find /usr/share/zoneinfo \
      | longest \
      | cut -d: -f1 \
      | xargs echo\
)
echo "LONGEST_CNT='$LONGEST_CNT'"

Aside from avoiding the subshell annoyance, it gives you perfect place to document the code, and sort-of adds namespacing: notice that inside function you can use much shorter and simpler variable names without losing any of readability.

2 Comments

The last part of this is the best answer really, but fyi you can put the pipes at the ends of the lines and avoid the line continuation backslashes.
@PrivAcyplease the intentional advantage of this style is that it makes the pipeline structure obvious, especially for commands in the middle. That's IMO well worth the extra chars.
2

About finding the longest pathname. Here's an alternative:

find /usr/share/zoneinfo | while read line; do
    echo "${#line} $line" 
done | sort -nr | head -n 1

# Result:
58 /usr/share/zoneinfo/right/America/Argentina/ComodRivadavia

Forgive me if this is considered off topic, I hope it helps someone.

6 Comments

Using echo to print the output from backicks is rarely useful or necessary. See useless use of echo.
@tripleee: AFAICT both echos are useful in this example: wc needs the input from $line, and $line needs to be displayed after the output from wc. Please, feel free to suggest improvements without adding more lines or variables.
@grebneke I upvoted your answer (and comment) but after thinking about it some more, I realised you could avoid using command substitution to run wc. Instead, you could use POSIX shell parameter expansion: echo ${#line} $line;
@AnthonyGeoghegan nice suggestion. To be clear, your example uses # for String Length as opposed to Number of positional parameters - I think you accidentally posted the wrong link. # is used for both and it can get confusing.
@AnthonyGeoghegan updated with your solution, way faster, gj tyvm
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.