
I recently learned that there is a special case of command substitution:

The command substitution $(cat file) can be replaced by the equivalent but faster $(< file).

I never use the cat variation, and I have seen extensive usage of read instead for the same, i.e.:

IFS='' read -r -d '' VAR < file

What is the difference in terms of the side effects (e.g. special characters in file), performance or any other aspects between the two and why don't I see the former used extensively in scripts that are otherwise using Bash-only features already?

Comments:
  • Are you asking for someone to provide a side-by-side comparison of VAR=$(cat file), VAR=$(< file), and IFS='' read -r -d '' VAR < file, or something else? When you say "why don't I see the former used extensively in scripts" - is the "former" $(cat file) (that's the first command in the question)?
  • (Performance cost is going to depend on the details -- in particular, read is more efficient when it's reading from a seekable source such as a regular file, and slower when it's reading from a pipe/FIFO/&c where it can't read more than it needs and rewind the file pointer after).
  • @AlbertCamu var=$(cat file; printf 'x'); var=${var%x} or similar to add a char after the file, thereby ensuring no trailing whitespace is stripped, then remove that char.
  • @Barmar, ...I just smoke-tested it with Apple's 3.2.57(1)-release build; it's definitely there.
  • @kojiro, fair, but again, cat is an external command, so even though $(...) may not require a subshell when all you're running is builtins, $(cat file) definitely calls a fork(), so there's a transient subshell. (And while theoretically, $(cat file) would be two subshells -- one for the $() and the other in the transient fork() -- making the other a savings, in practice the shell detects when a command substitution invokes only one command with no traps &c and performs an implicit exec, so when the conditions for the optimization exist it evens out).
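The sentinel trick mentioned in the comments can be sketched as follows (the file name wsfile is a demo-only choice, not from the question):

```shell
# Sentinel trick from the comments above: append a character inside the
# command substitution so trailing newlines survive, then strip it.
printf 'data\n\n\n' > wsfile      # demo file with three trailing newlines
var=$(cat wsfile; printf 'x')     # 'x' shields the newlines from stripping
var=${var%x}                      # drop the sentinel
rm -f wsfile
```

After this, var still ends in all three newlines that a plain $(cat wsfile) would have stripped.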

3 Answers


They are not equivalent. The value returned from a subshell has trailing newlines stripped:

See: https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html#tag_19_06_03

$(commands)
or (backquoted version):

`commands`
The shell shall expand the command substitution by executing commands in a subshell environment (see 2.13 Shell Execution Environment) and replacing the command substitution (the text of the commands string plus the enclosing "$()" or backquotes) with the standard output of the command(s); if the output ends with one or more bytes that have the encoded value of a <newline> character, they shall not be included in the replacement. Any such bytes that occur elsewhere shall be included in the replacement; however, they might be treated as field delimiters and eliminated during field splitting, depending on the value of IFS and quoting that is in effect. If the output contains any null bytes, the behavior is unspecified.

The cost of setting up a subshell environment is also higher than using read.

In recent versions of bash, mapfile (aka readarray) can also be used as a faster (~10%?) alternative to read (technically it is also slightly different as it creates an array):

Consider:

$ unset v1 v2 v3
$ printf '\n\nabc\t\t\n\n\n' > f
$ v1=$(<f)
$ IFS= read -r -d '' v2 <f
$ mapfile -d '' v3 <f
$ declare -p v1 v2 v3
declare -- v1=$'\n\nabc\t\t'
declare -- v2=$'\n\nabc\t\t\n\n\n'
declare -a v3=([0]=$'\n\nabc\t\t\n\n\n')
$

There's also a difference if the file actually contains NUL bytes:

$ unset v1 v2 v3
$ printf '\n\n\ta\t\n\n\n\tb\t\n\0\n\n\tc\t\n\n\0\n' >f
$ v1=$(<f)
bash: warning: command substitution: ignored null byte in input
$ IFS= read -r -d '' v2 <f
$ mapfile -d '' v3 <f
$ declare -p v1 v2 v3
declare -- v1=$'\n\n\ta\t\n\n\n\tb\t\n\n\n\tc\t'
declare -- v2=$'\n\n\ta\t\n\n\n\tb\t\n'
declare -a v3=([0]=$'\n\n\ta\t\n\n\n\tb\t\n' [1]=$'\n\n\tc\t\n\n' [2]=$'\n')
$


Setting a variable from file content: a little benchmark

To complement jhnc's correct answer, here is a little benchmark.

First, create a single long line containing only ASCII characters:

For this test I want to read only one line, so I use the shared-memory pseudo-filesystem /dev/shm to minimize filesystem overhead in the benchmark.

LANG=C man -Len -Pcol\ -b man |
    tr \\n \ |
    sed 's/[[:space:]]\+/ /g' >/dev/shm/file

On my host this produces a file containing one single 28 kB line.

wc /dev/shm/file
    0  4749 28670 /dev/shm/file

Define one function per approach under test:

getvar1() {  var=$(cat /dev/shm/file)       ;}
getvar2() {  var=$(< /dev/shm/file)         ;}
getvar3() {  read -r var < /dev/shm/file    ;}
getvar4() {  mapfile -t var < /dev/shm/file ;}

Test loops:

times=();for string in long short; do
    times+=($string)
    echo "Doing 4 tests with $string string:"
    for test in {1..4}; do
        started=${EPOCHREALTIME/.}  var=---
        for ((i=1000;i--;)); do
            getvar$test
        done
        elap=00000$(( ${EPOCHREALTIME/.}-started))
        printf -v "times[${#times[@]}]" %.5f  ${elap::-6}.${elap: -6}
        mapfile -t command < <(declare -f getvar$test)
        printf 'Test: %d: %s\n  var is %d len: "%s"\n  time: %ssec.\n' $test \
            "${command[2]}" "${#var}" "${var::4}...${var: -4}" "${times[-1]}"
    done
    echo Lorem Ipsum >/dev/shm/file
done

On my host this produces:

Doing 4 tests with long string:
Test: 1:     var=$(cat /dev/shm/file)
  var is 28670 len: "MAN(...(1) "
  time: 1.72581sec.
Test: 2:     var=$(< /dev/shm/file)
  var is 28670 len: "MAN(...(1) "
  time: 0.15679sec.
Test: 3:     read -r var < /dev/shm/file
  var is 28669 len: "MAN(...N(1)"
  time: 0.35542sec.
Test: 4:     mapfile -t var < /dev/shm/file
  var is 28670 len: "MAN(...(1) "
  time: 0.10624sec.
Doing 4 tests with short string:
Test: 1:     var=$(cat /dev/shm/file)
  var is 11 len: "Lore...psum"
  time: 1.55918sec.
Test: 2:     var=$(< /dev/shm/file)
  var is 11 len: "Lore...psum"
  time: 0.01891sec.
Test: 3:     read -r var < /dev/shm/file
  var is 11 len: "Lore...psum"
  time: 0.01821sec.
Test: 4:     mapfile -t var < /dev/shm/file
  var is 11 len: "Lore...psum"
  time: 0.01632sec.

Then

printf '%8s: %10s%10s%10s%10s\n' string test{1..4} ${times[@]}
  string:      test1     test2     test3     test4
    long:    1.72581   0.15679   0.35542   0.10624
   short:    1.55918   0.01891   0.01821   0.01632
  • Using $(cat file) took more than 1.5 seconds! Here we see the higher cost of setting up a subshell environment (plus forking the external cat).
  • Plain read is significantly slower than var=$(<file) or mapfile on the long string.
  • The read command dropped the trailing space (28669 vs 28670 characters) because IFS was not cleared.
  • The quickest is mapfile.
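For a quick spot-check of a single case without the full harness, bash's time keyword works too (timings depend on the machine; the path and file size below are arbitrary demo choices, not from the benchmark above):

```shell
# Quick spot-check with bash's time keyword (results vary by host).
printf 'x%.0s' {1..28670} > /tmp/benchfile   # ~28 kB on a single line
time for ((i = 0; i < 1000; i++)); do
    var=$(< /tmp/benchfile)
done
rm -f /tmp/benchfile
```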

Note about mapfile

Reading binary data under bash is possible, but for that you have to read byte by byte (see Yes, bash can read and write binary). That does the job, but slowly.

Using mapfile with a null separator can be very efficient. For example, the Linux kernel exposes the pseudo-filesystem /proc, where you can read the environment of any process (depending on your access rights). All entries are separated by a null byte (0x00).

Reading your own environment this way is pointless; this is just an example:

mapfile -d '' -t env </proc/$$/environ 

Now, in the array $env, you should be able to find your entire shell environment:

shopt -s extglob
printf 'Variable USER: real="%s", in $env var="%s"\n' "$USER" ${env[@]/#!(USER=?*)}

This should output something like:

Variable USER: real="john", in $env var="john"

Further with mapfile...

A little binary test using mapfile:

man man | md5sum
2953aa6314f6c27c4277d1731464f2ea  -
IFS= LANG=C mapfile -t -d '' binary < <(man man|zstd)
printf '%s\0' "${binary[@]}" | zstd -d | md5sum
zstd: /*stdin*\: unknown header 
2953aa6314f6c27c4277d1731464f2ea  -

zstd complains about the trailing null byte added after the last field (${binary[-1]}), but decompresses the binary correctly. Stripping that final byte silences the complaint:

printf '%s\0' "${binary[@]}"| head -c -1 | zstd -d | md5sum
2953aa6314f6c27c4277d1731464f2ea  -


I [...] have seen extensive usage of read instead for the same, i.e.:

IFS='' read -r -d '' VAR < file

Really? I see plenty of use of read, but I don't think I've ever seen it used in the particular manner you describe, to read an entire file into a variable. Of course, reading an entire file into a variable is itself something I don't see very often (and I would generally consider doing so an anti-pattern), so I guess my sample size is limited.

What is the difference in terms of the side effects (e.g. special characters in file),

cat and < both transfer raw bytes without interpretation, until the input is exhausted. On the other hand, read reads and processes just one line. Setting the line delimiter to the empty string does not prevent that. It instructs read that the line delimiter is the NUL character, not that there isn't any delimiter at all. That might be desirable, undesirable, or irrelevant, depending on the situation.
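A small demonstration of that difference (a sketch assuming bash, where recent versions warn about NUL bytes in command substitutions; the file name nulfile is demo-only):

```shell
# read -d '' stops at the first NUL byte; $(< file) silently drops NULs
# (recent bash prints a warning on stderr) and strips trailing newlines.
printf 'a\0b\n' > nulfile
IFS= read -r -d '' v1 < nulfile   # v1 holds everything up to the NUL: 'a'
v2=$(< nulfile)                   # NUL dropped, trailing newline stripped: 'ab'
rm -f nulfile
```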

Also, as another answer already observes, command substitutions remove trailing newlines. Because it does not treat newlines as line delimiters, your read command may store trailing newlines in the value of VAR, but VAR=$(< file) will not do so.

performance or any other aspects between the two

read by default interprets line continuations as the shell itself does, and it performs word splitting on the resulting line. The -r option will moot line continuations, and setting IFS to an empty string will neuter word splitting, but that does not necessarily mean that read's provision for these things does not still have a cost.
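These effects can be seen directly (a minimal sketch; the file name contfile is just for the demo):

```shell
# Without -r, a backslash-newline is treated as a line continuation and
# default IFS trims surrounding whitespace; -r plus IFS= keeps the
# physical line verbatim.
printf '  hello\\\nworld\n' > contfile
read v1 < contfile            # continuation joined, spaces trimmed: 'helloworld'
IFS= read -r v2 < contfile    # first physical line kept: '  hello\'
rm -f contfile
```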

and why don't I see the former used extensively in scripts that are otherwise using Bash-only features already?

It's not entirely clear what you mean by "the former", but since you say you do see the read variation, I guess you're asking why you don't see $(< file).

That has a lot to do with what scripts you read, so we're not in a position to answer definitively. As I already observed, I don't, myself, see the read usage you describe, which you can take as a data point supporting

  • it's a matter of style and personal preference

You could also consider that

  • Although a bit wasteful, $(cat file) is intuitive, and $(< file) is not much worse in that sense. Your read command, however, requires a lot more analysis to decipher, at least for me. Code clarity is important.

And of course,

  • there are semantic differences between these alternatives, as already described. It is conceivable that the read uses of this form that you observe yourself intentionally make use of that.

Furthermore, although shell script performance is a relevant consideration, it is rarely a primary consideration. If it's important for your program to run as fast as possible then writing it as a shell script at all is a mistake.

7 Comments

Bah. Anyone who knows how to read NUL-delimited streams already knows when they see IFS= read -r -d '' what it means -- it's an idiom at this point, because the several pieces need to be used together to get the standard / commonly-desired effect.
I have difficulty following your "read reads and processes just one line. Setting the line delimiter to the empty string does not prevent that." Say the term "line" means a unit of characters read. Then setting the line delimiter so that it effectively swallows everything (ignoring NULs for now) does prevent actual line-by-line reading.
@CharlesDuffy, I take you to be responding to my remarks about code clarity. You may be quite right that IFS= read -r -d '' will be taken as idiomatic among some group of people who are in the know about it, but I'm not prepared to throw out clarity to others, who seem to include the OP. Especially where anyone might consider the read usage in question as an alternative to $(< file).
@AlbertCamu, read's -d option determines the definition of "line" for its purposes. It never reads more than one line in this sense. This is a semantic difference from $(< file) that cannot be removed, though whether it makes an actual difference for reading any particular file depends on the contents of that file.
I actually thought that such use of read was the usual way to slurp a file. Now considering that cat with command substitution strips the newlines, it looks like it's the only way.
