set variable from file content: little bench
In complement to jhnc's correct answer, here is a little benchmark.
For testing, I want to read only one line. I use the shared-memory pseudo filesystem /dev/shm in order to keep filesystem overhead out of the benchmark.
First, create a single long line containing only ASCII characters:
LANG=C man -Len -Pcol\ -b man |
tr \\n \ |
sed 's/[[:space:]]\+/ /g' >/dev/shm/file
On my host, this produces a file containing one single 28 kB line (with no trailing newline, hence the line count of 0):
wc /dev/shm/file
0 4749 28670 /dev/shm/file
Creating the functions we are interested in:
getvar1() { var=$(cat /dev/shm/file) ;}
getvar2() { var=$(< /dev/shm/file) ;}
getvar3() { read -r var < /dev/shm/file ;}
getvar4() { mapfile -t var < /dev/shm/file ;}
Test loop:
times=()
for string in long short; do
    times+=($string)
    echo "Doing 4 tests with $string string:"
    for test in {1..4}; do
        started=${EPOCHREALTIME/.} var=---
        for ((i=1000;i--;)); do
            getvar$test
        done
        elap=00000$(( ${EPOCHREALTIME/.}-started))
        printf -v "times[${#times[@]}]" %.5f ${elap::-6}.${elap: -6}
        mapfile -t command < <(declare -f getvar$test)
        printf 'Test: %d: %s\n var is %d len: "%s"\n time: %ssec.\n' $test \
            "${command[2]}" "${#var}" "${var::4}...${var: -4}" "${times[-1]}"
    done
    echo Lorem Ipsum >/dev/shm/file
done
This produces on my host:
Doing 4 tests with long string:
Test: 1: var=$(cat /dev/shm/file)
var is 28670 len: "MAN(...(1) "
time: 1.72581sec.
Test: 2: var=$(< /dev/shm/file)
var is 28670 len: "MAN(...(1) "
time: 0.15679sec.
Test: 3: read -r var < /dev/shm/file
var is 28669 len: "MAN(...N(1)"
time: 0.35542sec.
Test: 4: mapfile -t var < /dev/shm/file
var is 28670 len: "MAN(...(1) "
time: 0.10624sec.
Doing 4 tests with short string:
Test: 1: var=$(cat /dev/shm/file)
var is 11 len: "Lore...psum"
time: 1.55918sec.
Test: 2: var=$(< /dev/shm/file)
var is 11 len: "Lore...psum"
time: 0.01891sec.
Test: 3: read -r var < /dev/shm/file
var is 11 len: "Lore...psum"
time: 0.01821sec.
Test: 4: mapfile -t var < /dev/shm/file
var is 11 len: "Lore...psum"
time: 0.01632sec.
Then, as a summary:
printf '%8s: %10s%10s%10s%10s\n' string test{1..4} ${times[@]}
  string:      test1     test2     test3     test4
    long:    1.72581   0.15679   0.35542   0.10624
   short:    1.55918   0.01891   0.01821   0.01632
- Using $(cat file) took more than 1.5 seconds! We see here the high cost of setting up a subshell and running an external command.
- Using read is significantly slower than var=$(<file) or mapfile on the long string.
- The read command drops trailing space(s): 28669 characters instead of 28670 for the long string.
- The quickest seems to be mapfile.
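The trailing-space loss with read comes from word splitting on IFS. As a minimal sketch (the temporary file and the variable names trimmed and kept are only illustrative, not part of the benchmark above), clearing IFS for the read command keeps the line verbatim:

```bash
# Hypothetical temp file holding one line that ends with a space.
tmpfile=$(mktemp)
printf 'Lorem Ipsum \n' > "$tmpfile"

read -r trimmed < "$tmpfile"      # default IFS: the trailing space is stripped
IFS= read -r kept < "$tmpfile"    # empty IFS for this command only: line kept as-is

printf 'default IFS: %d chars, empty IFS: %d chars\n' "${#trimmed}" "${#kept}"
# prints: default IFS: 11 chars, empty IFS: 12 chars
rm -f "$tmpfile"
```

This does not change the timing picture; it only addresses the difference in content between read and the other three methods.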
Note about mapfile
Reading binary data under bash is possible, but for this you have to read byte by byte (see "Yes, bash can read and write binary"). This does the job, but slowly.
Using mapfile with a null separator can be very efficient. For example, the Linux kernel
provides the pseudo filesystem /proc, where you can read the environment of any process
(depending on your access rights). All entries there are separated by a null byte (0x00).
Reading your own environment this way is useless; this is only an example:
mapfile -d '' -t env </proc/$$/environ
Now, in the array $env, you should be able to find your whole shell environment:
shopt -s extglob
printf 'Variable USER: real="%s", in $env var="%s"\n' "$USER" ${env[@]/#!(USER=?*)}
This should output something like:
Variable USER: real="john", in $env var="john"
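The same technique works for any null-separated stream, not only /proc. Here is a small sketch, assuming bash 4.4 or later (for mapfile -d) and using made-up sample records, one of which contains a newline:

```bash
# Hypothetical NUL-separated records; the second one contains a newline.
mapfile -d '' -t records < <(printf '%s\0' 'first' $'second\nline' 'third')

# Each record becomes one array element, embedded newline included.
printf 'count: %d\n' "${#records[@]}"
# prints: count: 3
printf 'record 2: %q\n' "${records[1]}"
```

Because the separator is a byte that cannot appear inside a bash string, records are never split in the wrong place, which is why this form is often paired with tools that emit null-terminated output.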
Further with mapfile...
A little binary test using mapfile:
man man | md5sum
2953aa6314f6c27c4277d1731464f2ea -
IFS= LANG=C mapfile -t -d '' binary < <(man man|zstd)
printf '%s\0' "${binary[@]}" | zstd -d | md5sum
zstd: /*stdin*\: unknown header
2953aa6314f6c27c4277d1731464f2ea -
Here zstd complains about the trailing null byte added after the last field ${binary[-1]}, but it decompresses the binary data correctly! Dropping that last byte silences the warning:
printf '%s\0' "${binary[@]}"| head -c -1 | zstd -d | md5sum
2953aa6314f6c27c4277d1731464f2ea -
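To see why this round trip works without involving zstd, here is a sketch on a small fixed byte sequence. The sample bytes, the temporary files, and the names chunks and result are assumptions for illustration, and head -c -1 requires GNU coreutils:

```bash
# Hypothetical sample: bytes with NULs (two in a row), a newline and a 0xFF byte.
orig=$(mktemp) copy=$(mktemp)
printf 'abc\0def\0\0ghi\n\377xyz' > "$orig"

# Split on NUL: every chunk is NUL-free, so it fits in a bash string.
LC_ALL=C mapfile -t -d '' chunks < "$orig"

# Re-join with NUL after each chunk, then drop the one extra trailing NUL.
printf '%s\0' "${chunks[@]}" | head -c -1 > "$copy"

if cmp -s "$orig" "$copy"; then result=same; else result=differs; fi
printf 'chunks: %d, round trip: %s\n' "${#chunks[@]}" "$result"
rm -f "$orig" "$copy"
```

One caveat with this approach: if the input itself ends with a null byte, the array looks the same as for input without it, so stripping the last byte would then remove a real byte of data. The sketch only covers input that does not end with a null byte.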
VAR=$(cat file), VAR=$(< file), and IFS='' read -r -d '' VAR < file or something else? When you say "why don't I see the former used extensively in scripts" - is the "former" $(cat file) (that's the first command in the question)? Not my downvote btw.
read is more efficient when it's reading from a seekable source such as a regular file, and slower when it's reading from a pipe/FIFO/&c where it can't read more than it needs and rewind the file pointer after).
var=$(cat file; printf 'x'); var=${var%x} or similar to add a char after the file thereby ensuring no trailing white space, then remove that char.
cat is an external command so even though ${c ...} may not require a subshell when all you're running is builtins, ${c cat file} definitely calls a fork(), so there's a transient subshell. (And while theoretically, $(cat file) would be two subshells -- one for the $() and the other in the transient fork() -- making the other a savings, in practice, the shell detects when you're running a process substitution that invokes only one command with no traps &c and performs an implicit exec, so when the conditions for the optimization exist it evens out).