How to match a string on a file using pure bash?

Question

So I want to match a string or word in a file without using any external tools (grep, sed, etc.), using only pure bash...

Essentially, I want the equivalent of:

grep "string" file

or

grep -w "string" file

in pure bash.

PS: I only care about matching an exact string (with or without a newline) in a file, so full regex support (which other external tools may offer) isn't needed.
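For the grep -w case, one possible pure-Bash sketch (not taken from the answers) checks each line with a case pattern that requires a non-word character on both sides of the string. It assumes "word characters" are letters, digits and underscore, which is how GNU grep defines -w. The function name grepw is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print lines of FILE containing STRING as a whole word,
# roughly like `grep -wF STRING FILE`. Returns 0 if any line matched, 1 otherwise.
grepw() {
  local string=$1 file=$2 line ret=1
  # `|| [ -n "$line" ]` also processes a last line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Pad with spaces so matches at the start or end of the line still have
    # a non-word character on each side; "$string" is quoted, so it is literal.
    case " $line " in
      (*[![:alnum:]_]"$string"[![:alnum:]_]*)
        printf '%s\n' "$line"
        ret=0
        ;;
    esac
  done < "$file"
  return "$ret"
}
```

Usage: grepw string file. With a line foobar, grepw foo prints nothing for that line, while a line foo bar is printed.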

What do you want as output? The line, the word, or whether there's a match or not?
– schrodingerscatcuriosity, Commented Mar 23, 2021 at 18:45
May I ask why? This means you would need to write an actual program in bash (rarely a good idea) that opens the file and runs a regex match on each line. This will be incredibly slow and worse than grep in every way. Are you sure you really want to do this? If this is part of a larger issue, I suspect it may be an XY problem.
– terdon ♦, Commented Mar 23, 2021 at 18:49
Many things can be done but should not be done. You're free to do it, and I showed you one way, but the more important message here is that the shell is not a general programming language and should not be used as a programming language for arbitrary problems. If you want to play around with this sort of thing (and you should!) please use an actual programming language and don't try to force the shell into things it was never designed for.
– terdon ♦, Commented Mar 23, 2021 at 19:08
TBH, looking for a line matching a regex is probably the silliest thing to do manually in the shell, since there are no fewer than three standard tools that can already do it rather trivially: grep, awk and sed (with caveats on RE variants and the exact functionality).
– ilkkachu, Commented Mar 23, 2021 at 19:32
Yeah, again, I don't mind when people push back on my somewhat questionable posts; I'm actually thankful there are always some who do their best to convey what they think, whether it's advice or just a thought. It's better than silence, TBH. @ilkkachu That aside, yeah, compression in pure bash would be way too slow...
– secemp9, Commented Mar 23, 2021 at 19:45

ilkkachu · Accepted Answer · 2021-03-23 20:12:23Z

You can do it. But it is a really, really bad idea. It will be far slower (as in orders of magnitude slower) than grep and less portable since it depends on features of a specific shell (Bash).

This prints the lines matching a regex pattern given as the first argument. Bash's =~ uses POSIX extended regular expressions, so it behaves like grep -E pattern:

#!/bin/bash -

regexp="$1"
ret=1
while IFS= read -r line || [ -n "$line" ]; do
  if [[ $line =~ $regexp ]]; then
    printf '%s\n' "$line"
    ret=0
  fi
done
exit "$ret"

Save that as foo.bash, make it executable with chmod +x foo.bash, and run it like this:

./foo.bash pattern < inputFile
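The script above reads standard input. To take the file name as an argument instead, as in grep "string" file from the question, the same loop can read from a redirection. A minimal sketch, with bash_grep as a hypothetical function name:

```shell
#!/bin/bash
# Hypothetical helper: same loop as foo.bash, but reads FILE instead of stdin,
# closer to `grep -E REGEX FILE`. Returns 0 if any line matched, 1 otherwise.
bash_grep() {
  local regexp=$1 file=$2 line ret=1
  while IFS= read -r line || [ -n "$line" ]; do
    # $regexp is left unquoted on purpose so Bash treats it as an extended regex.
    if [[ $line =~ $regexp ]]; then
      printf '%s\n' "$line"
      ret=0
    fi
  done < "$file"
  return "$ret"
}
```

Usage: bash_grep 'fo+' inputFile. It is just as slow as foo.bash; only the input source changes.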

Or, using standard sh syntax and looking for a fixed string rather than a regex (saved as foo2.sh for the timings below):

#!/bin/sh -

string="$1"
ret=1
while IFS= read -r line || [ -n "$line" ]; do
  case $line in
    (*"$string"*)
      printf '%s\n' "$line"
      ret=0
  esac
done
exit "$ret"

(Replace the printf with exit 0 to get behaviour similar to grep -q.)
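As a sketch of that grep -q variant, wrapped in a function (the name contains is hypothetical) that stops reading at the first matching line and reports only through its exit status:

```shell
#!/bin/sh
# Hypothetical helper: succeed if any line of stdin contains STRING literally,
# like `grep -qF STRING`. Plain POSIX sh, so no `local`: string and line are global.
contains() {
  string=$1
  while IFS= read -r line || [ -n "$line" ]; do
    # Return as soon as one line matches; the rest of the input is not read.
    case $line in
      (*"$string"*) return 0 ;;
    esac
  done
  return 1
}
```

Usage: if contains bar < file; then echo found; fi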

Just to give you an idea of how slow it is, I created a file with just 10,001 lines: the first 5,000 are foo, then a single bar, then another 5,000 foo:

perl -e 'print "foo\n" x 5000; print "bar\n"; print "foo\n" x 5000;' > file

Now, compare the times for grep and the Bash script above:

$ time grep bar < file
bar

real    0m0.002s
user    0m0.002s
sys     0m0.000s

$ time ./foo.bash bar < file
bar

real    0m0.116s
user    0m0.101s
sys     0m0.016s

As you can see, even with this tiny file, the difference is noticeable. With a more substantial file, the time the script takes becomes almost unbearable:

$ perl -e 'print "foo\n" x 500000; print "bar\n"; print "foo\n" x 500000;' > file


$ time grep bar < file
bar

real    0m0.004s
user    0m0.000s
sys     0m0.004s


$ time ./foo.bash bar < file
bar

real    0m11.306s
user    0m10.117s
sys     0m1.188s

Part of this is because Bash itself is slow. The standard sh version runs about three times faster with Dash:

$ time dash foo2.sh bar < file
bar

real    0m3.467s
user    0m2.113s
sys     0m1.353s

Still, that's a difference of about three orders of magnitude: several seconds for the scripts, against a near-instant grep. And this is a file with only a million lines, about 4 MB in size. I hope you see the problem...

Not that new: Bash 3.2 already has regex matching in [[ ]]. It also works in ksh and zsh, but I didn't check whether there are differences between the shells here.
– ilkkachu, Commented Mar 23, 2021 at 19:09
@ilkkachu yes, that's why I wrote "newer", not "new". The point is that it won't even be portable across bash versions, let alone POSIX shells.
– terdon ♦, Commented Mar 23, 2021 at 19:10
@ilkkachu, regexps were added in 3.1 but changed in 3.2. zsh's behaviour is closer to bash 3.1's. ksh93's is pretty broken, and so is yash's in [[...]], though it is getting better.
– Stéphane Chazelas, Commented Mar 23, 2021 at 19:11
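Related to that 3.1/3.2 change: since Bash 3.2, quoted parts of the right-hand side of =~ are matched as literal text (unless the compat31 shell option is set), which gives the fixed-string match the question asks for. A small sketch contrasting the two forms; both function names are hypothetical:

```shell
#!/bin/bash
# Hypothetical helpers contrasting quoted and unquoted right-hand sides of =~.

# Quoted: "$2" is matched as literal text (Bash 3.2+), like grep -F.
literal_match() {
  [[ $1 =~ "$2" ]]
}

# Unquoted variable: its value is used as a POSIX extended regex.
regex_match() {
  local re=$2
  [[ $1 =~ $re ]]
}
```

For example, literal_match 'a.c' '.' succeeds, and literal_match 'abc' '.' fails even though regex_match 'abc' '.' succeeds, because the unquoted . matches any character.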
If the need is just to check whether a match exists, [[ $(<file) =~ pattern ]] also works, but I'm not posting this as a separate answer to avoid repeating all the caveats and warnings.
– fra-san, Commented Mar 23, 2021 at 19:27
@fra-san awww. But that's much cooler than mine! Mine's completely boring.
– terdon ♦, Commented Mar 23, 2021 at 19:46
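fra-san's whole-file check can be sketched as follows. $(<file) reads the entire file into one string (supported in Bash, ksh and zsh, not in POSIX sh) and drops trailing newlines, so the regex is matched against the whole file as a single string rather than line by line. It only tells you whether there is a match, not which line; the name file_has is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: succeed if the contents of FILE match the extended regex RE.
# The whole file is read into memory at once, then tested with a single =~.
file_has() {
  local re=$1 file=$2
  [[ $(<"$file") =~ $re ]]
}
```

Usage: file_has 'ba+r' file && echo 'match found'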
