
I have a script that runs a grep command and formats the results nicely for me, asking if I want to open any of the resulting files in an editor etc.

The core of my script is a command like this:

grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/letters  | sed -re "s/:([0-9]+:)/\n\1/" -e "s/-([0-9]+-)/\n\1/" -e "s/^[.]/\\n./"

It runs the grep, outputting the file name on every line and then runs some processing to put the file names on a different line from the results.

> grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/letters  | sed -re "s/:([0-9]+:)/\n\1/" -e "s/-([0-9]+-)/\n\1/" -e "s/^[.]/\\n./"
/tmp/searchTest/letters/aaa-aaa-aaa/aaa-aaa-aaa.txt
3:seven eight nine
--
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
2:seven eight nine
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
3-ten eleven twelve

I have more processing that pretties up the results further and gives me a list of the matching files, asking me which I want to open up in an editor.

I am having trouble with files whose name or path includes hyphens and digits (e.g. "/tmp/searchTest/numbers/222-222-222/222-222-222.txt"): my sed command then cannot tell the file name apart from the hyphen- or colon-delimited line numbers.

Here is a script that sets up a test case showing this:

#!/bin/bash

rm -rf /tmp/searchTest 2> /dev/null
mkdir -p /tmp/searchTest/numbers/111-111-111
mkdir -p /tmp/searchTest/numbers/222-222-222
mkdir -p /tmp/searchTest/letters/aaa-aaa-aaa
mkdir -p /tmp/searchTest/letters/bbb-bbb-bbb

cat << EOF > /tmp/searchTest/numbers/111-111-111/111-111-111.txt
one two three
four five six
seven eight nine
EOF

cat << EOF > /tmp/searchTest/numbers/222-222-222/222-222-222.txt
four five six
seven eight nine
ten eleven twelve
EOF

cat << EOF > /tmp/searchTest/letters/aaa-aaa-aaa/aaa-aaa-aaa.txt
one two three
four five six
seven eight nine
EOF

cat << EOF > /tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
four five six
seven eight nine
ten eleven twelve
EOF

echo "Contents of /tmp/searchTest"
tree /tmp/searchTest

echo -e "\nFirst search, looking for \"eight\".\n---"
grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/letters

echo -e "\nExtending first search, looking for \"eight\" and extracting file names.\n---"
grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/letters  | sed -re "s/:([0-9]+:)/\n\1/" -e "s/-([0-9]+-)/\n\1/" -e "s/^[.]/\\n./"

echo -e "\nSecond search, looking for \"eight\".\n---"
grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/numbers

echo -e "\nExtending second search, looking for \"eight\" and extracting file names - but fails.\n---"
grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/numbers  | sed -re "s/:([0-9]+:)/\n\1/" -e "s/-([0-9]+-)/\n\1/" -e "s/^[.]/\\n./"

The results for the second search show how the file names break the sed command.

First search, looking for "eight".
---
/tmp/searchTest/letters/aaa-aaa-aaa/aaa-aaa-aaa.txt:3:seven eight nine
--
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt:2:seven eight nine
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt-3-ten eleven twelve

Extending first search, looking for "eight" and extracting file names.
---
/tmp/searchTest/letters/aaa-aaa-aaa/aaa-aaa-aaa.txt
3:seven eight nine
--
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
2:seven eight nine
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
3-ten eleven twelve

Second search, looking for "eight".
---
/tmp/searchTest/numbers/111-111-111/111-111-111.txt:3:seven eight nine
--
/tmp/searchTest/numbers/222-222-222/222-222-222.txt:2:seven eight nine
/tmp/searchTest/numbers/222-222-222/222-222-222.txt-3-ten eleven twelve

Extending second search, looking for "eight" and extracting file names - but fails.
---
/tmp/searchTest/numbers/111
111-111/111-111-111.txt
3:seven eight nine
--
/tmp/searchTest/numbers/222
222-222/222-222-222.txt
2:seven eight nine
/tmp/searchTest/numbers/222
222-222/222-222-222.txt-3-ten eleven twelve

Is there a better way to pick out the file names? This is a general purpose script, so there is no set pattern I can rely on for file names: spaces, digits, letters, no extension etc are all possible.

It seems like the only way to do this reliably would be to run grep twice, with the first being a grep -l just to get the file names alone, which I can then map to the results. But this is pretty extreme, especially for a big search space.


Update: Thursday 20 March 2025, 06:00:22 pm

Adding more detail on actual use in response to a comment from @Yokai.

Here is an example of how I use this script already. This works quite well for me, showing me search results and asking what files I want to open in a text editor.

> search.sh -d /Users/rob.bram/DirTechTips -y e -t "junit temporary" -A2
Search for pattern "junit temporary" in dir /Users/rob.bram/DirTechTips through file pattern "*.*"

====
./Java/cheat_Java-Junit.md
17:- [JUnit Temporary Files](#junit-temporary-files)
18-    - [Listing files in temp dir during debugging](#listing-files-in-temp-dir-during-debugging)
19-- [Parallel Test Execution for JUnit 5](#parallel-test-execution-for-junit-5)
--
445:## JUnit Temporary Files
446-
447:This section: [JUnit Temporary Files](cheat_Java-Junit.md#junit-temporary-files) | [Back to top](#top)
448-
449-From:  [Working and unit testing with temporary files in Java](https://blogs.oracle.com/javamagazine/working-and-unit-testing-with-temporary-files-in-java).
--
614:- Added section `JUnit Temporary Files`.
615-
616-Wednesday, 27th of October 2021, 10
46:26 AM
--

====
./Java/cheat_Java-File-System.md
43:1. Temp files in JUnit. See [JUnit Temporary Files](cheat_Java-Junit.md#junit-temporary-files).
44-2. Create temp file or directory with Java via `java.nio.file.Files` (Java 7).
45-


Do you want to view any of the matching files?
============
File  0: ./Java/cheat_Java-Junit.md
File  1: ./Java/cheat_Java-File-System.md
----
Specify files to open. [A]ll, [N]one or [x y z] space separated indexes.
Can also override editor choice. EDITOR can be one of favourite [t]ext editor (VS Code), [e]clipse, [l]ess, n[o]tepad, [v]im, co[n]sole or c[y]gstart.

This ends up running the following core grep command: grep -HE --text -i -B 0 -A 2 -n -H "junit temporary"

  • It might be useful to explain what the exact functionality and formatting you need is as well as explaining what the initial strings are vs what you expect them to be. You may have luck checking into both grep and sed extended regular expression functionality using the option -E rather than -e. Also keep in mind, you can have multiple sed programs in the same sed call by using ; to start the next sed program. So it might look something like: sed -Er 's/[pattern1]/[result]/;s/[pattern2]/[result]/' You don't have to do it this way but it does keep sed programs more compact. Commented Mar 20 at 6:43
  • Thanks @Yokai - I added detail on actual usage. Commented Mar 20 at 7:05
  • so what is -e "s/-([0-9]+-)/\n\1/" -e "s/^[.]/\\n./" doing there? Commented Mar 20 at 7:26
  • @KamilCuk - that sed puts the resulting file names on different lines to the actual search results. Compare First search, looking for "eight" to Extending first search, looking for "eight" and extracting file names. Commented Mar 20 at 7:42
  • I don't know if this is helpful, but when you want grep only to tell you the name of the file, containing your text, you might use grep -l. Commented Mar 20 at 14:24

5 Answers

2

sed is not necessarily the best possible tool for this but you can try (tested with GNU sed):

grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest |
sed -E '/:[0-9]+:/{h;s/:.*/-/;x;s/:([0-9]+:)/\n\1/;p;d;}
H;x;/^(.*)\n\1/{s/^(.*)\n\1(.*)/\2/;p;d;};s/.*\n//'
/tmp/searchTest/letters/aaa-aaa-aaa/aaa-aaa-aaa.txt
3:seven eight nine
--
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
2:seven eight nine
3-ten eleven twelve
--
/tmp/searchTest/numbers/111-111-111/111-111-111.txt
3:seven eight nine
--
/tmp/searchTest/numbers/222-222-222/222-222-222.txt
2:seven eight nine
3-ten eleven twelve

It works by storing the filename in the hold space and comparing the other lines with this to match the filename exactly.

But this is not robust at all. If your filenames can contain newlines sed will split them. We could solve this with GNU extensions of grep and sed (if you have these) but there are other issues. If the filenames can also contain substrings that match /:[0-9]+:/, for instance, there is no way to distinguish that from what grep added. The same applies if the files themselves contain lines matching eight and /:[0-9]+:/. So a much better and more robust solution would be to first find the names of the files, and process them independently, for instance with awk, to print whatever format you like. Example:

find /tmp/searchTest -type f,l -name '*.txt' -print0 |
xargs -0 awk -v s="eight" '
  $0 ~ s {print FILENAME ORS FNR ":" $0; flag=1; next}
  flag   {print (FNR>1 ? FNR "-" $0 ORS : "") "--"; flag=0}
'

We use xargs to pass the filenames to awk without exceeding the maximum command line length (in case there are too many files).

We use the NUL character as separator between the filenames (-print0 action of find, -0 option of xargs) because it is the only character that cannot be found in a filename. This guarantees that, even with exotic file names, the solution works. This is another reason for using xargs.

We use find instead of grep because a POSIX-compliant find must support the -print0 action (write the current pathname to standard output, followed by NUL), while a POSIX-compliant grep may not support the equivalent option (-Z with GNU grep). This is not a problem because awk can perfectly do the job of grep.

Note: in your question you use the terms pattern, search, matching that can basically have two meanings:

  1. The pattern is a regular expression and you search your files for strings that match it. This is what your use of grep, without the -F option, suggests. If it is indeed what you want the awk script above does exactly that (in $0 ~ s, ~ is the regular expression matching operator).
  2. The pattern is a string and you search files containing it. For this to work with grep you should use the -F option. If it is what you want the awk script must be modified. Replace $0 ~ s with index($0, s).
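For the second case, here is a minimal sketch of how the modified awk might look, extended to print two lines of trailing context in the style of grep -A 2. The function name search_fixed and the NEEDLE environment variable are hypothetical names chosen for illustration, not part of the script above. The string is passed through ENVIRON rather than -v so that awk does not interpret backslashes in it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: literal (non-regex) search with trailing context.
# Usage: search_fixed STRING FILE...
search_fixed() {
  local needle="$1"
  shift
  # With no file arguments awk would wait on stdin, so bail out early.
  [ "$#" -gt 0 ] || return 0
  NEEDLE="$needle" awk -v after=2 '
    FNR == 1 { left = 0 }                    # new file: drop pending context
    index($0, ENVIRON["NEEDLE"]) {
      if (FILENAME != shown) { print FILENAME; shown = FILENAME }
      print FNR ":" $0                       # matching line, grep style
      left = after
      next
    }
    left > 0 { print FNR "-" $0; left-- }    # context line, grep style
  ' "$@"
}
```

Called as search_fixed eight /tmp/searchTest/numbers/*/*.txt, this prints each file name once on its own line, followed by N:line for matches and N-line for context. Because awk prints the file name itself, digits and hyphens in the path never have to be parsed back out. For a large tree, the awk program could be saved as a small script and run through find with -exec ... {} + instead of a shell glob.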

Comments

1

Use -Z to have the filenames followed by a \0 byte rather than a : or - character, then instruct sed to look for that \0 byte:

# have sed replace the first 0 byte with \n
grep -Z -HinER ... | sed -e 's/\x00/\n/'

This should lift the ambiguity
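To go one step further and print each file name only once, something like the following sketch could work. It assumes GNU grep (for -Z) and GNU sed (for \x00), and that file names contain no newline and none is literally named "--". The function name group_grep_z is hypothetical. After sed has split each record, the stream alternates between a file name line and a content line, with bare "--" separators in between, so a small awk state machine can tell them apart without looking at the characters inside the file name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `grep -Z -Hn ...` output on stdin and prints
# each file name once, followed by its "N:text" / "N-text" lines.
group_grep_z() {
  # Replace the first NUL of each record (the one grep -Z wrote after
  # the file name) with a newline.
  sed 's/\x00/\n/' |
  awk '
    want_content { print; want_content = 0; next }   # "N:text" or "N-text"
    $0 == "--"   { print; next }                     # group separator from grep
    {
      if ($0 != shown) { print; shown = $0 }         # file name, once per run
      want_content = 1
    }
  '
}
```

Usage would look like: grep -Z -HinER -B 0 -A 2 "eight" /tmp/searchTest | group_grep_z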

Comments

1

While sed is great, it will become completely unreadable. Just write the actual program and logic.

results=$(grep -HinER --include "*.txt" "eight" /tmp/searchTest/letters)
filenames=()
while IFS=":" read -r filename number line; do
  filenames+=("$filename")
  echo "$filename"
  # Read the matching line and the two lines after it.
  mapfile -s "$((number-1))" -n 3 -t lines <"$filename"
  # print the lines.
  for i in "${!lines[@]}"; do
    echo "$((number+i)):${lines[i]}"
  done
done <<<"$results"

2 Comments

Oh wow, read the colon delimited line! It is far less likely that I will encounter files with a colon in the name. Allowed in *nix, but not on Windows and macOS. Giving this a try.
When the result has 2 lines for a file, the second one is like /tmp/searchTest/numbers/222-222-222/222-222-222.txt-3-ten eleven twelve, without a colon.
1

This might work for you (GNU sed):

sed -sn '/eight/{F;=;s/^/:/p;$!{n;=;s/^/-/p}};$s/.*/--/p' /tmp/searchTest/*/*/*|
sed '/^[0-9]/{N;s/\n//}'

No need to invoke grep as sed can emulate the same result (except that a second invocation of sed is needed to remove unwanted newlines).

Or using grep, this might work too:

grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/letters/|
sed -E '$!N;s/^(\/[^:]+):(.*)\n\1-/\1\n\2\n/;t
        s/^(\/[^:]+):/\1\n/;t;P;D'

Comments

0

You can modify your command by anchoring on the .txt extension in the sed expressions:

grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/numbers/  | 
  sed -re "s/(.*txt):([0-9]+:)/\1\n\2/" -e "s/(.*txt)-([0-9]+-)/\1\n\2/" -e "s/^[.]/\\n./"

You can make .txt a variable with mask='.txt'.
Another idea is creating a list of matching files first:

find /tmp/searchTest -type f -name \*.txt -exec grep -liE -B 0 -A 2 "eight" {} \; |
   xargs -n1 -I'{}' bash -c 'echo {}; grep -inER -B 0 -A 2 "eight" {}'
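A variant of the same idea that avoids pasting {} into the bash -c string (which breaks on file names containing quotes or spaces) might look like the sketch below. The function name search_tree is hypothetical, and it assumes a grep that supports -A. File names are handed to the inner shell as positional parameters, so they are never parsed as text, and each file is grepped on its own, so grep never prefixes the lines with the file name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: search_tree PATTERN DIR
# Prints each matching file name on its own line, then its numbered
# matches with two lines of trailing context, then a blank line.
search_tree() {
  find "$2" -type f -exec sh -c '
    pattern=$1
    shift
    for f do
      # First pass: does this file match at all?
      if grep -qiE -- "$pattern" "$f"; then
        printf "%s\n" "$f"
        # Second pass: a single file operand, so no file name prefix.
        grep -inE -A 2 -- "$pattern" "$f"
        printf "\n"
      fi
    done
  ' sh "$1" {} +
}
```

For example, search_tree eight /tmp/searchTest. Matching files are read twice, but non-matching files are only scanned until grep -q gives up at end of file, so the extra cost is limited to files that actually contain a hit.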

2 Comments

No, this is a general purpose search script. The file extension can be anything.. or have no file extension. The point being I cannot rely on knowing anything like that about the file names.
Using sed was a work-around in the first place. I added an alternative with find/xargs. When you want to make .txt a variable, you might want to use grep -HinER -B 0 -A 2 --include "*${mask}" "eight" /tmp/searchTest/numbers/ | sed -re "s/(.*\.txt):([0-9]+:)/\1\n\2/" -e "s/(.*\.txt)-([0-9]+-)/\1\n\2/" -e "s/^[.]/\\n./", however I prefer the find command.
