Picking out file names from grep results where file name contains numbers and hyphens

Question

I have a script that runs a grep command and formats the results nicely for me, asking if I want to open any of the resulting files in an editor etc.

The core of my script is a command like this:

grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/letters  | sed -re "s/:([0-9]+:)/\n\1/" -e "s/-([0-9]+-)/\n\1/" -e "s/^[.]/\\n./"

It runs the grep, outputting the file name on every line and then runs some processing to put the file names on a different line from the results.

> grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/letters  | sed -re "s/:([0-9]+:)/\n\1/" -e "s/-([0-9]+-)/\n\1/" -e "s/^[.]/\\n./"
/tmp/searchTest/letters/aaa-aaa-aaa/aaa-aaa-aaa.txt
3:seven eight nine
--
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
2:seven eight nine
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
3-ten eleven twelve

I have more processing that pretties up the results further and gives me a list of the matching files, asking me which I want to open up in an editor.

I am having trouble with files whose name or path includes hyphens and numbers (e.g. "/tmp/searchTest/numbers/222-222-222/222-222-222.txt"): my sed command cannot tell the file name apart from the hyphen- or colon-delimited line numbers.

Here is a script that sets up a test case showing this:

#!/bin/bash

rm -rf /tmp/searchTest 2> /dev/null
mkdir -p /tmp/searchTest/numbers/111-111-111
mkdir -p /tmp/searchTest/numbers/222-222-222
mkdir -p /tmp/searchTest/letters/aaa-aaa-aaa
mkdir -p /tmp/searchTest/letters/bbb-bbb-bbb

cat << EOF > /tmp/searchTest/numbers/111-111-111/111-111-111.txt
one two three
four five six
seven eight nine
EOF

cat << EOF > /tmp/searchTest/numbers/222-222-222/222-222-222.txt
four five six
seven eight nine
ten eleven twelve
EOF

cat << EOF > /tmp/searchTest/letters/aaa-aaa-aaa/aaa-aaa-aaa.txt
one two three
four five six
seven eight nine
EOF

cat << EOF > /tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
four five six
seven eight nine
ten eleven twelve
EOF

echo "Contents of /tmp/searchTest"
tree /tmp/searchTest

echo -e "\nFirst search, looking for \"eight\".\n---"
grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/letters

echo -e "\nExtending first search, looking for \"eight\" and extracting file names.\n---"
grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/letters  | sed -re "s/:([0-9]+:)/\n\1/" -e "s/-([0-9]+-)/\n\1/" -e "s/^[.]/\\n./"

echo -e "\nSecond search, looking for \"eight\".\n---"
grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/numbers

echo -e "\nExtending second search, looking for \"eight\" and extracting file names - but fails.\n---"
grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/numbers  | sed -re "s/:([0-9]+:)/\n\1/" -e "s/-([0-9]+-)/\n\1/" -e "s/^[.]/\\n./"

The results for the second search show how the file names break the sed command.

First search, looking for "eight".
---
/tmp/searchTest/letters/aaa-aaa-aaa/aaa-aaa-aaa.txt:3:seven eight nine
--
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt:2:seven eight nine
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt-3-ten eleven twelve

Extending first search, looking for "eight" and extracting file names.
---
/tmp/searchTest/letters/aaa-aaa-aaa/aaa-aaa-aaa.txt
3:seven eight nine
--
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
2:seven eight nine
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
3-ten eleven twelve

Second search, looking for "eight".
---
/tmp/searchTest/numbers/111-111-111/111-111-111.txt:3:seven eight nine
--
/tmp/searchTest/numbers/222-222-222/222-222-222.txt:2:seven eight nine
/tmp/searchTest/numbers/222-222-222/222-222-222.txt-3-ten eleven twelve

Extending second search, looking for "eight" and extracting file names - but fails.
---
/tmp/searchTest/numbers/111
111-111/111-111-111.txt
3:seven eight nine
--
/tmp/searchTest/numbers/222
222-222/222-222-222.txt
2:seven eight nine
/tmp/searchTest/numbers/222
222-222/222-222-222.txt-3-ten eleven twelve

Is there a better way to pick out the file names? This is a general purpose script, so there is no set pattern I can rely on for file names: spaces, digits, letters, no extension etc are all possible.

It seems like the only way to do this reliably would be to run grep twice, with the first being a grep -l just to get the file names alone, which I can then map to the results. But this is pretty extreme, especially for a big search space.

Update: Thursday 20 March 2025, 06:00:22 pm

Adding more detail on actual use in response to a comment from @Yokai.

Here is an example of how I use this script already. This works quite well for me, showing me search results and asking what files I want to open in a text editor.

> search.sh -d /Users/rob.bram/DirTechTips -y e -t "junit temporary" -A2
Search for pattern "junit temporary" in dir /Users/rob.bram/DirTechTips through file pattern "*.*"

====
./Java/cheat_Java-Junit.md
17:- [JUnit Temporary Files](#junit-temporary-files)
18-    - [Listing files in temp dir during debugging](#listing-files-in-temp-dir-during-debugging)
19-- [Parallel Test Execution for JUnit 5](#parallel-test-execution-for-junit-5)
--
445:## JUnit Temporary Files
446-
447:This section: [JUnit Temporary Files](cheat_Java-Junit.md#junit-temporary-files) | [Back to top](#top)
448-
449-From:  [Working and unit testing with temporary files in Java](https://blogs.oracle.com/javamagazine/working-and-unit-testing-with-temporary-files-in-java).
--
614:- Added section `JUnit Temporary Files`.
615-
616-Wednesday, 27th of October 2021, 10
46:26 AM
--

====
./Java/cheat_Java-File-System.md
43:1. Temp files in JUnit. See [JUnit Temporary Files](cheat_Java-Junit.md#junit-temporary-files).
44-2. Create temp file or directory with Java via `java.nio.file.Files` (Java 7).
45-


Do you want to view any of the matching files?
============
File  0: ./Java/cheat_Java-Junit.md
File  1: ./Java/cheat_Java-File-System.md
----
Specify files to open. [A]ll, [N]one or [x y z] space separated indexes.
Can also override editor choice. EDITOR can be one of favourite [t]ext editor (VS Code), [e]clipse, [l]ess, n[o]tepad, [v]im, co[n]sole or c[y]gstart.

This ends up running the following core grep command: grep -HE --text -i -B 0 -A 2 -n -H "junit temporary"

It might be useful to explain exactly what functionality and formatting you need, as well as what the initial strings are vs what you expect them to be. You may have luck checking into both grep and sed extended regular expression functionality using the option -E rather than -e. Also keep in mind, you can have multiple sed programs in the same sed call by using ; to start the next sed program. So it might look something like: sed -Er 's/[pattern1]/[result]/;s/[pattern2]/[result]/' You don't have to do it this way but it does keep sed programs more compact.
– Yokai, Commented Mar 20 at 6:43
so what is -e "s/-([0-9]+-)/\n\1/" -e "s/^[.]/\\n./" doing there?
– KamilCuk, Commented Mar 20 at 7:26
@KamilCuk - that sed puts the resulting file names on different lines to the actual search results. Compare First search, looking for "eight" to Extending first search, looking for "eight" and extracting file names.
– Robert Mark Bram, Commented Mar 20 at 7:42
I don't know if this is helpful, but when you want grep only to tell you the name of the file containing your text, you might use grep -l.
– Dominique, Commented Mar 20 at 14:24

Renaud Pacalet · Accepted Answer · 2025-03-23 08:16:51Z

sed is not necessarily the best possible tool for this but you can try (tested with GNU sed):

grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest |
sed -E '/:[0-9]+:/{h;s/:.*/-/;x;s/:([0-9]+:)/\n\1/;p;d;}
H;x;/^(.*)\n\1/{s/^(.*)\n\1(.*)/\2/;p;d;};s/.*\n//'
/tmp/searchTest/letters/aaa-aaa-aaa/aaa-aaa-aaa.txt
3:seven eight nine
--
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
2:seven eight nine
3-ten eleven twelve
--
/tmp/searchTest/numbers/111-111-111/111-111-111.txt
3:seven eight nine
--
/tmp/searchTest/numbers/222-222-222/222-222-222.txt
2:seven eight nine
3-ten eleven twelve

It works by storing the filename in the hold space and comparing the other lines with this to match the filename exactly.
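The same "remember the file name, then strip it as an exact prefix" idea may be easier to follow when spelled out in awk. This is only a sketch, not a translation of the sed script above: strip_known_name is a hypothetical helper name, and it assumes grep was run with -H and -n, without -B context, and that file names contain no colon or newline.

```shell
# Hypothetical helper: reads grep -Hn output on stdin and puts each
# file name on its own line, printed once per run of lines from that file.
strip_known_name() {
  awk '
    # Context line: starts with the exact file name we already know, then "-".
    file != "" && index($0, file "-") == 1 {
      print substr($0, length(file) + 2)
      next
    }
    # Matching line: "<file>:<number>:<text>"; the name ends at the first ":".
    /^[^:]+:[0-9]+:/ {
      name = substr($0, 1, index($0, ":") - 1)
      if (name != file) { file = name; print file }
      print substr($0, length(name) + 2)
      next
    }
    # Group separator ("--") or anything unrecognised: pass through unchanged.
    { print }
  '
}
```

It would be used as a filter after the grep command from the question, in place of the sed call. Because the context lines are compared against a known name rather than a digit pattern, hyphens and digits inside the name no longer matter.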

But this is not robust at all. If your filenames can contain newlines, sed will split them. We could solve this with GNU extensions of grep and sed (if you have these) but there are other issues. If the filenames can also contain substrings that match /:[0-9]+:/, for instance, there is no way to distinguish that from what grep added. Same if the files themselves contain lines matching eight and /:[0-9]+:/. So a much better and more robust solution would be to first find the names of the files, and process them independently, for instance with awk, to print whatever format you like. Example:

find /tmp/searchTest -type f,l -name '*.txt' -print0 |
xargs -0 awk -v s="eight" '
  $0 ~ s {print FILENAME ORS FNR ":" $0; flag=1; next}
  flag   {print (FNR>1 ? FNR "-" $0 ORS : "") "--"; flag=0}
'

We use xargs to pass the filenames to awk without exceeding the maximum command line length (in case there are too many files).

We use the NUL character as separator between the filenames (-print0 action of find, -0 option of xargs) because it is the only character that cannot be found in a filename. This guarantees that, even with exotic file names, the solution works. This is another reason for using xargs.
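A small experiment can make the difference visible. The sketch below is illustrative only: it creates three files in a scratch directory (the directory and file names are assumptions made for the demo) and counts them once as a newline-delimited list and once as a NUL-delimited list.

```shell
# Scratch directory with one ordinary and two awkward file names.
d=$(mktemp -d)
nl='
'
touch "$d/plain.txt" "$d/with space.txt" "$d/with${nl}newline.txt"

# Newline-delimited: the embedded newline makes 3 files look like 4 entries.
by_newline=$(find "$d" -type f | wc -l)

# NUL-delimited: xargs -0 hands each name to the command as one argument.
by_nul=$(find "$d" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "newline-delimited entries: $by_newline, NUL-delimited entries: $by_nul"
rm -rf "$d"
```

The first count comes out as 4 and the second as 3, which is why the NUL-delimited pipeline is the safer choice for a general purpose script.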

We use find instead of grep because a POSIX-compliant find must support the -print0 action (write the current pathname to standard output, followed by NUL), while a POSIX-compliant grep may not support the equivalent option (-Z with GNU grep). This is not a problem because awk can perfectly do the job of grep.

Note: in your question you use the terms pattern, search, matching that can basically have two meanings:

The pattern is a regular expression and you search your files for strings that match it. This is what your use of grep, without the -F option, suggests. If it is indeed what you want the awk script above does exactly that (in $0 ~ s, ~ is the regular expression matching operator).
The pattern is a string and you search files containing it. For this to work with grep you should use the -F option. If it is what you want the awk script must be modified. Replace $0 ~ s with index($0, s).
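
The difference between the two interpretations shows up as soon as the pattern contains a regular expression metacharacter. A minimal sketch, with match_regex and match_literal as hypothetical helper names:

```shell
# Regular expression match: "." in the pattern matches any character.
match_regex()   { awk -v s="$1" '$0 ~ s'; }
# Literal match: index() looks for the pattern as a plain substring.
match_literal() { awk -v s="$1" 'index($0, s)'; }

printf '%s\n' 'abc' 'a.c' | match_regex 'a.c'     # prints both lines
printf '%s\n' 'abc' 'a.c' | match_literal 'a.c'   # prints only a.c
```

Note that awk processes backslash escapes in values passed with -v, so a pattern containing backslashes may need extra care with either variant.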

LeGEC · Accepted Answer · 2025-04-02 03:58:28Z

Use -Z to have the filenames followed by a \0 byte rather than a : or - character, then instruct sed to look for that \0 byte:

# have sed replace the first 0 byte with \n
grep -Z -HinER ... | sed -e 's/\x00/\n/'

This should remove the ambiguity.
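
If the goal is also to print each file name only once, the NUL byte can be consumed directly in bash instead of sed. The following is a sketch under stated assumptions: it needs bash and GNU grep's -Z option, group_results is a hypothetical helper name, and it assumes no file name starts with "--" followed by a newline.

```shell
#!/bin/bash
# Hypothetical helper: reads `grep -Z -Hn` output on stdin.
# Each result line has the form "<file>\0<number><sep><text>\n".
group_results() {
  local file rest prev=""
  # First read up to the NUL (the file name), then the rest of the line.
  while IFS= read -r -d '' file && IFS= read -r rest; do
    # A "--" group separator contains no NUL, so it arrives glued to the
    # front of the next file name; peel it off and print it on its own.
    if [[ $file == --$'\n'* ]]; then
      file=${file#--$'\n'}
      printf '%s\n' "--"
    fi
    # Print the file name only when it changes.
    if [[ $file != "$prev" ]]; then
      printf '%s\n' "$file"
      prev=$file
    fi
    printf '%s\n' "$rest"
  done
}
```

It would sit at the end of the pipeline, for example after grep -Z -HinER -A 2 --include "*.txt" "eight" /tmp/searchTest, in place of the sed call.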

edited Apr 2 at 3:58

answered Apr 1 at 6:40

KamilCuk · Accepted Answer · 2025-03-20 07:33:55Z

While sed is great, it will become completely unreadable. Just write the actual program and logic.

results=$(grep -HinER --include "*.txt" "eight" /tmp/searchTest/letters)
filenames=()
while IFS=":" read -r filename number line; do
  filenames+=("$filename")
  echo "$filename"
  # Get the matching line and the two lines after it (3 lines in total).
  mapfile -s "$((number-1))" -n 3 -t lines <"$filename"
  # print the lines.
  for i in "${!lines[@]}"; do
    echo "$((number+i)):${lines[i]}"
  done
done <<<"$results"

answered Mar 20 at 7:33

2 Comments

Robert Mark Bram Mar 20 at 8:02

Oh wow, read the colon delimited line! It is far less likely that I will encounter files with a colon in the name. Allowed in *nix, but not Windows and macOS. Giving this a try.

Walter A Mar 20 at 21:50

When the result has 2 lines for a file, the second one is like /tmp/searchTest/numbers/222-222-222/222-222-222.txt-3-ten eleven twelve, without a colon.

potong · Accepted Answer · 2025-03-22 22:57:12Z

This might work for you (GNU sed):

sed -sn '/eight/{F;=;s/^/:/p;$!{n;=;s/^/-/p}};$s/.*/--/p' /tmp/searchTest/*/*/*|
sed '/^[0-9]/{N;s/\n//}'

No need to invoke grep as sed can emulate the same result (except that a second invocation of sed is needed to remove unwanted newlines).

Or using grep, this might work too:

grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/letters/|
sed -E '$!N;s/^(\/[^:]+):(.*)\n\1-/\1\n\2\n/;t
        s/^(\/[^:]+):/\1\n/;t;P;D'

edited Mar 22 at 22:57

answered Mar 22 at 21:14

Walter A · Accepted Answer · 2025-03-21 16:04:34Z

You can modify your command by including the .txt extension in your sed patterns:

grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/numbers/  | 
  sed -re "s/(.*txt):([0-9]+:)/\1\n\2/" -e "s/(.*txt)-([0-9]+-)/\1\n\2/" -e "s/^[.]/\\n./"

You can make .txt a variable with mask='.txt'.
Another idea is creating a list of matching files first:

find /tmp/searchTest -type f -name \*.txt -exec grep -liE -B 0 -A 2 "eight" {} \; |
   xargs -n1 -I'{}' bash -c 'echo {}; grep -inER -B 0 -A 2 "eight" {}'
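
A variation on this two-pass idea keeps the list of matching files NUL-delimited, so names with spaces, quotes or newlines are passed along intact. This is a sketch, not a drop-in replacement: it assumes bash and GNU grep (for -Z), and search_two_pass is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: search_two_pass PATTERN DIR
search_two_pass() {
  local pattern=$1 dir=$2 file
  # Pass 1: list matching files, each name terminated by a NUL byte.
  while IFS= read -r -d '' file; do
    printf '%s\n' "$file"
    # Pass 2: search one file at a time, so grep adds no file name prefix.
    grep -inE -A 2 -e "$pattern" -- "$file"
  done < <(grep -rlZiE -e "$pattern" -- "$dir")
}
```

Only the files that matched in the first pass are read a second time, which limits the cost of running grep twice on a large search space.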

edited Mar 21 at 16:04

answered Mar 20 at 22:30

2 Comments

Robert Mark Bram Mar 20 at 23:55

No, this is a general purpose search script. The file extension can be anything.. or have no file extension. The point being I cannot rely on knowing anything like that about the file names.

Walter A Mar 21 at 16:08

Using sed was a work-around in the first place. I added an alternative with find/xargs. When you want to make .txt a variable, you might want to use

grep -HinER -B 0 -A 2 --include "*${mask}" "eight" /tmp/searchTest/numbers/  | sed -re "s/(.*\.txt):([0-9]+:)/\1\n\2/" -e "s/(.*\.txt)-([0-9]+-)/\1\n\2/" -e "s/^[.]/\\n./"

, however I prefer the find command.
