Print multiple regex matches using grep on the same line

Question

I'm trying to match all numbers, both integers and decimals, using grep, and print the matches on the same line (to make plotting with gnuplot easier). For instance,

echo "bench-100-net-buffering1000.out:Throughput: 3212.97" | grep -E -o '\d+(\.\d+)?'

prints

100
1000
3212.97

but how do I get all that in the same line like the following?

100  1000  3212.97

^{Editor's note: The original form of the question used just \d+ as the regex, as reflected in some older answers.}

Eventually, I would like it to work with multiple input files given like:

grep Throughput *.out | grep -E -o '\d+(\.\d+)?'

should print

100  1000  3212.97
200  3000  5444.77
300  5000  6769.32

experiment.pl · Accepted Answer · 2019-10-08 10:27:40Z

2

All of these solutions seem overly complicated. The one presented here is not particularly efficient, but it works:

while IFS= read -r line
do
echo "$line" | grep -o "PATTERN" | tr "\n" " "; echo
done < grep.txt

What it does:

1) Reads each line of grep.txt separately and greps it for the pattern. Because grep -o prints every match, this works for any number of matches per line, and you are not constrained to one very specific regex.

2) tr then converts the newlines between those matches into spaces (for each input line with any number of matches, not for the whole file).

3) The final echo ends the output line, so the matches from the next input line start on a new line.

What you end up with is all matches from the same line of grep.txt on the same output line, exactly as required.
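As a rough sketch of the same per-line idea, here is the loop with PATTERN filled in by a POSIX-style number regex and with paste used in place of tr, which avoids the trailing space. The function name numbers_per_line is made up for illustration, and the sketch assumes a grep that supports -o (GNU and BSD grep both do):

```shell
# Hypothetical helper (name is illustrative): reads lines on stdin and prints
# the numbers found on each line as one space-separated output line.
numbers_per_line() {
  while IFS= read -r line; do
    # grep -o prints each match on its own line; paste -s joins them again.
    printf '%s\n' "$line" | grep -oE '[0-9]+(\.[0-9]+)?' | paste -sd ' ' -
  done
}

printf '%s\n' \
  'bench-100-net-buffering1000.out:Throughput: 3212.97' \
  'bench-200-net-buffering3000.out:Throughput: 5444.77' | numbers_per_line
# prints:
# 100 1000 3212.97
# 200 3000 5444.77
```

To process a file as in the answer, redirect it into the function: numbers_per_line < grep.txt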

answered Oct 8, 2019 at 10:27

experiment.pl

612 bronze badges


1 Comment

my chalupa Over a year ago

This is infinitely better to understand.

clt60 · Accepted Answer · 2017-04-05 16:47:35Z

1

Some other variants:

Every example below uses this regex:

(\d+\.\d*|\.\d+|\d+)

It matches (in one group) the forms ddd, ddd., ddd.ddd and .ddd. If your decimals look different, for example if you don't want to capture the .ddd (fraction-only) variant, just remove that alternative from the regex.

Usage for one file/string

#using `paste`
echo "bench-100-net-buffering1000.out:Throughput: 3212.97"  | grep -Eo '(\d+\.\d*|\.\d+|\d+)' | paste -s -
# using echo for making the "one line"
echo $(grep -Eo '(\d+\.\d*|\.\d+|\d+)' <<< "bench-100-net-buffering1000.out:Throughput: 3212.97")
#HERESTRING and different separator
grep -Eo '(\d+\.\d*|\.\d+|\d+)' <<< "bench-100-net-buffering1000.out:Throughput: 3212.97" | paste -sd, -
#process substitution.. ;)
paste -sd ' ' <(grep -Eo '(\d+\.\d*|\.\d+|\d+)' <<< "bench-100-net-buffering1000.out:Throughput: 3212.97")

Same as above for multiple files, using bash loops. The examples use ff* for the filenames.

#Using null-term find
while IFS= read -r -d '' file; do
        grep -Eo '(\d+\.\d*|\.\d+|\d+)' "$file" | paste -s -
done < <(find . -maxdepth 1 -type f -name ff\* -print0)

# or alternative - also prints filenames
while IFS= read -r -d '' file; do
        echo "$file:" $(grep -Eo '(\d+\.\d*|\.\d+|\d+)' "$file")
done < <(find . -maxdepth 1 -type f -name ff\* -print0)

# Using a for loop
for file in ff* ; do
        grep -Eo '(\d+\.\d*|\.\d+|\d+)' "$file" | paste -s -
done

perl variants:

perl -0777 -nE 'say "@{[/(\d+\.\d*|\.\d+|\d+)/g]}"' ff*

also prints filenames

perl -0777 -nE 'say "$ARGV @{[/(\d+\.\d*|\.\d+|\d+)/g]}"' ff*

also by using different field separator \t

perl -0777 -nE '$"="\t";say "$ARGV @{[/(\d+\.\d*|\.\d+|\d+)/g]}"' ff*

All the perl solutions use the baby-cart operator, @{[ ... ]}. It is usually not recommended for production code, but it is acceptable in one-liners.

demo:

perl -0777 -nE 'say "@{[/(\d+\.\d*|\.\d+|\d+)/g]}"' <<< "some-111-decimal-222.-another-333.33-only-frac-.444.txt"

output

111 222. 333.33 .444

edited Apr 5, 2017 at 16:47

answered Apr 5, 2017 at 14:40

clt60

64.3k17 gold badges114 silver badges206 bronze badges

11 Comments

123 Over a year ago

Regarding your perl answer, I'm pretty sure they only want Throughput lines.

clt60 Over a year ago

@123 he said: 'm trying to match all digits ... so, all :)

123 Over a year ago

Mine has decimal?

123 Over a year ago

Also their example of multiple file is grep Throughput *.out | grep -E '\d+(\.\d+)?', which would suggest they only want Throughput lines.

123 Over a year ago

I don't understand what that means?


anubhava · Accepted Answer · 2017-04-05 14:54:36Z

1

Here is a single GNU awk command to get your output:

echo "bench-100-net-buffering1000.out:Throughput: 3212.97" |
awk '(n = split($0, a, /[0-9]*\.?[0-9]+/, vals)) > 1 {
   # vals[] holds the n-1 matched separators, i.e. the numbers
   for (i=1; i<n; i++)
      printf "%s%s", vals[i], (i == n-1 ? ORS : OFS)
}'

100 1000 3212.97
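The four-argument split() is a GNU awk extension. Where only a POSIX awk is available, a loop over match() can collect the same numbers. This is a sketch under that assumption, not part of the original answer, and the function name extract_numbers_awk is made up:

```shell
# Hypothetical helper: prints the numbers on each input line, joined by OFS,
# using only match()/substr(), which POSIX awk provides.
extract_numbers_awk() {
  awk '{
    out = ""
    rest = $0
    # Repeatedly take the leftmost match and continue after it.
    while (match(rest, /[0-9]*\.?[0-9]+/)) {
      out = out (out == "" ? "" : OFS) substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)
    }
    if (out != "") print out   # skip lines without any number
  }'
}

echo "bench-100-net-buffering1000.out:Throughput: 3212.97" | extract_numbers_awk
# prints: 100 1000 3212.97
```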

answered Apr 5, 2017 at 14:54

anubhava

790k67 gold badges603 silver badges671 bronze badges


nlu · Accepted Answer · 2017-04-05 15:20:50Z

1

I like this solution in Perl; it should handle floating-point numbers correctly too:

perl -ne 'print join("\t", /(\d+(?:\.\d+)?)/g); print "\n"' files*

The first argument to join is the field delimiter.

The ?: creates a so-called non-capturing group, which keeps the part after the decimal point from being duplicated in the output - see: https://perldoc.perl.org/perlretut.html#Non-capturing-groupings

edited Apr 5, 2017 at 15:20

answered Apr 5, 2017 at 15:14

nlu

1,96215 silver badges21 bronze badges


reflective_mind · Accepted Answer · 2017-04-05 18:11:32Z

For your first simple case, you get the desired output with the following:

echo "bench-100-net-buffering1000.out:Throughput: 3212.97" | 
grep -o -E '[0-9]*\.?[0-9]+' | column

Output:

100  1000  3212.97

EDIT:

Thanks to mklement0, who pointed out that using paste instead of column is probably a better solution:

echo "bench-100-net-buffering1000.out:Throughput: 3212.97" | 
grep -o -E '[0-9]*\.?[0-9]+' | paste -s -

For multiple input files, I would also prefer a perl solution since it seems to be fairly easy and straightforward:

perl -nE 'say join "\t", /[0-9]*\.?[0-9]+/g' *.out

This example uses (just for the demonstration) three identical input files file1.out, file2.out and file3.out.

Output:

100  1000  3212.97
100  1000  3212.97
100  1000  3212.97

EDIT (in response to mklement0's comment):

To only process all lines containing the word "Throughput", here is a slightly extended example:

perl -nE 'say join "\t", /[0-9]*\.?[0-9]+/g if /Throughput/' *.out

Community · Accepted Answer · 2017-05-23 12:32:17Z

Single-input case:

$ echo "bench-100-net-buffering1000.out:Throughput: 3212.97" | 
    grep -E -o '[0-9]+(\.[0-9]+)?' |
      paste -sd' ' -
100 1000 3212.97

Note that I've changed the regex to be POSIX-compliant by replacing \d with [0-9], given that you don't specify a platform.
- BSD/macOS grep always understands \d, but GNU grep only does so with the -P option, which BSD/macOS doesn't support.
paste -sd ' ' - replaces the newlines with spaces to get a single-line, space-separated list of numbers.
- Operand - represents stdin, and is required in the BSD/macOS version of paste (optional with GNU paste).
- -s concatenates the input lines in sequence.
- -d' ' specifies that a space char. should be used as the separator (delimiter) between the input lines when concatenating; paste's default is the tab char. (\t).
- Using paste this way is superior to tr '\n' ' ', because the latter produces a trailing space.
  paste is also preferable to column, because the latter inserts line breaks if the output line grows wider than the display (and also invariably uses \t as the separator (the -s option only works with -t, which cannot be used here)).
  That said, paste cannot use a multi-character string as the fixed separator; the sample output in the question currently uses 2 spaces as the separator string, so if you wanted to achieve that, pipe paste's output to sed 's/ /  /g'.
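The points above can be checked with a small comparison. This is just a sketch; nums is a stand-in function (not from the answer) that emits the three sample numbers one per line, as grep -o would:

```shell
# Stand-in for the grep -o output: one number per line.
nums() { printf '%s\n' 100 1000 3212.97; }

# tr turns every newline into a space, including the last one,
# so the result ends in a space and has no final newline.
nums | tr '\n' ' '; echo '<- end of tr output'

# paste -s only places the separator between lines.
nums | paste -sd ' ' -
# prints: 100 1000 3212.97

# Two-space separator: widen each single space afterwards with sed.
nums | paste -sd ' ' - | sed 's/ /  /g'
# prints: 100  1000  3212.97
```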

Multi-file input case:

^{The solution below uses a shell loop and 2 grep calls and a paste call per input file; consider using the more concise and efficient Perl solution from inferno's helpful answer instead.}

^{If you're willing to assume that all matching lines contain exactly 3 numbers, a more efficient solution with grep and paste is available (adapted from a solution attempt by the OP himself); paste is used to apply the 3 separator chars passed to -d (space, space, newline) individually, cyclically:

paste -sd '  \n' <(grep -h Throughput *.out | grep -Eo '[0-9]+(\.[0-9]+)?')}
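To see the cyclic use of the delimiter list in isolation, here is a sketch that feeds six numbers on stdin instead of running grep on real files. The function name group_by_three is illustrative only, and the approach still assumes exactly 3 numbers per matching line:

```shell
# Hypothetical helper: joins every 3 input lines into one output line.
# paste cycles through the delimiter list: space, space, newline.
group_by_three() {
  paste -sd '  \n' -
}

printf '%s\n' 100 1000 3212.97 200 3000 5444.77 | group_by_three
# prints:
# 100 1000 3212.97
# 200 3000 5444.77
```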

For file-specific output you must process the files individually (this assumes that all numbers across matching lines in a given file should be output as a single line):

for file in *.out; do
  grep Throughput "$file" | grep -Eo '[0-9]+(\.[0-9]+)?' | paste -sd ' ' -
done

for file in *.out loops over all matching files individually.
grep Throughput "$file" outputs all lines in the file at hand containing Throughput.
| grep -Eo '[0-9]+(\.[0-9]+)?' then extracts the numbers from these lines, with each number printed on its own line.
| paste -sd ' ' - then replaces the newlines with spaces to get a single-line list of numbers per file.
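Note that grep does not prefix the file name when it is given a single file, so the loop above only sees numbers that appear inside the matching lines. If, as the question's sample line suggests, some of the numbers come from the file name itself, adding -H (supported by GNU and BSD grep) restores the prefix. The sketch below assumes that layout; the function name numbers_per_file and its directory argument are made up for the example:

```shell
# Hypothetical helper: for each *.out file in directory $1, print the numbers
# from the file name and from its Throughput lines on one line.
numbers_per_file() {
  (
    # Work inside the directory so that only the bare file name is prefixed.
    cd "$1" || exit 1
    for file in *.out; do
      [ -e "$file" ] || continue   # no *.out files: skip the literal pattern
      grep -H Throughput "$file" | grep -Eo '[0-9]+(\.[0-9]+)?' | paste -sd ' ' -
    done
  )
}

# Usage (path is a placeholder): numbers_per_file /path/to/results
```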

As for why your approach won't work:

grep Throughput *.out | grep -Eo '\d+(\.\d+)?'

sends a single stream of matching lines across all input files through the pipeline, so subsequent commands have no way of knowing which lines came from what file or line, making it impossible to group the numbers per input file or line (in a subsequent step) - unless you can make assumptions about the exact, fixed number of numbers contained in each input line.

Aif · Accepted Answer · 2017-04-05 14:58:18Z

0

Why not sed? Simple ugly solution (feedback welcome):

$ echo "bench-100-net-buffering1000.out:Throughput: 3212.97" | sed -re 's/[^0-9]+/ /g;s/ +/ /g;s/^ //' 
100 1000 3212 97

Or explicitly match integers and floating-point numbers:

$ echo "bench-100-net-buffering1000.out:Throughput: 3212.97" | sed -re 's/([^0-9]+)([0-9]+|[0-9]+\.[0-9]+)/\2 /g'
100 1000 3212.97

edited Apr 5, 2017 at 14:58

answered Apr 5, 2017 at 14:37

Aif

11.3k1 gold badge32 silver badges44 bronze badges

4 Comments

123 Over a year ago

If the sed supports -r you can almost certainly use ; instead of separate -e's

Aif Over a year ago

thanks @123. Why is ; better than several -e arguments?

123 Over a year ago

I guess it's not technically , I just find it to be far more readable.

Ed Morton Over a year ago

Use -E, not -r, for portability to other seds. -r is GNU only while -E works in GNU and OSX.

Johny · Accepted Answer · 2017-04-05 15:30:52Z

0

Based on your question, here is a simple command that produces the output you are after.

echo "bench-100-net-buffering1000.out:Throughput: 3212.97" | grep -oE '[0-9]+(\.[0-9]+)?' | tr '\n' ' ' |  paste -s

100 1000 3212.97

Hope this helps!

edited Apr 5, 2017 at 15:30

answered Apr 5, 2017 at 15:24

Johny

12 bronze badges

1 Comment

mklement0 Over a year ago

If you use tr '\n' ' ' (which is not a good idea, because it adds a trailing space), paste -s has no effect at all. A single paste command should do.

Dudi Boy · Accepted Answer · 2019-10-08 11:10:06Z

0

I really like anubhava's awk script.

I'd like to improve on it with another GNU awk feature (FPAT) to make it simpler and more concise.

This trick prints all numbers in the input line, no matter how many.

echo "bench-100-net-buffering1000.out:Throughput: 3212.97" |
awk 'BEGIN {FPAT="[0-9]*\\.?[0-9]+"} {  # define input fields to be numbers
    $1 = $1; # recalculate the input line to hold only input fields
    print;   # print recalculated input line
}'

Or as a one-liner:

echo "bench-100-net-buffering1000.out:Throughput: 3212.97" |
awk 'BEGIN{FPAT="[0-9]*\\.?[0-9]+"}{$1=$1}1'

answered Oct 8, 2019 at 11:10

Dudi Boy

1
