3

I'm trying to match all digits, including integers and decimals, using grep, and to print the matches on the same line (to make plotting with gnuplot easier). For instance,

echo "bench-100-net-buffering1000.out:Throughput: 3212.97" | grep -E -o '\d+(\.\d+)?'

prints

100
1000
3212.97

but how do I get all that in the same line like the following?

100  1000  3212.97

Editor's note: The original form of the question used just \d+ as the regex, as reflected in some older answers.

Eventually, I would like it to work with multiple input files given like:

grep Throughput *.out | grep -E -o '\d+(\.\d+)?'

should print

100  1000  3212.97
200  3000  5444.77
300  5000  6769.32

9 Answers 9

2

All of these solutions seem overly complicated. The one presented here is not particularly efficient, but it works:

while read -r line
do
echo "$line" | grep -o "PATTERN" | tr "\n" " "; echo
done < grep.txt

What it does:

1) Reads each line from the file grep.txt separately and greps it for the pattern. Because each line is handled on its own, a line may contain any number of matches, and you are not tied to one very specific regex.

2) tr then converts the newlines that grep -o emits between matches into spaces (per input line, not for the whole file).

3) The final echo ends the output line, so the matches from the next input line start on a new line.

What you end up with is the matches from each line of grep.txt on a single line, exactly as required.
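A minimal sketch of the same per-line idea, wrapped in a function and using paste instead of tr so that no trailing space is left. The function name numbers_per_line is made up for this illustration, and the second sample line is invented to mirror the question's expected output.

```shell
# Hypothetical helper: print the numbers found on each input line,
# separated by single spaces, one output line per input line.
numbers_per_line() {
  while IFS= read -r line; do
    # grep -o prints each match on its own line; paste -s joins them.
    printf '%s\n' "$line" | grep -oE '[0-9]+(\.[0-9]+)?' | paste -sd ' ' -
  done
}

printf '%s\n' \
  'bench-100-net-buffering1000.out:Throughput: 3212.97' \
  'bench-200-net-buffering3000.out:Throughput: 5444.77' | numbers_per_line
```

To read from a file instead, redirect it into the function: numbers_per_line < grep.txt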


1 Comment

This is infinitely better to understand.
1

Some other variants:

Every example below uses this regex:

(\d+\.\d*|\.\d+|\d+)

It matches, in one group, the forms ddd. (trailing dot), ddd.ddd, .ddd and plain ddd. If your numbers look different, for example if you don't want to capture the fraction-only .ddd variant, just remove that alternative from the regex.
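If your grep does not understand \d, the same three alternatives can be written with bracket expressions. This is only a sketch: the variable name num_re is made up here, and the sample string is the one from the demo further down.

```shell
# The three alternatives from the regex above, with [0-9] in place of \d.
num_re='[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+'

# grep -o prints one match per line; paste -s joins them with spaces.
printf '%s\n' 'some-111-decimal-222.-another-333.33-only-frac-.444.txt' |
  grep -oE "$num_re" | paste -sd ' ' -
```

This should print 111 222. 333.33 .444 on one line.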

Usage for one file/string

#using `paste`
echo "bench-100-net-buffering1000.out:Throughput: 3212.97"  | grep -Eo '(\d+\.\d*|\.\d+|\d+)' | paste -s -
# using echo for making the "one line"
echo $(grep -Eo '(\d+\.\d*|\.\d+|\d+)' <<< "bench-100-net-buffering1000.out:Throughput: 3212.97")
#HERESTRING and different separator
grep -Eo '(\d+\.\d*|\.\d+|\d+)' <<< "bench-100-net-buffering1000.out:Throughput: 3212.97" | paste -sd, -
#process substitution.. ;)
paste -sd ' ' <(grep -Eo '(\d+\.\d*|\.\d+|\d+)' <<< "bench-100-net-buffering1000.out:Throughput: 3212.97")

The same as above for multiple files, using bash loops. The examples use ff* for the filenames.

#Using null-term find
while IFS= read -r -d '' file; do
        grep -Eo '(\d+\.\d*|\.\d+|\d+)' "$file" | paste -s -
done < <(find . -maxdepth 1 -type f -name ff\* -print0)

# or alternative - also prints filenames
while IFS= read -r -d '' file; do
        echo "$file:" $(grep -Eo '(\d+\.\d*|\.\d+|\d+)' "$file")
done < <(find . -maxdepth 1 -type f -name ff\* -print0)

echo Using FOR loop
for file in ff* ; do
        grep -Eo '(\d+\.\d*|\.\d+|\d+)' "$file" | paste -s -
done

perl variants:

perl -0777 -nE 'say "@{[/(\d+\.\d*|\.\d+|\d+)/g]}"' ff*

also prints filenames

perl -0777 -nE 'say "$ARGV @{[/(\d+\.\d*|\.\d+|\d+)/g]}"' ff*

also by using different field separator \t

perl -0777 -nE '$"="\t";say "$ARGV @{[/(\d+\.\d*|\.\d+|\d+)/g]}"' ff*

All the perl solutions use the baby-cart operator @{[ ]}. It is usually not recommended for production code, but is acceptable in one-liners.

demo:

perl -0777 -nE 'say "@{[/(\d+\.\d*|\.\d+|\d+)/g]}"' <<< "some-111-decimal-222.-another-333.33-only-frac-.444.txt"

output

111 222. 333.33 .444

11 Comments

Regarding your perl answer, I'm pretty sure they only want Throughput lines.
@123 he said: "I'm trying to match all digits ..." so, all :)
Mine has decimal?
Also their example of multiple file is grep Throughput *.out | grep -E '\d+(\.\d+)?', which would suggest they only want Throughput lines.
I don't understand what that means?
1

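Here is a single GNU awk command to get your output:

echo "bench-100-net-buffering1000.out:Throughput: 3212.97" |
awk 'n = split($0, a, /[0-9]*\.?[0-9]+/, vals) {
   # the 4-argument split() stores the matched separators (here: the numbers)
   # in vals[1..n-1], so loop to n-1 to avoid printing a trailing separator
   for (i=1; i<n; i++)
      printf "%s%s", vals[i], (i == n-1 ? ORS : OFS)
}'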

100 1000 3212.97


1

I like this solution in Perl - this should get the floating points correctly too:

perl -ne 'print join("\t", /(\d+(?:\.\d+)?)/g); print "\n"' files*

The first argument to join gives the field delimiter

The ?: creates a so-called non-capturing group, which avoids duplicating the part after the decimal point in the output - see: https://perldoc.perl.org/perlretut.html#Non-capturing-groupings


1

For your first simple case, you get the desired output with the following:

echo "bench-100-net-buffering1000.out:Throughput: 3212.97" | 
grep -o -E '[0-9]*\.?[0-9]+' | column

Output:

100  1000  3212.97

EDIT:

Thanks to mklement0, who pointed out that using paste instead of column is probably a better solution:

echo "bench-100-net-buffering1000.out:Throughput: 3212.97" | 
grep -o -E '[0-9]*\.?[0-9]+' | paste -s -

For multiple input files, I would also prefer a perl solution since it seems to be fairly easy and straightforward:

perl -nE 'say join "\t", /[0-9]*\.?[0-9]+/g' *.out

This example uses (just for the demonstration) three identical input files file1.out, file2.out and file3.out.

Output:

100  1000  3212.97
100  1000  3212.97
100  1000  3212.97

EDIT (in response to mklement0's comment):

To only process all lines containing the word "Throughput", here is a slightly extended example:

perl -nE 'say join "\t", /[0-9]*\.?[0-9]+/g if /Throughput/' *.out


1

Single-input case:

$ echo "bench-100-net-buffering1000.out:Throughput: 3212.97" | 
    grep -E -o '[0-9]+(\.[0-9]+)?' |
      paste -sd' ' -
100 1000 3212.97
  • Note that I've changed the regex to be POSIX-compliant by replacing \d with [0-9], given that you don't specify a platform.

    • BSD/macOS grep always understands \d, but GNU grep only does so with the -P option, which BSD/macOS doesn't support.
  • paste -sd ' ' - replaces the newlines with spaces to get a single-line, space-separated list of numbers.

    • Operand - represents stdin, and is required in the BSD/macOS version of paste (optional with GNU paste).
    • -s concatenates the input lines in sequence.
    • -d' ' specifies that a space char. should be used as the separator (delimiter) between the input lines when concatenating; paste's default is the tab char. (\t).
    • Using paste this way is superior to tr '\n' ' ', because the latter produces a trailing space.
      paste is also preferable to column, because the latter inserts line breaks if the output line grows wider than the display (and also invariably uses \t as the separator (the -s option only works with -t, which cannot be used here)).
      That said, paste cannot use a multi-character string as the fixed separator; the sample output in the question currently uses 2 spaces as the separator string, so if you wanted to achieve that, pipe paste's output to sed 's/ /  /g'
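The difference between the joining approaches discussed above can be seen in a small sketch. The helper name extract and the variable names are made up for this illustration; the brackets in the output only make a trailing space visible.

```shell
line='bench-100-net-buffering1000.out:Throughput: 3212.97'

# Hypothetical helper: print each number on its own line.
extract() { grep -oE '[0-9]+(\.[0-9]+)?'; }

# tr turns every newline into a space, including the final one.
joined_tr=$(printf '%s\n' "$line" | extract | tr '\n' ' ')
# paste -s only places the delimiter between lines.
joined_paste=$(printf '%s\n' "$line" | extract | paste -sd ' ' -)
# Widen each single space to two spaces after joining.
joined_wide=$(printf '%s\n' "$line" | extract | paste -sd ' ' - | sed 's/ /  /g')

printf '[%s]\n' "$joined_tr" "$joined_paste" "$joined_wide"
```

Only the tr variant should show a space before the closing bracket.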

Multi-file input case:

The solution below uses a shell loop and 2 grep calls and a paste call per input file; consider using the more concise and efficient Perl solution from inferno's helpful answer instead.

If you're willing to assume that all matching lines contain exactly 3 numbers, a more efficient solution with grep and paste is available (adapted from a solution attempt by the OP himself); paste is used to apply the 3 separator chars passed to -d (space, space, newline) individually, cyclically:
paste -sd '  \n' <(grep -h Throughput *.out | grep -Eo '[0-9]+(\.[0-9]+)?')
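The cyclic use of the delimiter list can be tried without any input files. In this sketch the six numbers (taken from the question's expected output) stand in for what grep -o would emit for two matching lines, and the variable name grouped is made up.

```shell
# Delimiter list: space, space, newline. paste -s cycles through it,
# so every third number ends an output line.
grouped=$(printf '%s\n' 100 1000 3212.97 200 3000 5444.77 | paste -sd '  \n' -)

printf '%s\n' "$grouped"
```

This only works while every matching line contributes exactly 3 numbers; a line with more or fewer shifts all later groups.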

For file-specific output you must process the files individually (this assumes that all numbers across matching lines in a given file should be output as a single line):

for file in *.out; do
  grep Throughput "$file" | grep -Eo '[0-9]+(\.[0-9]+)?' | paste -sd ' ' -
done
  • for file in *.out loops over all matching files individually.

  • grep Throughput "$file" outputs all lines in the file at hand containing Throughput.

  • | grep -Eo '[0-9]+(\.[0-9]+)?' then extracts the numbers from these lines, with each number printed on its own line.

  • | paste -sd ' ' - then replaces the newlines with spaces to get a single-line list of numbers per file.


As for why your approach won't work:

grep Throughput *.out | grep -Eo '\d+(\.\d+)?'

sends a single stream of matching lines across all input files through the pipeline, so subsequent commands have no way of knowing which lines came from what file or line, making it impossible to group the numbers per input file or line (in a subsequent step) - unless you can make assumptions about the exact, fixed number of numbers contained in each input line.
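If grouping per input line is what you want and you cannot assume a fixed number of numbers per line, one option is to do the filtering and extraction in a single awk process, so the line boundaries are never lost. This is a sketch, not part of the solutions above: the function name numbers_by_line is made up, and prefixing FILENAME is an assumption meant to mimic the file name that grep prepends when given several files (the numbers 100 and 1000 in the question come from that prefix).

```shell
# Hypothetical helper: for each line containing "Throughput", print all
# numbers from "<file name>:<line>" on one output line.
# Usage: numbers_by_line *.out
numbers_by_line() {
  awk '/Throughput/ {
    out = ""
    rest = FILENAME ":" $0   # assumption: imitate the prefix grep adds
    # match() sets RSTART and RLENGTH for the leftmost match in rest.
    while (match(rest, /[0-9]+(\.[0-9]+)?/)) {
      out = out (out == "" ? "" : " ") substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)
    }
    if (out != "") print out
  }' "$@"
}
```

This uses only match() and substr(), so it does not depend on GNU awk extensions.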


0

Why not sed? Simple ugly solution (feedback welcome):

$ echo "bench-100-net-buffering1000.out:Throughput: 3212.97" | sed -re 's/[^0-9]+/ /g;s/ +/ /g;s/^ //' 
100 1000 3212 97

Or explicitly match integers and floating-point numbers:

$ echo "bench-100-net-buffering1000.out:Throughput: 3212.97" | sed -re 's/([^0-9]+)([0-9]+|[0-9]+\.[0-9]+)/\2 /g'
100 1000 3212.97 

4 Comments

If the sed supports -r you can almost certainly use ; instead of separate -e's
thanks @123. Why is ; better than several -e arguments?
I guess it's not technically better, I just find it to be far more readable.
Use -E, not -r, for portability to other seds. -r is GNU only while -E works in GNU and OSX.
0

Based on your question, here is a simple command that produces the output you are after.

echo "bench-100-net-buffering1000.out:Throughput: 3212.97" | grep -oE '[0-9]+(\.[0-9]+)?' | tr '\n' ' ' |  paste -s

100 1000 3212.97

Hope this helps!

1 Comment

If you use tr '\n' ' ' (which is not a good idea, because it adds a trailing space), paste -s has no effect at all. A single paste command should do.
0

I really like anubhava's awk script.

I would like to improve it with some more GNU awk features to make it simpler and more concise.

This trick prints all the numbers in an input line, no matter how many there are.

echo "bench-100-net-buffering1000.out:Throughput: 3212.97" |
awk 'BEGIN {FPAT="[0-9]*\\.?[0-9]+"} {  # define input fields to be numbers
    $1 = $1; # recalculate the input line to hold only input fields
    print;   # print recalculated input line
}'

Or with one liner:

echo "bench-100-net-buffering1000.out:Throughput: 3212.97" |
awk 'BEGIN{FPAT="[0-9]*\\.?[0-9]+"}{$1=$1}1'
