Return the first number found greater than the provided input number (13 digits)

Question

This script normalizes numbers to be at least 13 characters long (UNIX_MS strings) so they can be compared as timestamps. My problem is that it is very slow. I wanted an alternative to grepping for one specific UNIX_MS timestamp, not finding it, and then having to grep several more times.
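As a minimal sketch of the padding step on its own: a value shorter than 13 digits is multiplied by 10^(13 - length), so a 10-digit seconds value compares correctly against millisecond values. The helper name `pad_to_ms` is hypothetical and used only for this illustration; it is not part of the script.

```bash
#!/bin/bash
# pad_to_ms is a hypothetical helper name, used only for this illustration.
# It right-pads a numeric timestamp with zeros until it is 13 digits long.
pad_to_ms() {
    local ts=$1
    local missing=$(( 13 - ${#ts} ))      # how many digits are missing
    if (( missing > 0 )); then
        ts=$(( ts * 10 ** missing ))      # ** binds tighter than *
    fi
    printf '%s\n' "$ts"
}

pad_to_ms 1449252000       # 10-digit seconds value, prints 1449252000000
pad_to_ms 1449252000123    # already 13 digits, prints 1449252000123
```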

For the output I wanted the line number in the file (for slicing) as well as the original line (to confirm/inspect).

I'm specifically looking for optimizations, as I'd like this to be as close as possible to the speed of grepping for a single timestamp.

Usage: ./script.sh file UNIX_MS

#! /bin/bash
# return the first number found that's greater than the provided input number
res=
linenum=
count=0
returnline=             # hold onto the line for return
tocheck=$2
tocheck=$(($tocheck*(10**(( ${#tocheck} - 13) * -1))))
inseconds=$(($tocheck/1000))
date=$(date -r $inseconds)
echo "Looking for first timestamp -ge to $date.."
while read line;
  do
    count=$(($count + 1))
    timearray=$(grep -o -E "^(.*?)([0-9]{10,13})?" <<< $line)
    if [ -z "$timearray" ]; then
        echo "PROBLEM"
        echo "grep -o -E '^(.*?)([0-9]{10,13})?' <<< $line"
        exit 1
    fi
    timestamp=$(sed -Ee "s/[0-9]+\://" <<< $timearray)
# normalize timestamp to be 13 digits
    if [ "${#timestamp}" -lt "13" ]; then
        mult=$((10**(( ${#timestamp} - 13) * -1)))
        timestamp=$(($timestamp * $mult))
    fi
    echo "$timestamp >= $tocheck?"
    linenum=$([ "$timestamp" -ge "$tocheck" ] && echo $count)
    if [ -z "$linenum" ]; then
        :;
    else
        returnline=$line
        break;
    fi
done < $1
echo "$linenum:$returnline"
echo ""

Even if you don't show us the whole file for benchmarking, at least show a couple of representative sample lines of the log file.
– 200_success, Commented Dec 4, 2015 at 18:21
Could you explain what the timestamp format is, such that you would want to zero-pad the numbers on the right side?
– 200_success, Commented Dec 4, 2015 at 18:43
That still doesn't explain why you would want to zero-pad numbers on the right rather than on the left.
– 200_success, Commented Dec 4, 2015 at 19:41

πάντα ῥεῖ · Accepted Answer · 2015-12-04 18:38:33Z

4

I'm having issues with it being very slow.

What makes your script slow is that you read the file yourself in the while loop and run grep (and sed) on every single input line, which starts new processes for each line, instead of passing the file to grep and letting it do its job.

Whatever you want to search for with grep, pass it the input in a single call first and inspect the results afterwards.

For the output I wanted the line number in the file (for slicing) as well as the original line (to confirm/inspect).

grep already has this feature built in (at least according to its documentation):

-n, --line-number
Prefix each line of output with the line number within its input file.

You can simply get what you want with this option.

Thus you can get rid of the while loop and the count variable that you use to determine the line number yourself.
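A minimal sketch of this single-pass idea, under the assumption that the only run of 10 to 13 digits on a line is its timestamp (the real log format is not shown in the question). The function name `first_at_or_after` is hypothetical, and the numeric comparison is done with awk, which is my own addition rather than something grep provides.

```bash
#!/bin/bash
# first_at_or_after FILE UNIX_MS   (hypothetical helper name)
# Assumption: the only run of 10-13 digits on a line is its timestamp.
first_at_or_after() {
    local file=$1 target=$2
    # One grep call scans the file: -n prefixes the line number,
    # -o prints only the matched digits, giving lines like "42:1449252005".
    grep -n -o -E '[0-9]{10,13}' "$file" |
        awk -F: -v target="$target" '
            {
                ts = $2
                while (length(ts) < 13) ts = ts "0"   # right-pad to 13 digits
                # Print "linenum:timestamp" for the first hit, then stop reading.
                if (ts + 0 >= target + 0) { print; exit }
            }'
}
```

Called as `first_at_or_after file UNIX_MS`, it prints the line number and the matched timestamp. Because awk exits at the first hit, grep will normally stop soon afterwards when its next write to the closed pipe fails, so the whole file does not have to be read when the match is in the middle.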

edited Dec 4, 2015 at 18:38

answered Dec 4, 2015 at 17:13

How would you accomplish the numerical timestamp comparison, though?
– 200_success, Commented Dec 4, 2015 at 18:12
@200_success By further inspecting the results already produced by a single run of grep? Maybe I misunderstood the question.
– πάντα ῥεῖ, Commented Dec 4, 2015 at 18:14
For speed I was attempting to output the first result without having to potentially read in the entire file.
– buddyp450, Commented Dec 4, 2015 at 18:18
@octanepenguin grep will usually go through entire files very fast. You can expect that to be faster than calling it multiple times and reading line by line yourself in the script. I hope I made my idea clear enough.
– πάντα ῥεῖ, Commented Dec 4, 2015 at 18:23
@πάνταῥεῖ In my case though the files can be upwards of 20+ GB and a result can be in the middle. I get that I'm reading it line by line, but how is that any different from grep? Surely it has to seek through the file as well?
– buddyp450, Commented Dec 4, 2015 at 18:32

vnp · Accepted Answer · 2015-12-04 18:08:19Z

3

Bash has built-in regular expression support:

    # Bash uses POSIX ERE, which has no lazy quantifier (.*?), so match the digit run directly.
    if [[ $line =~ [0-9]{10,13} ]]; then
        timestamp=${BASH_REMATCH[0]}
    else
        echo "PROBLEM!"
    fi

which completely removes the need for grep and sed.
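Put together, the loop might look like the sketch below. It assumes that the first run of 10 to 13 digits on a line is the timestamp, since the log format is not shown in the question, and the function name `find_first` is hypothetical. No external process is started per line, although a pure Bash read loop is still likely to be much slower than grep on very large files.

```bash
#!/bin/bash
# find_first FILE UNIX_MS   (hypothetical helper name)
# Assumption: the first run of 10-13 digits on a line is its timestamp.
find_first() {
    local file=$1 target=$2 line ts count=0
    local re='[0-9]{10,13}'      # POSIX ERE, kept in a variable to avoid quoting pitfalls
    while IFS= read -r line; do
        count=$(( count + 1 ))
        [[ $line =~ $re ]] || continue       # skip lines without a timestamp
        ts=${BASH_REMATCH[0]}
        # Right-pad to 13 digits so second-resolution stamps compare as milliseconds.
        while (( ${#ts} < 13 )); do ts+=0; done
        # 10# forces base 10 in case a value has a leading zero.
        if (( 10#$ts >= 10#$target )); then
            printf '%s:%s\n' "$count" "$line"
            return 0
        fi
    done < "$file"
    return 1                                  # no line at or after the target
}
```

It prints `linenum:original line` for the first match and returns a non-zero status when no line qualifies, so a caller can distinguish "not found" from an empty line.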

answered Dec 4, 2015 at 18:08
