How to find/fetch numbers in a file using a shell script

Question

I am new to Linux shell scripting. As far as I know, finding the numbers in a file can be done with grep:

egrep -o "[0-9][0-9]*" my_file

But how do I get the first digit of each of these numbers and compute statistics on them? For example, for 1234, 123 and 1267 I would get the digit 1 three times.

I know that

A=$(tr -cd 1 < page.html|wc -c)

counts the occurrences of the character "1" in a file, but that's not what I want. I want to count the numbers whose first digit is 1, and that's why it's so hard for me.
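To see why the tr approach doesn't answer the question, here is a minimal sketch on a made-up sample string (the string is hypothetical, not from the original post). tr -cd 1 deletes every character except "1", so wc -c ends up counting every "1" anywhere in the input, including ones in the middle of a number.

```shell
#!/bin/sh
# Hypothetical sample: four numbers, three of which start with 1.
sample='1234, 123, 1267, 211'

# tr -cd 1 keeps only the character "1"; wc -c counts what is left.
# $(( )) strips the padding some wc implementations add.
all_ones=$(( $(printf '%s\n' "$sample" | tr -cd 1 | wc -c) ))

# 211 contributes two "1"s even though it starts with 2.
echo "every '1' character: $all_ones"
```

Only three of the four numbers begin with 1, which is the figure the question is actually after.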

Please help. Thanks a lot.

Provide sample data of the file and your expected output.

– anubhava, Commented Jun 7, 2013 at 4:14

Barmar · Accepted Answer · 2013-06-07 04:24:21Z


A=$(egrep -o '[0-9]+' my_file | egrep -c '^1')

The first egrep finds all the numbers and outputs each one on its own line. The second egrep uses the -c option to output the number of matching lines, and the regexp matches lines that begin with 1, so the result is the count of numbers whose first digit is 1.
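A minimal sketch of this pipeline on a small hypothetical file, so the result can be checked by hand. It uses grep -E, which is equivalent to egrep (newer GNU grep releases warn that egrep is obsolescent); the file contents are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample file standing in for my_file / page.html.
sample_file=$(mktemp)
printf '%s\n' 'ids: 1234, 123 and 1267' 'price 211, code 91' > "$sample_file"

# grep -Eo prints each run of digits on its own line;
# grep -c '^1' then counts the lines that start with 1.
A=$(grep -Eo '[0-9]+' "$sample_file" | grep -c '^1')
echo "numbers starting with 1: $A"

rm -f "$sample_file"
```

Of the five numbers (1234, 123, 1267, 211, 91), only the first three start with 1, so A is 3.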


3 Comments

Ashton Over a year ago

Oh my god, this really helps! I used A=$(egrep -o "[0-9][0-9]*" page.html | egrep -c '^1'). But what if I want to grep all the numbers, including floating-point numbers?

Barmar Over a year ago

See stackoverflow.com/questions/2139715/… for matching floating point numbers with regexp

Barmar Over a year ago

I simply typed regex floating point into the SO search bar to find that.
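As a rough sketch of the floating-point case (a simple pattern, not the one from the linked question), an optional fractional part can be added to the number regexp so that 12.5 is matched as one number rather than two. It assumes plain decimals only: signs, exponents such as 1.0e3, and forms like .5 are not handled. The sample string is hypothetical.

```shell
#!/bin/sh
# Hypothetical input mixing integers and decimals.
sample='temp 12.5, 130, 0.75 and 7'

# [0-9]+ is the integer part; (\.[0-9]+)? optionally adds a
# fractional part, so 12.5 is matched as a single number.
numbers=$(printf '%s\n' "$sample" | grep -Eo '[0-9]+(\.[0-9]+)?')
printf '%s\n' "$numbers"

# Count numbers whose first digit is 1, as in the accepted answer.
A=$(printf '%s\n' "$numbers" | grep -c '^1')
echo "numbers starting with 1: $A"
```

Note that 0.75 starts with 0, so it is not counted; only 12.5 and 130 are.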

Samveen · Accepted Answer · 2013-06-07 06:07:34Z

From the question, it looks like the file contains all sorts of characters and you want to isolate the first digit of every number in the file. It also looks like a number need not be at the very start of a line (it may be preceded by other text). Keeping these two assumptions in mind, you can do the following:

grep '[0-9]' test.html| sed 's/\([0-9]\+\)/\n\1\n/g' |grep '^[0-9]' |cut -c1 |sort |uniq -c

An example:

curl -N -s 'http://stackoverflow.com/users/1353267/samveen' |grep '[0-9]' |sed 's/\([0-9]\+\)/\n\1\n/g' |cut -c1 |grep '^[0-9]' |sort |uniq -c
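The curl example needs network access and depends on whatever the page contains today. As an offline sketch, here is the first pipeline run on a small hypothetical file instead of test.html. It assumes GNU sed, since \+ in a basic regexp and \n in the replacement are GNU extensions.

```shell
#!/bin/sh
# Hypothetical local stand-in for test.html (no network needed).
sample_file=$(mktemp)
printf '%s\n' '<td>1234</td><td>123</td>' 'total=1267; page 2 of 25' > "$sample_file"

# 1. keep lines containing a digit
# 2. put every run of digits on its own line (GNU sed)
# 3. keep only the lines that start with a digit
# 4. cut -c1 keeps the first character, i.e. the leading digit
# 5. sort | uniq -c prints "count digit" for each leading digit
stats=$(grep '[0-9]' "$sample_file" |
  sed 's/\([0-9]\+\)/\n\1\n/g' |
  grep '^[0-9]' |
  cut -c1 | sort | uniq -c)
printf '%s\n' "$stats"

rm -f "$sample_file"
```

The numbers here are 1234, 123, 1267, 2 and 25, so uniq -c reports 3 for the digit 1 and 2 for the digit 2.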

IMPORTANT: In the above example page, there is a line {"fkey":"8f1a9c6e21503516793b853265ec4939","isRegistered":true,"userId":1353267,"accountId":1430801,"gravatar":"<div class=\"\">, which will be divided up as follows:

{"fkey":"
8
f
1
a
9
c
6
e
21503516793
b
853265
ec
4939
","isRegistered":true,"userId":
1353267
,"accountId":
1430801
,"gravatar":"<div class=\"\">

If you don't want this behaviour, change the sed pattern to
sed 's/\b\([0-9]\+\)\b/\n\1\n/g', which searches only for standalone numbers (\b matches a word boundary). The output of the sed command is now:

{"fkey":"8f1a9c6e21503516793b853265ec4939","isRegistered":true,"userId":
1353267
,"accountId":
1430801
,"gravatar":"<div class=\"\">
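A quick way to check the effect of \b is to count how many digit runs each sed pattern splits out of the JSON line quoted above. This sketch assumes GNU sed, which supports \b, \+ and \n in the replacement.

```shell
#!/bin/sh
# The JSON fragment quoted in the answer above.
line='{"fkey":"8f1a9c6e21503516793b853265ec4939","isRegistered":true,"userId":1353267,"accountId":1430801,"gravatar":"<div class=\"\">'

# Every run of digits gets its own line...
any=$(printf '%s\n' "$line" | sed 's/\([0-9]\+\)/\n\1\n/g' | grep -c '^[0-9]')
# ...versus only runs bounded by non-word characters on both sides.
standalone=$(printf '%s\n' "$line" | sed 's/\b\([0-9]\+\)\b/\n\1\n/g' | grep -c '^[0-9]')

echo "digit runs: $any, standalone numbers: $standalone"
```

The first pattern picks up seven runs inside the fkey hash plus the two IDs (9 in total); with \b only userId and accountId remain (2).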

Also, if the sed substitution is chosen carefully, the cut command is not needed. If the \([0-9]\+\) portion of the pattern is changed to \([0-9]\)[0-9]*, the capture group holds only the first digit, so sed outputs just the leading digit of each number rather than the whole number. Using
sed 's/\b\([0-9]\)[0-9]*\b/\n\1\n/g', we get:

{"fkey":"8f1a9c6e21503516793b853265ec4939","isRegistered":true,"userId":
1
,"accountId":
1
,"gravatar":"<div class=\"\">

Thus, no need for cut.
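Put together, the cut-free version of the counting pipeline might look like the sketch below. It runs on a hypothetical sample string and assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical sample text with standalone numbers.
sample='ids 1234, 123 and 1267; page 2 of 25'

# \([0-9]\) captures only the first digit and [0-9]* swallows the rest,
# so \1 is already the leading digit and cut -c1 is unnecessary.
stats=$(printf '%s\n' "$sample" |
  sed 's/\b\([0-9]\)[0-9]*\b/\n\1\n/g' |
  grep '^[0-9]' | sort | uniq -c)
printf '%s\n' "$stats"
```

For this sample the leading digits are 1, 1, 1, 2, 2, so uniq -c reports 3 for 1 and 2 for 2.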

Given further information about the input file, the command can be optimized even further.

Wow, an even better answer! But I need to output only the counts, and I don't know how to do that.
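If "only the counts" means dropping the digit column that uniq -c prints, one option is to pipe through awk and keep the first field. If it means a single number for one digit, grep -c does that, as in the first answer. A sketch of both readings, on a hypothetical sample and assuming GNU sed:

```shell
#!/bin/sh
# Hypothetical sample; replace the printf with your own file.
sample='ids 1234, 123 and 1267; page 2 of 25'

# First-digit pipeline, then awk keeps only field 1 (the count),
# dropping the digit that uniq -c prints after it.
counts=$(printf '%s\n' "$sample" |
  sed 's/\b\([0-9]\)[0-9]*\b/\n\1\n/g' |
  grep '^[0-9]' | sort | uniq -c | awk '{print $1}')
printf '%s\n' "$counts"

# Count for one specific leading digit (here 1) as a single number.
ones=$(printf '%s\n' "$sample" |
  sed 's/\b\([0-9]\)[0-9]*\b/\n\1\n/g' |
  grep -c '^1')
echo "$ones"
```

Since sort orders the digits, the counts come out in order of the leading digit (1 first, then 2, and so on), and digits that never occur are simply absent.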
