
I am new to Linux shell scripting. As far as I know, finding the numbers in a file can be done with grep:

egrep -o "[0-9][0-9]*" my_file

But how do I take the first digit of each of these numbers and collect statistics on it? For example, given 1234, 123 and 1267, I should get the digit 1 three times.

I know that

A=$(tr -cd 1 < page.html|wc -c)

counts how many times the digit "1" appears in a file, but that's not what I want. I want to count the numbers whose first digit is "1", and that's why it's so hard for me.

Please help. Thanks a lot.

    Provide sample data of the file and your expected output. Commented Jun 7, 2013 at 4:14

2 Answers

A=$(egrep -o '[0-9]+' my_file | egrep -c '^1')

The first egrep finds all the numbers and prints each one on its own line. The second egrep uses the -c option to output the count of matching lines, and its regexp matches lines that begin with 1.
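A minimal sketch extending the same two-grep idea to tally every leading digit from 1 to 9, not just 1. It uses grep -E, which is what egrep does, and reads from stdin; the function name leading_digit_count and the sample text are made up for illustration.

```shell
#!/bin/sh
# leading_digit_count DIGIT (hypothetical helper): reads text on stdin and
# prints how many numbers (runs of digits) start with DIGIT.
leading_digit_count() {
  # grep -Eo prints each digit run on its own line (grep -E == egrep);
  # grep -c then counts the lines that begin with the requested digit.
  grep -Eo '[0-9]+' | grep -c "^$1"
}

# Example: tally every leading digit 1-9 in a small made-up sample.
sample='ids 1234 and 123, then 1267; also 42 and 987'
for d in 1 2 3 4 5 6 7 8 9; do
  printf '%s %s\n' "$d" "$(printf '%s\n' "$sample" | leading_digit_count "$d")"
done
```

To run it on a file instead, redirect the file into the function, e.g. leading_digit_count 1 < my_file. When no number starts with the digit, grep -c still prints 0.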


3 Comments

Oh my god, this really helps. I use A=$(egrep -o "[0-9][0-9]*" page.html | egrep -c '^1'). But what if I want to grep all the numbers, including floating-point numbers?
See stackoverflow.com/questions/2139715/… for matching floating-point numbers with a regexp.
I simply typed regex floating point into the SO search bar to find that.
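A minimal sketch for the floating-point follow-up, assuming the numbers are plain decimals such as 123 or 12.5 (no sign, no exponent, no leading dot); the linked question covers fuller patterns. The helper name numbers_with_decimals is hypothetical.

```shell
#!/bin/sh
# numbers_with_decimals (hypothetical helper): prints each number on stdin
# on its own line. The optional (\.[0-9]+) group keeps the fractional part
# attached, so 12.5 is one number instead of the two numbers 12 and 5.
numbers_with_decimals() {
  grep -Eo '[0-9]+(\.[0-9]+)?'
}

# Example with made-up input: prints 3 (12.5, 1.75 and 100 start with 1).
printf 'price 12.5, ratio 1.75, total 100, tax 8.25\n' |
  numbers_with_decimals | grep -c '^1'
```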

From the question, it looks like the file contains all sorts of characters and you want to isolate the first digit of every number in the file. It also looks like a number need not be the first word on a line (that is, there may be other text before it). Keeping these two assumptions in mind, you can do the following:

grep '[0-9]' test.html| sed 's/\([0-9]\+\)/\n\1\n/g' |grep '^[0-9]' |cut -c1 |sort |uniq -c

An example:

curl -N -s 'http://stackoverflow.com/users/1353267/samveen' |grep '[0-9]' |sed 's/\([0-9]\+\)/\n\1\n/g' |cut -c1 |grep '^[0-9]' |sort |uniq -c
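The same pipeline can be wrapped in a function that reads stdin, which makes it easy to try on a small sample before pointing it at a real file. This is a sketch: the name first_digit_stats and the sample input are hypothetical, and it assumes GNU sed, since \+ and \n in the replacement are GNU extensions.

```shell
#!/bin/sh
# first_digit_stats (hypothetical helper): the answer's pipeline on stdin.
first_digit_stats() {
  grep '[0-9]' |                      # keep only lines containing a digit
    sed 's/\([0-9]\+\)/\n\1\n/g' |    # put every digit run on its own line
    grep '^[0-9]' |                   # keep only the digit-run lines
    cut -c1 |                         # first character = leading digit
    sort | uniq -c                    # count occurrences of each digit
}

# Made-up sample: 1234, 123 and 1267 start with 1; 42 starts with 4.
printf 'a=1234 b=123\nno digits\nc=1267 d=42\n' | first_digit_stats
```

For a file, use first_digit_stats < test.html. The output has one line per digit, with the count first and the digit second.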

IMPORTANT: In the above example page, there is a line {"fkey":"8f1a9c6e21503516793b853265ec4939","isRegistered":true,"userId":1353267,"accountId":1430801,"gravatar":"<div class=\"\">, which will be divided up as follows:

{"fkey":"
8
f
1
a
9
c
6
e
21503516793
b
853265
ec
4939
","isRegistered":true,"userId":
1353267
,"accountId":
1430801
,"gravatar":"<div class=\"\">

If you don't want this behaviour, change the sed pattern to
sed 's/\b\([0-9]\+\)\b/\n\1\n/g', which only matches standalone numbers (\b matches a word boundary). The output of the sed command is then:

{"fkey":"8f1a9c6e21503516793b853265ec4939","isRegistered":true,"userId":
1353267
,"accountId":
1430801
,"gravatar":"<div class=\"\">
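A small sketch contrasting the two sed patterns on a shortened, made-up version of the line above. It assumes GNU sed (\b, \+ and \n in the replacement are GNU extensions); split_all and split_standalone are hypothetical names.

```shell
#!/bin/sh
# Without \b: every digit run is split out, including those inside "8f1a9c".
split_all()        { sed 's/\([0-9]\+\)/\n\1\n/g'; }
# With \b: only digit runs bounded by non-word characters are split out.
split_standalone() { sed 's/\b\([0-9]\+\)\b/\n\1\n/g'; }

line='{"fkey":"8f1a9c","userId":1353267,"accountId":1430801}'

# Prints 5: the digits 8, 1 and 9 from the key plus the two IDs.
printf '%s\n' "$line" | split_all | grep -c '^[0-9]'

# Prints 2: only 1353267 and 1430801 stand alone.
printf '%s\n' "$line" | split_standalone | grep -c '^[0-9]'
```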

Also, if the sed transformation is chosen smartly, the cut command is not needed. That is, if the
\([0-9]\+\) portion of the pattern is changed to \([0-9]\)[0-9]*, sed outputs only the first digit of each number rather than the whole number, so cut -c1 has nothing left to do. Using
sed 's/\b\([0-9]\)[0-9]*\b/\n\1\n/g', we get:

{"fkey":"8f1a9c6e21503516793b853265ec4939","isRegistered":true,"userId":
1
,"accountId":
1
,"gravatar":"<div class=\"\">

Thus, no need for cut.
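Put together, the cut-free version might look like the sketch below. It assumes GNU sed; leading_digit_tally is a hypothetical name. It filters with grep '^[0-9]$' rather than '^[0-9]', so that an unsplit line that merely starts with a digit (for example a hex string at the start of a line) is not counted as a whole line.

```shell
#!/bin/sh
# leading_digit_tally (hypothetical helper): counts the leading digits of
# standalone numbers on stdin, without cut.
leading_digit_tally() {
  sed 's/\b\([0-9]\)[0-9]*\b/\n\1\n/g' |  # each number -> its first digit
    grep '^[0-9]$' |                      # keep only single-digit lines
    sort | uniq -c                        # count each leading digit
}

# Made-up sample: two numbers start with 1, one starts with 4.
printf 'userId:1353267, accountId:1430801, score:42\n' | leading_digit_tally
```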

Given further information about the input file, the command can be optimized even further.

1 Comment

Wow, an even better answer. But I need to output only the counts, and I don't know how to do it.
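A sketch for the "only the counts" follow-up, assuming that means dropping the digit column that uniq -c prints next to each count: appending awk to the end of either pipeline keeps just the first field. Here first_digits is a hypothetical stand-in for the sed/cut stages, emitting made-up first digits (three 1s and one 4).

```shell
#!/bin/sh
# first_digits (hypothetical): stands in for the answer's sed/cut output.
first_digits() { printf '1\n1\n4\n1\n'; }

# uniq -c prints "COUNT DIGIT"; awk keeps only COUNT (prints 3, then 1).
first_digits | sort | uniq -c | awk '{ print $1 }'

# Or capture the count for a single digit (here 1) in a variable.
A=$(first_digits | sort | uniq -c | awk '$2 == 1 { print $1 }')
echo "$A"
```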
