From the question, it looks like the file contains all sorts of characters and you want to isolate (and count) the first digit of every number in the file. It also looks like a number need not be the first word on a line (that is, there may be other text before it on the same line). Keeping these two assumptions in mind, you can do the following:
grep '[0-9]' test.html| sed 's/\([0-9]\+\)/\n\1\n/g' |grep '^[0-9]' |cut -c1 |sort |uniq -c
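Broken down stage by stage, the same pipeline might look like the sketch below. It assumes GNU sed (`\+` and `\n` in the replacement are GNU extensions, so BSD/macOS sed may not behave the same), and `first_digit_counts` is a hypothetical function name, not part of any tool.

```shell
# Assumes GNU sed. first_digit_counts is a hypothetical wrapper around the pipeline.
# With no arguments it reads stdin; otherwise it reads the given files.
first_digit_counts() {
  grep '[0-9]' "$@" |                  # keep only lines containing a digit
    sed 's/\([0-9]\+\)/\n\1\n/g' |     # put every run of digits on its own line
    grep '^[0-9]' |                    # keep only the digit-run lines
    cut -c1 |                          # keep the first digit of each number
    sort |                             # group identical digits together
    uniq -c                            # count how often each leading digit occurs
}

# Hypothetical sample: the numbers 12, 150, 3 and 1999 have leading digits 1, 1, 3 and 1.
printf 'a 12 b 150\nno digits here\nc3 d1999\n' | first_digit_counts
```

On a real file it would be called as `first_digit_counts test.html`.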
An example:
curl -N -s 'http://stackoverflow.com/users/1353267/samveen' |grep '[0-9]' |sed 's/\([0-9]\+\)/\n\1\n/g' |cut -c1 |grep '^[0-9]' |sort |uniq -c
IMPORTANT: In the example page above, there is a line {"fkey":"8f1a9c6e21503516793b853265ec4939","isRegistered":true,"userId":1353267,"accountId":1430801,"gravatar":"<div class=\"\">, which sed splits up as follows (note that the digits inside the hexadecimal fkey value become separate "numbers"):
{"fkey":"
8
f
1
a
9
c
6
e
21503516793
b
853265
ec
4939
","isRegistered":true,"userId":
1353267
,"accountId":
1430801
,"gravatar":"<div class=\"\">
If you don't want this behaviour, change the sed command to
sed 's/\b\([0-9]\+\)\b/\n\1\n/g'. This matches only standalone numbers (\b matches a word boundary), so the output of the sed command for that line is now:
{"fkey":"8f1a9c6e21503516793b853265ec4939","isRegistered":true,"userId":
1353267
,"accountId":
1430801
,"gravatar":"<div class=\"\">
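To compare the two patterns on a smaller input, here is a sketch using a made-up line `id=8f1a count=42` (hypothetical sample data). It assumes GNU sed, and `split_all_numbers` / `split_whole_numbers` are hypothetical helper names.

```shell
# Assumes GNU sed. Both function names are hypothetical.

# Puts every run of digits on its own line, even inside words such as 8f1a,
# then keeps only the lines that start with a digit.
split_all_numbers() {
  sed 's/\([0-9]\+\)/\n\1\n/g' | grep '^[0-9]'
}

# Same, but \b requires a word boundary on both sides of the digit run,
# so only standalone numbers are split out.
split_whole_numbers() {
  sed 's/\b\([0-9]\+\)\b/\n\1\n/g' | grep '^[0-9]'
}

printf '%s\n' 'id=8f1a count=42' | split_all_numbers    # 8, 1 and 42, one per line
printf '%s\n' 'id=8f1a count=42' | split_whole_numbers  # only 42
```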
Also, if the sed expression is chosen carefully, the cut command is not needed. If the
\([0-9]\+\) portion of the pattern is changed to \([0-9]\)[0-9]*, sed outputs only the first digit of each number instead of the whole number. Using
sed 's/\b\([0-9]\)[0-9]*\b/\n\1\n/g', we get:
{"fkey":"8f1a9c6e21503516793b853265ec4939","isRegistered":true,"userId":
1
,"accountId":
1
,"gravatar":"<div class=\"\">
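Thus, cut is no longer needed. One caveat: since \b stops sed from splitting words such as 8f1a, a line that itself starts with such a word (for example 3rd place) still passes grep '^[0-9]' and, without cut, would be counted as a whole line. Anchoring that grep at both ends, as grep '^[0-9]$', keeps only the single-digit lines.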
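Putting these pieces together, a cut-free version of the whole pipeline might look like the following sketch. It assumes GNU sed, `leading_digit_counts` is a hypothetical name, and the sample input is made up; the final grep is anchored at both ends so that only the single-digit lines produced by sed are counted.

```shell
# Assumes GNU sed (\b, \+ and \n in the replacement are GNU extensions).
# leading_digit_counts is a hypothetical name for the cut-free pipeline.
leading_digit_counts() {
  grep '[0-9]' "$@" |
    sed 's/\b\([0-9]\)[0-9]*\b/\n\1\n/g' |  # replace each whole number by its first digit, on its own line
    grep '^[0-9]$' |                        # keep only single-digit lines; '^[0-9]' alone would also keep "3rd place"
    sort |
    uniq -c
}

# Hypothetical sample: 3rd and 8f1a are not standalone numbers; 42, 7 and 150 are.
printf '3rd place\nscores 42 and 7\nhash 8f1a 150\n' | leading_digit_counts
```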
Given further information about the input file, the command can be optimized even further.