I'm a complete newbie to regular expressions, and I think the problem in my code lies in the regular expression I pass to awk's match function.

#!/bin/bash
...
line=$(sed -n '167p' models.html)
echo "line: $line"
cc=$(awk -v regex="[0-9]" 'BEGIN { match(line, regex); pattern_match=substr(line, RSTART, RLENGTH+1); print pattern_match}')
echo "cc: $cc"

The result is:

line:  <td><center>0.97</center></td>
cc: 

I want to extract the numerical value 0.97 into the variable cc.

2 Answers

  • You need to pass your shell variable $line to awk; otherwise it cannot be used within the script.
  • Alternatively, you can just read the file using awk (no need to involve sed at all).
  • If you want to match the . as well as the digits, you'll have to add that to your regular expression.

Try something like this:

cc=$(awk 'NR == 167 && match($0, /[0-9.]+/) { print substr($0, RSTART, RLENGTH) }' models.html)
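
To try the one-liner without the original models.html, here is a minimal sketch that runs the same idea against a small temporary file. The file contents and the line number 3 are assumptions made for illustration; line 3 stands in for line 167 of the real file.

```shell
#!/bin/bash
# Hypothetical sample file standing in for models.html
tmp=$(mktemp)
printf '%s\n' '<table>' '<tr>' ' <td><center>0.97</center></td>' '</tr>' > "$tmp"

# n=3 plays the role of line 167 in the question; match() sets RSTART and
# RLENGTH, and exit stops awk from reading the rest of the file
cc=$(awk -v n=3 'NR == n && match($0, /[0-9.]+/) { print substr($0, RSTART, RLENGTH); exit }' "$tmp")
echo "cc: $cc"

rm -f "$tmp"
```

Note that [0-9.]+ accepts any run of digits and dots, so it would also match something like 1.2.3 or a lone dot; that is fine for this input but worth keeping in mind.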

Three things:

You need to pass the value of line into awk with -v:

awk -v line="$line" ...

Your regular expression only matches a single digit. To match a float, you want something like:

[0-9]+\.[0-9]+

There is no need to add 1 to the match length for the substring:

substr(line, RSTART, RLENGTH)
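
To see why the extra 1 is unnecessary, this short sketch prints what match() records for the sample line from the question, along with both substr variants. The variable name report is just a placeholder for this example.

```shell
#!/bin/bash
line='<td><center>0.97</center></td>'

# Show where the match starts, how long it is, and what each substr call returns
report=$(awk -v line="$line" 'BEGIN {
    match(line, /[0-9]+[.][0-9]+/)
    print "RSTART=" RSTART " RLENGTH=" RLENGTH
    print "exact: " substr(line, RSTART, RLENGTH)
    print "plus1: " substr(line, RSTART, RLENGTH + 1)
}')
echo "$report"
# RSTART=13 RLENGTH=4
# exact: 0.97
# plus1: 0.97<
```

RLENGTH already covers the whole match, so adding 1 pulls in the character that follows it (here the < of the closing tag).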

Putting it all together:

line='<td><center>0.97</center></td>'
echo "line: $line"
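# [.] is used rather than \. because some awks (gawk, for one) process backslash escapes in -v values and reduce \. to a plain .
cc=$(awk -v line="$line" -v regex="[0-9]+[.][0-9]+" 'BEGIN { match(line, regex); pattern_match=substr(line, RSTART, RLENGTH); print pattern_match}')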
echo "cc: $cc"

Result:

line: <td><center>0.97</center></td>
cc: 0.97
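
If the line might not contain a number at all, it may help to test the return value of match(), which is 0 when nothing matches. The following is a sketch only; extract_float is a hypothetical helper name and the second input line is made up for the example.

```shell
#!/bin/bash
# extract_float is a hypothetical helper, not part of the original answers.
# It prints the first float found in its argument, or exits non-zero if none.
extract_float() {
    awk -v line="$1" 'BEGIN {
        # match() returns 0 when nothing matches, so use it as the condition
        if (match(line, /[0-9]+[.][0-9]+/)) {
            print substr(line, RSTART, RLENGTH)
        } else {
            exit 1
        }
    }'
}

cc=$(extract_float '<td><center>0.97</center></td>')
echo "cc: $cc"

# Made-up input with no number, to exercise the failure branch
if ! missing=$(extract_float '<td><center>n/a</center></td>'); then
    echo "no number found"
fi
```

Checking the exit status lets the calling script tell an empty cell apart from a successful extraction instead of silently ending up with an empty cc.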
