bash script grep using variable fails to find result that actually does exist

Question

I have a bash script that iterates over a list of links. For each link it downloads the HTML page with curl, greps for an ID of the form CVE-####-####, and strips the surrounding HTML tags (the format is consistent, so no special-case handling is needed). It then searches a changelog file for the resulting ID and acts on whether the ID was found.

The ID is stored in a variable. The problem is that grepping for the variable returns no results, even though I know for certain that some of the IDs are in the changelog. Here is the relevant portion of the script:

for link in $(cat links.txt); do
    curl -s "$link" | grep 'CVE-' | sed 's/<[^>]*>//g' | while read cve; do
        echo "$cve"
        grep "$cve" ./changelog.txt
    done
done

If I hardcode a known ID in the grep command, the script finds the ID and returns things as expected. I've tried many variations of grepping on this variable (e.g. exporting it and doing command expansion, cat'ing the changelog and piping to grep, setting variable directly via command expansion of the curl chain, single and double quotes surrounding variables, half a dozen other things).

Am I missing something nuanced with the outputted variable from the curl | grep | sed chain? When it is echo'd to stdout or >> to a file, things look fine (a single ID with no odd characters or carriage returns etc.).

Any hints or alternate solutions would be much appreciated. Thanks!

FYI:

OSX:$bash --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin14)

Edit:

The html file that I was curl'ing was chock full of carriage returns. Running the script with set -x was helpful because it revealed the true string being grepped: $'CVE-2011-2716\r'.

+ read -r link
+ curl -s http://localhost:8080/link1.html
+ sed -n '/CVE-/s/<[^>]*>//gp'
+ read -r cve
+ grep -q -F $'CVE-2011-2716\r' ./kernelChangelog.txt

Investigating from another angle confirmed it: opening the curled file in vim showed ^M, and printf %s "$cve" | xxd showed the carriage return byte 0d at the end of the variable. Relying on echo output was the wrong way to diagnose this. To reproduce the failure with the original snippet above, write a simple HTML page containing a valid CVE-####-#### and add a carriage return after it (in vim insert mode, type ctrl-v ctrl-m).
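A minimal sketch of that diagnosis, assuming a temporary file stands in for changelog.txt and its single line is made-up sample data. It uses od -c instead of xxd only because od is part of POSIX; the idea is the same: dump the bytes rather than trust what echo shows.

```shell
# Build a value with a trailing carriage return, like the one the curl | sed chain produced.
cr=$(printf '\r')
cve="CVE-2011-2716${cr}"

# Hypothetical changelog: one line containing the ID without a carriage return.
changelog=$(mktemp)
printf 'fixed CVE-2011-2716\n' > "$changelog"

# echo looks clean on a terminal, because \r only moves the cursor.
echo "$cve"

# od -c prints every byte, so the trailing \r becomes visible.
dump=$(printf '%s' "$cve" | od -c)
echo "$dump"

# grep fails with the raw value and succeeds once the \r is removed.
if grep -q -F "$cve" "$changelog"; then raw_result=found; else raw_result=missing; fi
clean=$(printf '%s' "$cve" | tr -d '\r')
if grep -q -F "$clean" "$changelog"; then clean_result=found; else clean_result=missing; fi
echo "raw: $raw_result, cleaned: $clean_result"

rm -f "$changelog"
```

The last line prints "raw: missing, cleaned: found", which matches the behavior described in the question.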

This is pretty standard string sanitization that I should have figured out. The solution is to remove the carriage returns; piping through tr -d '\r' is one way to do that. I'm not sure there is a specific duplicate on SO for this series of steps, but in any case here is my now-working script:

while read -r link; do
  curl -s "$link" | sed -n '/CVE-/s/<[^>]*>//gp' | tr -d '\r' | while read -r cve; do
    if grep -q -F "$cve" ./changelog.txt; then
      echo "FOUND: $cve";
    else
      echo "NOT FOUND: $cve";
    fi;
  done
done < links.txt

Don't trust echo, especially with an unquoted argument. printf '[%s]\n' "$cve" is better, as is printf %s "$cve" | xxd.
– Etan Reisner, Commented Jun 1, 2015 at 20:19
When troubleshooting, I'd break this down: start by running curl on a single link, pipe it to grep, and check stdout to figure out what the real issue is.
– cchamberlain, Commented Jun 1, 2015 at 20:28
You may want to also post sample data that can replicate the problem.
– Ed King, Commented Jun 1, 2015 at 20:34
General script troubleshooting advice: Put set -x at the beginning of the script, so it shows each command as it's executing, with the variables expanded.
– Barmar, Commented Jun 1, 2015 at 20:43
You should almost always quote your variables, in case they contain whitespace or wildcard characters.
– Barmar, Commented Jun 1, 2015 at 20:44

Barmar · Accepted Answer · 2015-06-01 23:41:51Z

2

HTML files can contain carriage returns at the ends of lines; you need to filter those out.

curl -s "$link" | sed -n '/CVE-/s/<[^>]*>//gp' | tr -d '\r' | while read cve; do

Notice that there's no need for grep; the sed command can filter lines with a regular expression itself. (sed's s command could also remove the \r, but that is cumbersome, since not every sed understands the \r escape, so I piped to tr instead.)
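For comparison, here is a rough sketch of two ways to drop the carriage return without the extra tr process. The sample function and its HTML lines are hypothetical stand-ins for the curl output; neither variant comes from the answer itself.

```shell
# A literal carriage return, produced portably instead of relying on a \r escape in sed.
cr=$(printf '\r')

# Hypothetical stand-in for "curl -s $link": two lines with CRLF endings.
sample() { printf '<li>CVE-2011-2716</li>\r\n<p>no match here</p>\r\n'; }

# Option 1: strip the trailing \r in the same sed call by embedding the literal character.
via_sed=$(sample | sed -n "/CVE-/{s/<[^>]*>//g;s/${cr}\$//;p;}")

# Option 2: keep the original sed command and strip the \r in the loop with parameter expansion.
via_expansion=$(sample | sed -n '/CVE-/s/<[^>]*>//gp' | while read -r cve; do
    printf '%s\n' "${cve%"$cr"}"
done)

# Brackets make any leftover invisible character obvious.
printf '[%s]\n' "$via_sed" "$via_expansion"
```

Both lines print [CVE-2011-2716]. The quoting in option 1 shows why this is more awkward than tr -d '\r': the script has to be double-quoted so the shell can insert the character, and the $ anchor then needs escaping.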


hek2mgl · Answer · 2015-06-01 21:57:08Z

2

It should look like this:

# First: Care about quoting your variables!

# Use read to read the file line by line
while read -r link ; do
    # No grep required. sed can do that.
    curl -s "$link" | sed -n '/CVE-/s/<[^>]*>//gp' | while read -r cve; do
        echo "$cve"
        # grep -F searches for fixed strings instead of patterns
        grep -F "$cve" ./changelog.txt
    done
done < links.txt
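To illustrate the grep -F comment in the script above, here is a small sketch with made-up data (the file content and search string are assumptions, not taken from the question). CVE IDs contain no regex metacharacters, so -F does not change the result for them, but it guards against values that do.

```shell
# Hypothetical log file with one line.
log=$(mktemp)
printf 'patched libfoo 1x2\n' > "$log"

# As a regular expression, the dot matches any character, so "1.2" matches "1x2".
if grep -q "1.2" "$log"; then regex_result=match; else regex_result=no-match; fi

# With -F the dot is literal, so the same search finds nothing.
if grep -q -F "1.2" "$log"; then fixed_result=match; else fixed_result=no-match; fi

echo "regex: $regex_result, fixed: $fixed_result"
rm -f "$log"
```

This prints "regex: match, fixed: no-match".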

edited Jun 1, 2015 at 21:57

answered Jun 1, 2015 at 21:53


4 Comments

mcanfield Over a year ago

Thanks for cleaning things up, but things still do not work. There has to be something wrong with that $cve variable. I'll dig deeper.

hek2mgl Over a year ago

I would need to see the contents of links.txt and changelog.txt

mcanfield Over a year ago

@Barmar gave me the tip to use set -x in the script. That showed there is a carriage return \r being appended to the $cve variable. I'll give him a chance to post an actual answer that explains why and/or how to resolve. If he doesn't do that, perhaps you can edit this current answer to include that and I'll mark it accepted. In either case, thanks for the cleanup.

hek2mgl Over a year ago

Please concentrate more on this comment: stackoverflow.com/questions/30582516/… Otherwise the question isn't helpful for the community.
