
The code below works fine on Ubuntu 20.04. It reads a .csv file that contains URLs in column A, one URL per row.

To use it you need to run the script by typing:

bash script.sh file_with_urls.csv response_code

for example: bash script.sh urls-to-check.csv 200
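For reference, the input file is expected to look something like this (the URLs and file name here are made up for illustration), one URL per row in column A:

```shell
# Create a sample input file: one URL per line, nothing else
cat > urls-to-check.csv <<'EOF'
https://example.com/
https://example.org/page
EOF
```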

#!/usr/bin/env bash
while read -r link; do
    response=$(curl --silent --output /dev/null --write-out '%{http_code}' "$link")
    if [[ "$response" == "$2" ]]; then
        echo "$link"
    fi
done < "$1"

If I run it on Windows 10 under the WSL Ubuntu 20.04 distribution, I get a "curl: (3) URL using bad/illegal format or missing URL" error.

I'm a bit stuck with this...

  • You need to figure out which URL from the file is failing. Either echo each one before invoking curl, or print them to a file after a successful call. Once you have the culprit URL, you can see what's wrong with it (whether it's missing something or illegal in some way). Without that additional information, we can only guess. Commented Nov 14, 2021 at 21:42
  • read -r link reads the entire line (not just the first field) into link. See BashFAQ #1: "How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?" The CSV file might also have DOS/Windows line endings, which adds another pile of potential confusion. Adding set -x as the second line of the script (just after the shebang) will print an execution trace that'll help show problems like this. Commented Nov 14, 2021 at 21:45
  • I don't understand everything in the debug mode, but the second line shows \r at the end of the URL address. I think this is the cause... + read link ++ curl --output /dev/null --silent --write-out '%{http_code}' {full_url_here}/\r' + response=000 [[ 000 == \4\0\4 ]] When I apply sed like dan shows, the script works normally. I appreciate that you pointed to the sources, so I can understand exactly what happens and why. Commented Nov 15, 2021 at 17:49
  • blueface, thank you for the answer. I understand what you said, but I don't know how to achieve it; my skills are too low to pull it off... Commented Nov 15, 2021 at 18:33
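As the comments above suggest, the quickest way to confirm stray carriage returns is to make them visible. A minimal sketch (the file name and URL are made up); GNU `cat -A` prints a CRLF line ending as `^M$`:

```shell
# Simulate a file saved with Windows (CRLF) line endings
printf 'https://example.com/page\r\n' > urls-with-crlf.csv

# cat -A marks a carriage return as ^M and end-of-line as $,
# so a CRLF-terminated line shows up ending in "^M$"
cat -A urls-with-crlf.csv
```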

1 Answer


It's probably line endings:

#!/usr/bin/env bash

while IFS=, read -ra link; do
    response=$(curl --silent --output /dev/null --write-out '%{http_code}' "${link[0]}")
    if [[ "$response" == "$2" ]]; then
       echo "${link[0]}"
    fi
done < <(sed 's/\r$//' "$1")

You can also run dos2unix urls_to_check.csv to convert the file permanently. Note that if you open it in Windows again, the CRLF endings may come back.
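If dos2unix isn't installed, the same sed expression from the script can fix the file in place. A small sketch (the file name and URLs are hypothetical):

```shell
# Build a sample file with CRLF endings, then strip the carriage returns in place
printf 'https://a.example/x\r\nhttps://b.example/y\r\n' > urls.csv
sed -i 's/\r$//' urls.csv   # same effect as dos2unix for this file
```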

Alternatively, invoke it like this:

bash script.sh <(sed 's/\r$//' file_with_urls.csv) response_code

2 Comments

Your answer is a perfect complement to what Gordon Davisson wrote, and both are excellent study material! The script works when I use sed at the end. Why do you use "${link[0]}" instead of just "$link"?
@luknij you said it was a csv file (comma-separated values). If there were column-A,column-B,column-C, IFS=, read -ra splits each line into a bash array, so ${link[0]} is column-A, ${link[1]} is column-B, ${link[2]} is column-C, etc. If there's only one column, you can just use read -r link. In fact, instead of sed, you could also use curl ... ${link%$'\r'} to remove the carriage returns. You can also remove CRs permanently with dos2unix file.csv. But if you open the file in Windows, they will probably come back.
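The ${link%$'\r'} trick mentioned above is plain bash parameter expansion: `%` removes the shortest matching suffix, so it strips one trailing carriage return if present and is a no-op otherwise. A small sketch using a hypothetical variable `line`:

```shell
line=$'https://example.com/page\r'   # what read gives you from a CRLF file
clean=${line%$'\r'}                  # strip one trailing CR, if present
printf '%s\n' "$clean"               # → https://example.com/page
```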
