
Curl url txt file, but grep each url separately from single file

I have a text file with lots of URLs in it. I'm using

curl -K "$urls" > $output

to send the output to my output file. Now, in the output of each separate URL there is a term, let's say "mortgage", below which I do not want any more of the info. I know that I can use

sed '/mortgage/q'

to remove all the info below the term "mortgage", but if I use it within my script like so

curl -K "$urls" | sed '/mortgage/q' > $output

it removes everything below the first instance of "mortgage" in the output of the first URL in $urls. This wipes all of the info from the other URLs (including the content before their own instance of the word "mortgage"), because sed is working on the entire combined output, not on each URL's output separately.
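This combined-stream behaviour can be reproduced without curl at all. In the sketch below, the two "pages" and their file names are made up purely for the demonstration:

```shell
#!/bin/sh
# Two fake "pages", each with a mortgage line followed by unwanted junk.
printf 'page1-keep\nmortgage\npage1-junk\n' > /tmp/page1.txt
printf 'page2-keep\nmortgage\npage2-junk\n' > /tmp/page2.txt

# One sed over the combined stream: it quits at the FIRST mortgage line,
# so everything from page2 (including page2-keep) is lost.
cat /tmp/page1.txt /tmp/page2.txt | sed '/mortgage/q'

# One sed per page: each page is truncated independently,
# so page2-keep survives.
for f in /tmp/page1.txt /tmp/page2.txt; do
    sed '/mortgage/q' "$f"
done
```

The first pipeline prints only the two lines of page1 up to "mortgage"; the loop prints the kept portion of both pages.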

How can I make sed '/mortgage/q' act separately on the output of each URL in the url file, so that it does not affect the output globally? Any help appreciated.

My url file is pretty simple, in this format (this is just an example):

URL = http://www.bbc.co.uk/sport/rugby-union/34914911

URL = http://stackoverflow.com/questions/9084453/simple-script-to-check-if-a-webpage-has-been-updated

and so on.....

EDIT: I've thought of a hypothetical way of achieving this, but I'm not sure of the code. Is there any way I can adapt the curl -K "$urls" | sed '/mortgage/q' > $output command so that it loops over the URLs in the file? That is, curl would retrieve just the first URL in the file, the sed command would run on that URL's material and append it to $output, then the loop would move on to the second URL, run sed, append to $output, and so on. This would mean the required material from each URL was included in the output file, but the stuff below 'mortgage' in each URL was not. I just don't know how to achieve this with code. Any ideas?
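That loop can be sketched in plain shell. This assumes the file named in $urls holds lines of the form URL = http://... as in the example above; the grep -o extraction and the concrete file names are my assumptions, not part of the question:

```shell
#!/bin/sh
# Hypothetical per-URL loop: fetch each page on its own so that sed
# sees one page at a time, then append the truncated page to $output.
urls=urls.txt        # assumed name of the "URL = ..." config file
output=output.txt    # assumed name of the combined output file

: > "$output"        # start with an empty output file

# Pull just the address out of each "URL = ..." line,
# then fetch and truncate the pages one at a time.
grep -o 'http[^[:space:]]*' "$urls" |
while IFS= read -r url; do
    curl -s "$url" | sed '/mortgage/q' >> "$output"
done
```

Note that sed '/mortgage/q' keeps the matching line itself; if the "mortgage" line should also be dropped, sed -n '/mortgage/q;p' prints only the lines before it.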



neilH

