Make all urls in a page absolute from the command line

Question

I like using curl and the command line to process HTML pages.

Relative urls are a pain.

Is there some easy utility to make all relative urls absolute?

Ideally this would look something like:

curlabsolute $URL | process

A relative url when fetching a page from a web server is a url without a hostname or a protocol like /hello.html (see webreference.com/html/tutorial2/3.html). Using urls like this means that the website can be served from multiple domain names, using different protocols, or easily moved between domains. The downside is that the urls are no longer unique identifiers of resources; they can only be interpreted together with the base page that you fetched.
– Att Righ, Commented Oct 17, 2017 at 22:10
"A relative url when fetching a page from a web server is a url without a hostname or a protocol": hence my question, relative to what? The base URL is available only to the program that downloads the file.
– Satō Katsura, Commented Oct 18, 2017 at 9:23
Umm... so it might be useful to be able to provide a base URL and interpret all URLs as relative to this base URL. In the example above I described an imaginary tool called curlabsolute, which both fetches a page and makes its URLs absolute; since it does the fetching, it has access to the base URL.
– Att Righ, Commented Oct 18, 2017 at 13:56

RomanPerekhrest · Accepted Answer · 2017-10-17 21:41:08Z

What you need is the wget utility.

Let's say we need to download a simple web page at http://www.littlewebhut.com/articles/simple_web_page/.

The command (the URL below is real, so the command can be tested as is):

wget -O simple_page -k http://www.littlewebhut.com/articles/simple_web_page/

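-O (--output-document=file): The documents will not be written to the appropriate files, but all will be concatenated together and written to file.
-k (--convert-links): After the download is complete, convert the links in the document to make them suitable for local viewing. Links to files that were not downloaded are rewritten to include the host name and absolute path, so when a single page is fetched, its links to other pages become absolute URLs.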
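To get something close to the `curlabsolute $URL | process` pipeline from the question, the command can be wrapped in a shell function. This is a minimal sketch, assuming bash (for `local`), wget and mktemp; the name `curlabsolute` is only the hypothetical one from the question. wget accepts -k together with -O only when the output is a regular file, so the sketch downloads into a temporary file and then prints it to standard output instead of using `-O -`.

```shell
# Hypothetical helper named after the question's imaginary "curlabsolute":
# fetch one page with wget, let -k rewrite its links, print it to stdout.
# Assumes bash (for "local"), wget and mktemp.
curlabsolute() {
    local tmp status
    tmp=$(mktemp) || return 1
    # wget only allows -k with -O when -O names a regular file,
    # so download into a temporary file instead of "-O -".
    # -q keeps wget's progress messages quiet; they go to stderr anyway.
    if wget -q -k -O "$tmp" "$1"; then
        cat "$tmp"
        status=0
    else
        status=1
    fi
    # Clean up the temporary file whether or not the download worked.
    rm -f "$tmp"
    return "$status"
}
```

Usage would then be `curlabsolute http://www.littlewebhut.com/articles/simple_web_page/ | process`, where `process` stands for whatever command you pipe the page into.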

Here is an HTML fragment from that web page before downloading (the online version):

...
<ul>
          <li><a href="/" class="color-menu">Home</a></li>
          <li><a href="/html/" class="color-menu">HTML</a></li>
          <li><a href="/css/" class="color-menu">CSS</a></li>
          <li><a href="/javascript/" class="color-menu">JavaScript/jQuery</a></li>
          <li><a href="/inkscape/" class="color-menu">Inkscape</a></li>
          <li><a href="/gimp/" class="color-menu">GIMP</a></li>
          <li><a href="/blender/" class="color-menu">Blender</a></li>
          <li><a href="/articles/" class="color-menu">Articles</a></li>
          <li><a href="/contact/" class="color-menu">Contact</a></li>
        </ul>

The same fragment after downloading, saved in the file simple_page:

...
<ul>
          <li><a href="http://www.littlewebhut.com/" class="color-menu">Home</a></li>
          <li><a href="http://www.littlewebhut.com/html/" class="color-menu">HTML</a></li>
          <li><a href="http://www.littlewebhut.com/css/" class="color-menu">CSS</a></li>
          <li><a href="http://www.littlewebhut.com/javascript/" class="color-menu">JavaScript/jQuery</a></li>
          <li><a href="http://www.littlewebhut.com/inkscape/" class="color-menu">Inkscape</a></li>
          <li><a href="http://www.littlewebhut.com/gimp/" class="color-menu">GIMP</a></li>
          <li><a href="http://www.littlewebhut.com/blender/" class="color-menu">Blender</a></li>
          <li><a href="http://www.littlewebhut.com/articles/" class="color-menu">Articles</a></li>
          <li><a href="http://www.littlewebhut.com/contact/" class="color-menu">Contact</a></li>
        </ul>
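To check a whole downloaded page rather than one fragment, a rough sketch like the following lists the href values in a file that still lack a URL scheme (such as http:). The function name `list_relative_hrefs` is hypothetical, and because it is regex-based it only sees double-quoted href="..." attributes (not src, srcset or single-quoted values), so treat it as a sanity check rather than an HTML parser.

```shell
# Hypothetical helper: print the href values in an HTML file that are
# still relative, i.e. do not start with a URL scheme such as "http:".
# Regex-based sketch: it only matches double-quoted href="..." attributes.
list_relative_hrefs() {
    grep -o 'href="[^"]*"' "$1" |
        sed 's/^href="//; s/"$//' |
        grep -Ev '^[A-Za-z][A-Za-z0-9+.-]*:'
}
```

Running `list_relative_hrefs simple_page` prints one remaining relative link per line; for the converted fragment shown above it would print nothing, since every link there starts with http:.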
