I like using curl and the command line to process html pages.

Relative urls are a pain.

Is there some easy utility to make all relative urls absolute?

Ideally this would look something like

curlabsolute $URL | process
  • URLs relative to what? Commented Oct 17, 2017 at 21:01
  • webreference.com/html/tutorial2/3.html A relative url when fetching a page from a web server is a url without a hostname or a protocol like /hello.html. Using urls like this means that the website can be served from multiple domain names, using different protocols, or easily moved between domains. The downside is that the urls are not a unique identifier of a resource any more, but can only be interpreted together with the base page that you fetched. Commented Oct 17, 2017 at 22:10
  • A relative url when fetching a page from a web server is a url without a hostname or a protocol - Hence my question: relative to what? The base URL is available only to the program that downloads the file. Commented Oct 18, 2017 at 9:23
  • Umm... so it might be useful to be able to provide a base URL and interpret all URLs as relative to this base URL. In the example above I described an imaginary tool called curlabsolute, which both fetches a page and absolutizes its URLs, so it has access to the base URL. Commented Oct 18, 2017 at 13:56

1 Answer

What you need is the wget utility:

Let's say we need to download a simple web-page given by http://www.littlewebhut.com/articles/simple_web_page/.

The command (the URL is real, so the command can be run as is):

wget -O simple_page -k http://www.littlewebhut.com/articles/simple_web_page/
  • -O (--output-document=file) - The documents will not be written to the appropriate files, but all will be concatenated together and written to file.

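  • -k (--convert-links) - After the download is complete, convert the links in the document to make them suitable for local viewing. Links to files that wget has not downloaded are rewritten to include the host name and absolute path, which is what makes the relative URLs absolute here.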


Here is an HTML fragment from that page before downloading (the online version):

...
<ul>
          <li><a href="/" class="color-menu">Home</a></li>
          <li><a href="/html/" class="color-menu">HTML</a></li>
          <li><a href="/css/" class="color-menu">CSS</a></li>
          <li><a href="/javascript/" class="color-menu">JavaScript/jQuery</a></li>
          <li><a href="/inkscape/" class="color-menu">Inkscape</a></li>
          <li><a href="/gimp/" class="color-menu">GIMP</a></li>
          <li><a href="/blender/" class="color-menu">Blender</a></li>
          <li><a href="/articles/" class="color-menu">Articles</a></li>
          <li><a href="/contact/" class="color-menu">Contact</a></li>
        </ul>

The same fragment after downloading, saved in the file simple_page:

...
<ul>
          <li><a href="http://www.littlewebhut.com/" class="color-menu">Home</a></li>
          <li><a href="http://www.littlewebhut.com/html/" class="color-menu">HTML</a></li>
          <li><a href="http://www.littlewebhut.com/css/" class="color-menu">CSS</a></li>
          <li><a href="http://www.littlewebhut.com/javascript/" class="color-menu">JavaScript/jQuery</a></li>
          <li><a href="http://www.littlewebhut.com/inkscape/" class="color-menu">Inkscape</a></li>
          <li><a href="http://www.littlewebhut.com/gimp/" class="color-menu">GIMP</a></li>
          <li><a href="http://www.littlewebhut.com/blender/" class="color-menu">Blender</a></li>
          <li><a href="http://www.littlewebhut.com/articles/" class="color-menu">Articles</a></li>
          <li><a href="http://www.littlewebhut.com/contact/" class="color-menu">Contact</a></li>
        </ul>
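
If the result has to go straight into a pipe, as in the question's curlabsolute example, the same rewrite can be approximated with a small filter. The following is only a minimal sketch, not what wget does internally: the function name absolutize and the script name absolutize.py are hypothetical, the regular expression only handles quoted href and src attributes, and a `<base>` tag in the page is ignored. The URL resolution itself is done by urljoin from Python's standard library.

```python
import re
import sys
from urllib.parse import urljoin

# Matches href="..." or src="..." with single or double quotes.
# Assumption: attributes are quoted; unquoted values and srcset are not handled.
ATTR_RE = re.compile(r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def absolutize(html: str, base_url: str) -> str:
    """Rewrite href/src attribute values as absolute URLs against base_url."""
    def fix(match):
        prefix, quote, url = match.groups()
        # urljoin leaves already-absolute URLs unchanged.
        return f"{prefix}{quote}{urljoin(base_url, url)}{quote}"
    return ATTR_RE.sub(fix, html)


# Run as a filter only when a base URL is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.stdout.write(absolutize(sys.stdin.read(), sys.argv[1]))
```

Saved as absolutize.py, it could be used as: curl -s "$URL" | python3 absolutize.py "$URL" | process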
