1

I am trying to remove all the relative image path slashes from a chunk of HTML that contains several other elements.

For example

<img src="../../../../images/upload/1/test.jpg" />

would need to become

<img src="http://s3.amazonaws.com/website/images/upload/1/test.jpg" />

I was thinking of writing this as a Rails helper: pass the entire block into the method, and maybe use Nokogiri or Hpricot to parse the HTML instead of a regular expression, but I don't really know.

Any help would be great

Cheers Adam

1
  • 2
    Is there a particular reason why you specify regular expressions? They aren't very well suited to the problem; I think you're putting the cart before the horse. You might get better responses if you edit your title to remove the reference. Commented Mar 31, 2010 at 16:05

3 Answers

4

No need to reinvent the wheel when the built-in 'uri' library can do this for you:

require 'uri'
main_path = "http://s3.amazonaws.com/website/a/b/c"
relative_path = "../../../../images/upload/1/test.jpg"

URI.join(main_path, relative_path).to_s
  # ==> "http://s3.amazonaws.com/images/upload/1/test.jpg"
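
Note that the result above lacks the "website" segment, because the four ".." steps climb past it from that base. As a rough sketch of how this could be applied to a whole chunk of HTML: the helper name absolutize_img_srcs and the page URL below are made up for illustration, and the page is assumed to sit four directories below "website". The regex only handles double-quoted src attributes, so a real parser such as Nokogiri is likely the more robust choice.

```ruby
require 'uri'

# Hypothetical helper (not part of any library): rewrites every
# <img ... src="..."> in an HTML fragment to an absolute URL,
# resolved against the URL of the page the fragment came from.
# Assumes double-quoted src values that URI.join can parse.
def absolutize_img_srcs(html, page_url)
  html.gsub(/(<img\b[^>]*?\bsrc=")([^"]*)(")/i) do
    prefix, src, suffix = $1, $2, $3
    "#{prefix}#{URI.join(page_url, src)}#{suffix}"
  end
end

# Assumed page location, four levels below /website/.
page_url = 'http://s3.amazonaws.com/website/a/b/c/d/index.html'
html = '<p>Hi</p><img src="../../../../images/upload/1/test.jpg" />'

puts absolutize_img_srcs(html, page_url)
# prints: <p>Hi</p><img src="http://s3.amazonaws.com/website/images/upload/1/test.jpg" />
```

Other elements in the chunk are left untouched, and src values that are already absolute pass through URI.join unchanged.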

2 Comments

Handy that. I thought you'd have to use URI.parse(...).path and some File.expand_path to do this.
URI.join() is how I do it all the time. As an alternative to URI, Addressable::URI is a nice module because it is a bit more full-featured, especially if you have to work with IDNA-type URLs: en.wikipedia.org/wiki/Internationalized_domain_name
3

One way to construct an absolute path given the absolute URL of the page and a relative path found on that page:

pageurl = 'http://s3.amazonaws.com/website/foo/bar/baz/quux/index.html'
relative = '../../../../images/upload/1/test.jpg'
absolute = pageurl.sub(/\/[^\/]*$/, '')
relative.split('/').each do |d|
  if d == '..'
    absolute.sub!(/\/[^\/]*$/, '')
  else
    absolute << "/#{d}"
  end
end
p absolute

Alternatively, you could cheat a bit:

'http:/'+File.expand_path(File.dirname(pageurl.sub(/^http:/, ''))+'/'+relative)

1

This chunk might help:

html = '<img src="../../../../images/upload/1/test.jpg" />'
absolute_uri = "http://s3.amazonaws.com/website/images"
html.gsub(/(\.\.\/)+images/, absolute_uri)

1 Comment

Of course, this only works if all of the images are under the same path and we know this path beforehand.
