
I'm having trouble handling websites that use HTTP authentication. I have a list of sites that I scrape, and some of them are protected by HTTP authentication. I'm not trying to get the content of those sites; I just want to detect that they are guarded by HTTP auth and then move on. In the snippet below, for example, agent.get never returns, so I have no way to handle it. How can I handle a case like this?

require 'mechanize'
agent = Mechanize.new
page = agent.get('http://freyalovesmusic.co.uk')

2 Answers


You could assume that if a page takes too long to load, it is using http authentication. Obviously not 100% accurate, but perhaps good enough for your situation?

You can use the Timeout module to move on after a certain amount of time, even if agent.get never returns:

require 'mechanize'
require 'timeout'

agent = Mechanize.new
begin
  # Give up if the request has not completed within 5 seconds
  page = Timeout.timeout(5) do
    agent.get('http://freyalovesmusic.co.uk')
  end
rescue Timeout::Error
  puts 'Page likely using http authentication'
end

1 Comment

Wow, awesome... this is what I ended up doing; I actually did it before reading it here. It validates my thinking.

It should be raising a Mechanize::UnauthorizedError, but it's misbehaving for some reason. Maybe you should report it on the Mechanize GitHub issue tracker.
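If Mechanize keeps hanging on that site, one way to check for HTTP auth directly is to look at the status code and the WWW-Authenticate header yourself. Below is a minimal sketch that swaps in Net::HTTP from the standard library instead of Mechanize. The helper names http_auth_required? and probe are made up for illustration, and the 5 second timeout is an assumption. It also assumes the server answers HEAD requests the same way as GET, which not every server does.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: true when the response is a 401 that carries a
# WWW-Authenticate challenge, i.e. the resource is behind HTTP auth.
def http_auth_required?(response)
  response.code == '401' && !response['WWW-Authenticate'].nil?
end

# Hypothetical helper: request only the headers, with short timeouts,
# and classify the site as :auth_required, :open or :timed_out.
def probe(url, timeout: 5)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == 'https',
                  open_timeout: timeout,
                  read_timeout: timeout) do |http|
    response = http.head(uri.request_uri)
    http_auth_required?(response) ? :auth_required : :open
  end
rescue Net::OpenTimeout, Net::ReadTimeout
  # The server did not connect or answer in time
  :timed_out
end
```

Calling probe('http://freyalovesmusic.co.uk') for each site in your list would let you skip anything that comes back as :auth_required or :timed_out. Other network errors (DNS failures, refused connections) are not rescued here and would need their own handling.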
