
So I have a Nokogiri web scraper running perfectly on my local machine.

However, when I try to run the scraper in my production environment, I get a 403 error.

I believe this is down to the website blocking my server's IP address (probably because previous users of that IP got it blocked).

Is it possible to route the Nokogiri request from my web server through a proxy server? If so, how would I go about it?

This is the code I have at the moment.

doc = Nokogiri::HTML(open(URL HERE, 'User-Agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.854.0 Safari/535.2'))
  • Where are you getting the 403 from? From the websites you're trying to scrape? Commented Jun 21, 2016 at 9:09
  • Indeed I am. I'm under the impression that they've blocked the server's IP address; that's why I thought of a proxy. Commented Jun 21, 2016 at 9:33
  • Can you use Mechanize and a proxy for it? Look here or here Commented Jun 21, 2016 at 9:42
  • I had a very quick scan read. Isn't the Charles proxy thing a desktop client? Thanks Commented Jun 21, 2016 at 9:47
  • It's true for Charles, but it's just an example of a proxy, i.e. ("localhost", 8888) in the example, which might be anything for your purpose. Actually, you can simply pass a proxy to the open method (see the answer below); it's just that I was using Mechanize all the time as a wrapper on Nokogiri (a minimal sketch follows this list). Commented Jun 21, 2016 at 9:53
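
For reference, a minimal sketch of the Mechanize route mentioned above; the proxy host, port and target URL are placeholders, not real values:

require 'mechanize'

agent = Mechanize.new
# Placeholder proxy host and port; set_proxy also accepts optional user/password arguments.
agent.set_proxy('proxy.example.com', 8000)

page = agent.get('http://example.com/')  # placeholder URL
doc  = page.parser                       # the underlying Nokogiri::HTML::Document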

1 Answer


Actually, you can simply use the :proxy parameter of the OpenURI open method.

open(*rest, &block)
#open provides `open' for URI::HTTP and URI::FTP.

...

The hash may include other options, where keys are symbols:
:proxy

Synopsis:    
:proxy => "http://proxy.foo.com:8000/"
:proxy => URI.parse("http://proxy.foo.com:8000/")

If :proxy option is specified, the value should be String, URI, boolean or nil.
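
Plugged into the snippet from the question, that would look roughly like this (the target URL is a placeholder and the proxy address is taken from the synopsis above; substitute your own values):

require 'nokogiri'
require 'open-uri'

# Placeholder URL and proxy; replace both with your own values.
doc = Nokogiri::HTML(open('http://example.com/',
  'User-Agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.854.0 Safari/535.2',
  :proxy => 'http://proxy.foo.com:8000/'))

If the proxy needs credentials, OpenURI also accepts a :proxy_http_basic_authentication option (an array of proxy URL, user and password) in place of :proxy.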

Also, as a general consideration (being tedious now), you should search for alternatives to scraping content, especially if it's done on a regular basis, such as a supported API or alternative sources. If your current server IP got blocked, the same can happen to the proxy.


2 Comments

Probably you won't get good free proxies. Free proxies work sporadically, stop working occasionally, and so forth. You can work with them, but not for something that should be reliable. For reliable proxies you should look for paid services; there are many (hordes of them) and I can't judge which ones are good or bad.
Yeah, I would prefer an API, but the API the web provider uses is either out of date or not updated alongside the website.
