I need to parse an array out of a website. The part of the JavaScript I want to parse looks like this:

_arPic[0] = "http://example.org/image1.jpg";
_arPic[1] = "http://example.org/image2.jpg";
_arPic[2] = "http://example.org/image3.jpg";
_arPic[3] = "http://example.org/image4.jpg";
_arPic[4] = "http://example.org/image5.jpg";
_arPic[5] = "http://example.org/image6.jpg";

I get the whole JavaScript using something like this:

require "nokogiri"
require "open-uri"
product_page = Nokogiri::HTML(URI.open(full_url))
product_page.css("div#main_column script")[0]

Is there an easy way to parse all the variables?

2 Answers

If I read you correctly, you're trying to parse the JavaScript and get a Ruby array of your image URLs, yes?

Nokogiri only parses HTML/XML, so you're going to need a different library. A cursory search turns up the RKelly library, which has a parse function that takes a JavaScript string and returns a parse tree.

Once you have a parse tree, you'll need to traverse it, find the nodes of interest by name (e.g. _arPic), and then read the string literal on the right-hand side of each assignment.
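To make the traversal step concrete, here is a minimal sketch that uses a tiny hand-built tree instead of a real parser. The node types Program, Assign, Index and StrLit, and the helper collect_assigned_strings, are all hypothetical stand-ins; they are not RKelly's classes, so a real version would have to match whatever node types the chosen parser actually produces.

```ruby
# Hypothetical node types standing in for a real parser's AST (NOT RKelly's classes).
Program = Struct.new(:statements)      # a list of top-level statements
Assign  = Struct.new(:target, :value)  # target = value
Index   = Struct.new(:name, :index)    # name[index]
StrLit  = Struct.new(:value)           # "string literal"

# Visit the statements and collect the string on the right-hand side
# of every assignment whose target is name[...].
def collect_assigned_strings(node, name, out = [])
  case node
  when Program
    node.statements.each { |stmt| collect_assigned_strings(stmt, name, out) }
  when Assign
    target = node.target
    if target.is_a?(Index) && target.name == name && node.value.is_a?(StrLit)
      out << node.value.value
    end
  end
  out
end

tree = Program.new([
  Assign.new(Index.new("_arPic", 0), StrLit.new("http://example.org/image1.jpg")),
  Assign.new(Index.new("other", 0), StrLit.new("ignored")),
  Assign.new(Index.new("_arPic", 1), StrLit.new("http://example.org/image2.jpg"))
])

p collect_assigned_strings(tree, "_arPic")
# => ["http://example.org/image1.jpg", "http://example.org/image2.jpg"]
```

The same shape (match the assignment node, check the target's name, take the literal on the other side) carries over to a real AST; only the node classes change.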

Alternatively, if it doesn't have to be very robust (and a regex won't be), you can just search the JavaScript with a regular expression:

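/^\s*_arPic\[\d+\]\s*=\s*"([^"]*)";/

might be a good starter regex. Using \d+ rather than \d lets it match indexes of 10 and above, and [^"]* keeps the capture from running past the closing quote.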
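A minimal sketch of applying that regex with String#scan, assuming you already have the script text as a string (e.g. by calling .text on the Nokogiri node). The helper name extract_pics is hypothetical, and the sample input is copied from the question.

```ruby
# Hypothetical helper: returns the _arPic URLs ordered by their array index.
def extract_pics(js)
  pics = []
  # With two capture groups, scan yields [index, url] for each match.
  js.scan(/^\s*_arPic\[(\d+)\]\s*=\s*"([^"]*)";/) do |index, url|
    pics[index.to_i] = url
  end
  pics.compact # drop nil gaps if some indexes are missing
end

script = <<~JS
  _arPic[0] = "http://example.org/image1.jpg";
  _arPic[1] = "http://example.org/image2.jpg";
  _arPic[2] = "http://example.org/image3.jpg";
JS

p extract_pics(script)
# => ["http://example.org/image1.jpg", "http://example.org/image2.jpg", "http://example.org/image3.jpg"]
```

Placing each URL at its captured index keeps the Ruby array aligned with the JavaScript one even if the assignments appear out of order in the page.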


The easy way is to let URI.extract pull every URL out of the script text:

_arPic = URI.extract product_page.css("div#main_column script")[0].text

which can be shortened to:

_arPic = URI.extract product_page.at("div#main_column script").text
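Note that URI.extract returns every URL in the script, not just the _arPic entries, and recent Ruby versions flag it as obsolete (it warns when run with -w). A rough stdlib-only stand-in, assuming the URLs of interest are always double-quoted http(s) string literals, is a plain String#scan. The helper name quoted_urls and the extra "var base" line in the sample are hypothetical, added only to show that unrelated URLs are picked up too.

```ruby
# Hypothetical helper: every double-quoted http(s) URL in the text, in order.
# Like URI.extract, it does not care which variable the URL is assigned to.
def quoted_urls(js)
  js.scan(%r{"(https?://[^"]+)"}).flatten
end

script = <<~JS
  var base = "http://example.org/";
  _arPic[0] = "http://example.org/image1.jpg";
JS

p quoted_urls(script)
# => ["http://example.org/", "http://example.org/image1.jpg"]
```

If other URLs can appear in the same script block, the _arPic-specific regex from the other answer is the safer filter.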
