
I have a simple PhantomJS script that renders a website's JavaScript-generated content to HTML. (Some data is then extracted from the HTML using another tool.)

var page = require('webpage').create();
var fs = require('fs');// File System Module
var output = '/tmp/sourcefile'; // path for saving the local file
page.open('targeturl', function() { // open the file
  fs.write(output,page.content,'w'); // Write the page to the local file using page.content
  phantom.exit(); // exit PhantomJs
});

(I got these lines of code from http://kochi-coders.com/2014/05/06/scraping-a-javascript-enabled-web-page-using-beautiful-soup-and-phantomjs/)

This worked while every target had its own direct link. Now all targets share the same URL and are selected through a drop-down menu:

<select id="observation-station-menu" name="station" onchange="updateObservationProductsBasedOnForm(this);">
  <option value="101533">Alajärvi Möksy</option>
  ...    
  <option value="101541">Äänekoski Kalaniemi</option>
  </select>

This is the menu item I would actually like to load:

<option value="101632">Joensuu Linnunlahti</option>

Because of this menu, my script only downloads the data for the default location. How can I load the contents of another menu item and download the HTML for that item instead?

My target site is this: http://ilmatieteenlaitos.fi/suomen-havainnot

(If there is a better way than PhantomJS to do this, I could use it just as well. My interest is in working with the data once it has been scraped, and I chose PhantomJS only because it was the first thing that worked. Some options may be ruled out because my server is a Raspberry Pi and they might not run on it; see Python Selenium: Firefox profile error.)

2 Answers


Since the page has jQuery, you can do:

page.open('targeturl', function() { // open the file
  page.evaluate(function() {
    jQuery('#observation-station-menu').val('101632').change();
  }); // select the option, then fire the change event
  fs.write(output,page.content,'w'); // Write the page to the local file using page.content
  phantom.exit(); // exit PhantomJs
});
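
One caveat: the change handler presumably loads the new station's data asynchronously, so reading page.content right after the call may still return the default station's markup. Below is a minimal sketch of polling before writing the file. waitFor is a hypothetical helper, not part of PhantomJS, and the readiness check (the serialized page differing from a snapshot taken before the change) is only an assumed heuristic that could fire early if the page shows a loading indicator. A check against a specific element of the target page would be more reliable.

```javascript
// Hypothetical helper (not part of PhantomJS): call `check` every `intervalMs`
// until it returns true, then call `onReady`. If `timeoutMs` elapses first,
// call `onTimeout` instead.
function waitFor(check, onReady, onTimeout, timeoutMs, intervalMs) {
  var start = Date.now();
  var timer = setInterval(function () {
    if (check()) {
      clearInterval(timer);
      onReady();
    } else if (Date.now() - start >= timeoutMs) {
      clearInterval(timer);
      onTimeout();
    }
  }, intervalMs);
}

// The wiring below only runs under PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var fs = require('fs');
  var output = '/tmp/sourcefile';

  page.open('targeturl', function () {
    var before = page.content; // snapshot of the default station's markup
    page.evaluate(function () {
      jQuery('#observation-station-menu').val('101632').change();
    });
    waitFor(
      function () { return page.content !== before; }, // assumed heuristic
      function () { fs.write(output, page.content, 'w'); phantom.exit(); },
      function () { console.log('Timed out waiting for the page to update'); phantom.exit(1); },
      10000, // give up after 10 s (arbitrary)
      250    // poll every 250 ms (arbitrary)
    );
  });
}
```

The timeout branch matters here: without it, a page that never changes would keep PhantomJS running forever.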

3 Comments

Your variant is probably better, since it will be easier to update I guess.
I ran this, but the resulting file still contained the data for the default choice. From the file: <option value="100971" selected="selected">Helsinki Kaisaniemi</option>
@MadocComadrin Well, duh, there is a delay between changing the value and seeing a new image. You need to add a delay to your script.

You could directly call the function that is defined in the page's underlying JavaScript:

var page = require('webpage').create();
var fs = require('fs');// File System Module
var output = '/tmp/sourcefile'; // path for saving the local file
page.open('targeturl', function() { // open the file
  page.evaluate(function() {
     updateObservationProducts(101632, 'weather');
  });
  window.setTimeout(function () {
    fs.write(output,page.content,'w'); // Write the page to the local file using page.content
    phantom.exit(); // exit PhantomJs
  }, 1000); // Change timeout as required to allow sufficient time 

});

For waiting until the page has rendered, see the question phantomjs not waiting for "full" page load; I copied part of rhunwicks's solution from there.
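
A fixed timeout is a guess, so another option is to wait until the page has had no outstanding network requests for a short quiet period. The sketch below makes several assumptions: createQuietWatcher is a hypothetical helper, the 500 ms quiet period and the 15 s hard cap are arbitrary, and it assumes the data for the new station arrives through requests that PhantomJS reports via page.onResourceRequested and page.onResourceReceived.

```javascript
// Hypothetical helper: calls `onQuiet` once, after no requests have been
// pending for `quietMs` milliseconds.
function createQuietWatcher(quietMs, onQuiet) {
  var pending = 0;
  var timer = null;
  var done = false;

  function schedule() {
    clearTimeout(timer);
    if (!done && pending === 0) {
      timer = setTimeout(function () {
        done = true;
        onQuiet();
      }, quietMs);
    }
  }

  return {
    requested: function () { pending += 1; schedule(); },
    received: function () { pending = Math.max(0, pending - 1); schedule(); },
    start: schedule
  };
}

// The wiring below only runs under PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var fs = require('fs');
  var output = '/tmp/sourcefile';

  page.open('targeturl', function () {
    var watcher = createQuietWatcher(500, function () {
      fs.write(output, page.content, 'w');
      phantom.exit();
    });
    page.onResourceRequested = function () { watcher.requested(); };
    page.onResourceReceived = function (response) {
      // Each response is reported in stages; count it when it has finished.
      if (response.stage === 'end') { watcher.received(); }
    };
    page.evaluate(function () {
      updateObservationProducts(101632, 'weather');
    });
    watcher.start();
    // Hard cap in case a request never completes.
    setTimeout(function () { phantom.exit(1); }, 15000);
  });
}
```

If the output still shows the default station, the update may not involve a network request at all, or the function may expect different arguments; logging the requested URLs in page.onResourceRequested is a quick way to check.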

1 Comment

This behaved like the other answer: the script ran without errors, but the output contained the data for the default selection.
