I'm trying to parse web pages recursively with PhantomJS.

For example:

WebPage:
 link1,
 link2,
 link3,
 link4,
 link5,
 nextPage

What I'm doing with this page:

var parsePage = function(links) {

    // parse every link on the page
    for (var i = 0; i < links.length; i++) {
        parsePost(links[i]);
    }
};

parsePost extracts some information from each page, such as all emails and phone numbers matched by regex, which takes a lot of time.

But PhantomJS (JavaScript) is asynchronous: it doesn't wait until every link has been parsed before going on to nextPage. Instead it works like this:

- parsing page1
  - parsing link1
  - parsing link2
   ....
  - parsing link5
- parsing page2
  - parsing link1
   ....
  - parsing link5

  -> and only now do the results from page1 -> link1 arrive in the console
  .....
- parsing page3

So it uses up all 6 GB of my PC's memory in 3 minutes.

How can I solve this problem?

What I've tried so far:

 1. Maybe limit the program's memory use? (Would it then wait until some processes have finished before continuing to parse other pages?)
 2. I tried something like:

> page.open(link, function(... here is the page parser (which parses every link))
and then page.close()

But the page parser takes a lot of time, so when I call page.close() it stops the page parser.
  • did you solve that? Commented Jan 26, 2017 at 15:48

1 Answer

I think you should design your JavaScript for PhantomJS as suggested in this other post on Stack Overflow. I did it that way and it worked just fine.
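As a rough illustration of the general idea (not necessarily what the linked post does), one option is to process the links one at a time and start the next one only when the previous callback has fired. The names processSequentially and fakeParsePost below are hypothetical, and a timer stands in for the real page load so the sketch runs in any JavaScript engine. In PhantomJS the worker would presumably call page.open(link, callback) and close the page inside that callback, after parsing has finished.

```javascript
// Runs worker(item, next) for each item in order. The next item is
// started only after the previous worker has called next().
function processSequentially(items, worker, onDone) {
    var index = 0;

    function step() {
        if (index >= items.length) {
            if (onDone) onDone();
            return;
        }
        var item = items[index++];
        worker(item, step);
    }

    step();
}

// Hypothetical stand-in for parsePost: simulates slow work with a timer.
// With PhantomJS, this is where page.open(link, function (status) { ... })
// would go, calling page.close() and then next() once parsing is done.
function fakeParsePost(link, next) {
    setTimeout(function () {
        console.log('parsed ' + link);
        next();
    }, 10);
}

processSequentially(['link1', 'link2', 'link3'], fakeParsePost, function () {
    console.log('page done, safe to move on to nextPage');
});
```

Because only one link is in flight at a time, memory use should stay roughly constant instead of growing with the number of queued pages. The same helper could be used one level up to walk page1, page2, and so on, moving to nextPage from the onDone callback.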
