I'm parsing a large number of files using Node.js. In my process, I parse audio files, then video files, and then the rest.

The function that parses the files looks like this:

/**
 * @param arr    : array of file objects (path, ext, previous directory)
 * @param cb     : the callback called when every object is parsed;
 *                 objects are then inserted into a database
 * @param others : the array being populated by matching objects
 **/
var parseOthers = function(arr, cb, others) {

    others = others === undefined ? [] : others;

    if(arr.length == 0)
        return cb(others); //should be a nextTick ?

    var e = arr.shift();

    //do some tests on the element and add it
    others.push(e);
    //Then call next; I tested setImmediate and nextTick according
    //to other Stack Overflow questions, with no success
    return parseOthers(arr, cb, others);
};

Full code here (careful, it's a mess)

Now with about 3565 files (not that many) the script throws a "RangeError: Maximum call stack size exceeded" exception, with no stack trace.

What I have tried:

  • I've tried to debug it with node-inspector and node debug script, but it never crashes the way it does when run without the debugger (does debugging increase the stack size?).
  • I've tried process.on('uncaughtException') to catch the exception, with no success.

I have no memory leak.

How can I get a trace for this exception?

Edit 1

Increasing the --stack_size seems to work pretty well. Is there another way of preventing this?

(about 1300 there)

Edit 2

According to :

$ node --v8-options | grep -B0 -A1 stack_size

The default stack size (in kBytes) is 984.

Edit 3

A few more explanations:

  • I never read the files themselves
  • I'm working on an array of paths here; I don't parse folders recursively
  • I'm looking at each path and checking whether it's already stored in the database

My guess is that the populated array becomes too big for Node.js, but memory looks fine, and that's weird...

  • I encountered the same exception in a Node.js application with multiple processes (node-inspector isn't relevant in that case). The only solution I found to debug it was... to output stuff to the console. In the end, unit tests saved me, because I needed to restrict the amount of code to instrument with console logs. Commented Feb 13, 2014 at 12:37
  • Unit tests are on my to-do list (bad, I know), and with this many files it's useless to log 3650 lines... I've increased the stack size and it'll work, but if I get an array with > 10k entries it'll fail again. I'm trying to cache some information to reduce the array size, but it's not an easy task. Commented Feb 13, 2014 at 12:55

2 Answers


Most stack overflow situations are not easy, or sometimes even possible, to debug. Even if you debug the problem, you may not find the trigger.

But I can suggest a way to share the task load easily (including the queue management):

JXcore (a multithreaded fork of Node.js) would suit your case better. Simply create a task pool and set a task method that handles one file at a time. It will manage your queue one task at a time, across multiple threads.

var myTask = function (file) // args here
{
    // logic here: handle a single file
};

for (var i = 0; i < LIST_OF_THE_FILES.length; i++)
    jxcore.tasks.addTask(myTask, LIST_OF_THE_FILES[i] /*, optional callback ... */);

Or, in case the logic is too large to fit in a single method:

var myTask = function (file) // args here
{
    require('mytasketc.js').handleTask(file);
};

for (var i = 0; i < LIST_OF_THE_FILES.length; i++)
    jxcore.tasks.addTask(myTask, LIST_OF_THE_FILES[i] /*, optional callback ... */);

Remarks

Every single thread has its own V8 memory limit.

The contexts of the threads are separate from each other.

Make sure the task method closes the file at the end.

Link

You can find more on multithreaded Javascript tasks


2 Comments

Thanks for pointing me in this direction, but I can't switch from Node.js to JXcore. The weirdest part is that the memory behaves normally and that the process hangs after the fifth time it runs the script (setInterval involved).
JX runs Node apps, but it seems you have another restriction there. Anyway, it's always tricky when it comes to stack overflow exceptions.

You are getting this error because of recursion: every element adds one more frame to the call stack. Rewrite your code so that it does not recurse, especially since this piece of code really doesn't need it. Here is just an APPROXIMATE example to show you a better way to do it:

var parseElems = function(arr, cb) {
    var result = [];
    // plain loop: the call stack depth stays constant whatever arr.length is
    arr.forEach(function (el) {
         //do some tests on the element (el)
         result.push(el);
    });

    cb(result);
};
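
If the per-element test really is asynchronous and you want to keep the recursive shape, one option is to schedule each step with setImmediate, so that the current call returns before the next one starts. The following is only a minimal sketch under that assumption, not your actual code: `parseOthersAsync` and the generated paths are hypothetical names used for illustration.

```javascript
// Hypothetical variant of parseOthers: one element per event-loop turn.
function parseOthersAsync(arr, cb, others) {
    others = others === undefined ? [] : others;

    if (arr.length === 0)
        return cb(others);

    var e = arr.shift();

    // do some tests on the element (possibly asynchronous) and add it
    others.push(e);

    // setImmediate queues the next step; this call returns first,
    // so the stack does not grow with the length of the array.
    setImmediate(function () {
        parseOthersAsync(arr, cb, others);
    });
}

// Usage: far more entries than the synchronous recursion could handle.
var paths = [];
for (var i = 0; i < 50000; i++)
    paths.push('/tmp/file' + i); // made-up paths

parseOthersAsync(paths, function (others) {
    console.log(others.length); // prints 50000
});
```

The trade-off is speed: going through the event loop for every element is slower than a plain loop, so the forEach version is still preferable when nothing in the loop body is asynchronous.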

7 Comments

It's just a way of doing an asynchronous loop. See the GitHub master: I was using a forEach loop when I noticed this issue.
In the question, this is NOT an async loop. Recursion doesn't make it async by itself.
Of course not! But my callback is called after the array loop ends, and I can run the next iteration after an asynchronous call, that's all. Making it non-recursive doesn't change anything.
To make it async you have to call it through setTimeout(0) or nextTick(), but you just call return parseOthers(arr, cb, others); which is not async. return cb(others); is also not async.
It will change one thing: you will not get the exception. That is what you asked for.