0

I am currently developing a Node.js app with a REST API that exposes data from a MongoDB database.

The application needs to update some data every 5 minutes by calling an external service (could take more than one minute to get the new data).

I decided to isolate this task into a child_process, but I am not sure what I should put in this child process:

  • Only the function to be executed. The schedule is managed by the main process.
  • An independent process that auto-refreshes the data every 5 minutes and sends a message to the main process every time the refresh is done.

I don't really know if there is a big cost to starting a new child process every 5 minutes, or if I should use only one long-running child process, or if I am overthinking the problem ^^

EDIT - Information about the update task

The update task can take more than one minute, but it consists of many smaller tasks (gathering information from many external providers) that run asynchronously, so maybe I don't even need a child process?

Thanks !

3 Answers

2

Node.js has an event-driven architecture capable of handling asynchronous calls, so it is unlike a typical C++ program, where you would go with a multi-threaded or multi-process architecture.

For your use case, maybe you can use setInterval to repeatedly perform an operation, which you can break into smaller async calls using a promise library such as Bluebird?

For more information see:

setInterval: https://developer.mozilla.org/en-US/docs/Web/API/WindowTimers/setInterval

setInterval()

Repeatedly calls a function or executes a code snippet, with a fixed time delay between each call. Returns an intervalID.

Sample code:

var MILLISECONDS_IN_FIVE_MINUTES = 5 * 60 * 1000;

setInterval(function() {
    console.log("I was executed");
}, MILLISECONDS_IN_FIVE_MINUTES);

Promises: http://bluebirdjs.com/docs/features.html

Sample code:

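// updateExternalService, parseExtResp and refreshData are placeholders
// for your own promise-returning functions.
function refreshExternalData(data) {
  // Return the chain directly; wrapping it in new Promise() is not needed.
  return updateExternalService(data)
    .then(function(response) {
        return parseExtResp(response);
    })
    .then(function(parsedResp) {
        return refreshData(parsedResp);
    })
    .then(function(returnCode) {
        console.log("yay updated external data source and refreshed");
        return returnCode;
    })
    .catch(function(error) {
        // Handle error, then rethrow so the caller sees the failure
        console.log("oops something went wrong ->" + error.message);
        throw error;
    });
}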
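
Since one refresh can take more than a minute, it may be worth making sure two runs never overlap if a run ever outlasts the interval. A minimal sketch, assuming the refresh is a function that returns a promise; `createGuardedTask` is an illustrative name, not a library API:

```javascript
// Wraps an async task so that a new run is skipped while the previous
// one is still pending. Resolves to true if the task ran and succeeded,
// false if it was skipped or failed.
function createGuardedTask(task) {
  let running = false;
  return async function run() {
    if (running) {
      return false; // previous run still in flight: skip this tick
    }
    running = true;
    try {
      await task();
      return true;
    } catch (error) {
      // Swallow the error so a failed run does not become an unhandled rejection
      console.log("refresh failed -> " + error.message);
      return false;
    } finally {
      running = false;
    }
  };
}
```

It would then be scheduled with something like `setInterval(createGuardedTask(myRefresh), 5 * 60 * 1000)`, where `myRefresh` stands in for the real refresh function.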

7 Comments

I am already using promises and setInterval; I am just wondering whether I need to isolate the task into a separate process. Does it make sense?
Are you doing this only because you hit a performance issue? A multi-process Node architecture doesn't quite make sense to me here. Are you trying to utilize all of your CPU cores? Check what lscpu reports first; if you find your Node application is using only 1 of the 10 CPU cores intensively and the program can't scale any further, then try the multi-child approach. Otherwise you would be better off profiling the code.
Maybe I am overthinking the problem, because I don't have any performance issue for the moment... So if there is no performance issue and no blocking code, I just need to let the event loop do the job?
Yes. Trust NodeJS's event driven architecture! haha.
@Thomas Spawning multiple child processes is probably worth exploring if you ever notice your cores are not being fully utilized for some reason, but that would be one of the last options to try.
1

The total wall-clock time it takes to get data from an external service does not matter as long as you are using asynchronous requests. What matters is how much CPU you use in doing so. If the majority of the time is spent waiting for the external service to respond or to send the data, then your node.js server is just sitting idle most of the time and you probably do not need a child process.

Because node.js is asynchronous, it can happily have many open requests that are "in flight" that it is waiting for responses to and that takes very little system resources.
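
To illustrate, here is a minimal sketch in which the provider calls are simulated with timers (`fakeProvider`, `gatherFromProviders` and the delays are made up for the example). All requests are started at once, so the total wait is roughly that of the slowest provider rather than the sum:

```javascript
// Simulates a slow external provider: resolves after `delayMs`
// without using any CPU while waiting.
function fakeProvider(name, delayMs) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve({ provider: name, fetchedAt: Date.now() });
    }, delayMs);
  });
}

// Starts every request immediately and waits for all of them to settle.
// A failing provider does not discard the results of the others.
async function gatherFromProviders(providers) {
  const settled = await Promise.allSettled(
    providers.map(function (startRequest) { return startRequest(); })
  );
  return settled
    .filter(function (result) { return result.status === "fulfilled"; })
    .map(function (result) { return result.value; });
}
```

`Promise.allSettled` is used here instead of `Promise.all` so that one unavailable provider does not fail the whole refresh; whether that is the right policy depends on your data.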

Because node.js is single threaded, it is CPU usage that typically drives the need for a child process. If it takes 5 minutes to get a response from an external service, but only 50ms of actual CPU time to process that request and do something with it, then you probably don't need a child process.

If it were me, I would separate out the code for communicating with the external service into a module of its own, but I would not add the complexity of a child process until you actually have some data showing that such a change is needed.


I don't really know if there is a big cost to starting a new child process every 5 minutes, or if I should use only one long-running child process, or if I am overthinking the problem

There is definitely some cost to starting up a new child process. It's not huge, but if you're going to be doing it every 5 minutes and it doesn't take a huge amount of memory, then it's probably better to just start up the child process once, have it manage the scheduling of communicating with the external service entirely on its own, and have it communicate results back to your other node.js process as needed. This makes the 2nd node process much more self-contained, and the only point of interaction between the two processes is to communicate an update. This separation of function and responsibility is generally considered a good thing. In a multi-developer project, you could more easily have different developers working on each app.
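
A rough sketch of that arrangement, using `child_process.fork` and its built-in message channel. The function names, the message shape and `workerPath` are assumptions made for illustration, not part of any library:

```javascript
const { fork } = require("child_process");

// Parent side: start the worker once and react to its "refreshed" messages.
// `workerPath` is a hypothetical script that calls runRefreshLoop() below.
function startRefreshWorker(workerPath, onRefreshed) {
  const child = fork(workerPath);
  child.on("message", function (msg) {
    if (msg && msg.type === "refreshed") {
      onRefreshed(msg);
    }
  });
  child.on("exit", function (code) {
    console.log("refresh worker exited with code " + code);
  });
  return child;
}

// Child side: owns the schedule and reports back after each run.
// `send` defaults to process.send, which only exists when the script
// was started through fork().
function runRefreshLoop(refresh, intervalMs, send) {
  const report = send || function (msg) { process.send(msg); };
  async function tick() {
    try {
      await refresh();
      report({ type: "refreshed", at: Date.now() });
    } catch (error) {
      report({ type: "refresh-failed", message: error.message });
    }
  }
  const timer = setInterval(tick, intervalMs);
  return { tick: tick, stop: function () { clearInterval(timer); } };
}
```

In the worker script this would be called as `runRefreshLoop(refreshAllProviders, 5 * 60 * 1000)`, with `refreshAllProviders` standing in for the real update function. The sketch does not restart the worker if it exits.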

2 Comments

OK, nice explanation. So if I don't have any blocking task, I should not worry about this until a problem (= a performance issue) actually happens?
@Thomas - Yep, that would be my advice. Don't complicate things until you have actual evidence that it is needed.
1

It depends on how tightly coupled your app and the auto-refresh task are.

If the auto-refresh task can run standalone, without interacting with your app, then it is better to start the task as a separate process. Using child_process directly is not a good idea: spawning, monitoring and respawning a child process is tricky. You can use crontab or pm2 to manage it instead.

If the auto-refresh task depends on your app, you can use child_process directly and send it messages to schedule the work. But first try to break this dependency; that will simplify your app and make both parts easier to deploy and maintain separately. Whether the child process is long-running or one-shot does not matter until you have hundreds of such tasks running on one machine.
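
If the task runs as its own process under cron or pm2, it has no parent/child message channel, so one way to tell the app that a refresh finished is a plain HTTP call. A minimal sketch using Node's built-in `http` module, assuming the app exposes some endpoint for this (`notifyApp`, the URL and the payload shown in the usage note are made up):

```javascript
const http = require("http");

// POSTs a small JSON payload to the app and resolves with the HTTP status code.
function notifyApp(url, payload) {
  return new Promise(function (resolve, reject) {
    const body = JSON.stringify(payload);
    const req = http.request(url, {
      method: "POST",
      agent: false, // one-off connection, not kept in the shared pool
      headers: {
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(body),
      },
    }, function (res) {
      res.resume(); // discard the response body
      res.on("end", function () { resolve(res.statusCode); });
    });
    req.on("error", reject);
    req.end(body);
  });
}
```

The standalone task would call something like `notifyApp("http://localhost:3000/refresh-done", { ok: true })` once its work is finished, where the port and path are placeholders.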

4 Comments

Interesting, so maybe having a queue to make them communicate could be a solution for me! Thanks
A parent/child IPC channel is already built in: child.send
Yeah, I saw that, but it only works if you have parent/child processes
When the refresh is done, the child process can notify your app by calling its REST API.
