
I have a MySQL database that contains data from an API. Suppose it has a field called gameid, which is unique. Every time new data comes in from the API, I run a query and select all gameid values, then with the Array.filter() method in JS (Node.js) I filter out the records that are present in the API response but not in the database, and store that unique data in the database.

Something like:

let filtered_data = datafromapi.filter(data => !mysqldata.includes(data.gameid));

But with almost 30k records this takes a lot of time. Any idea how to do such a process with MySQL and Node.js?

  • What is the original format of the data returned by the API? Commented Aug 31, 2019 at 14:22
  • JSON format, obviously. That's why I am using filter, and the data from MySQL is also in JSON. Commented Aug 31, 2019 at 14:23
  • That's really MySQL's job ... Commented Aug 31, 2019 at 14:23
  • I guess you want to upsert in a batch, but I never really used MySQL so I'm not sure if it's the way to go; that's how I'd do it in MongoDB. Commented Aug 31, 2019 at 14:28
  • @ambianBeing that would be the preferable way. It saves a ton of time due to less network data, along with more optimized paths at every step. The code is not fully visible so it's impossible to tell, but I'm ready to bet you can do the entire thing in a single SQL query (via INSERT ... WHERE NOT EXISTS). Commented Aug 31, 2019 at 14:41

2 Answers

0

As per Jonas's comment, an upsert is the most suitable approach: it lets you avoid inserting duplicate data (and the duplicate-key error you would otherwise run into). As an example, you can use a query like:

INSERT INTO table_name VALUES ('your column values') ON DUPLICATE KEY UPDATE gameid = gameid;

This is much easier and saves the hassle of holding 30k records in memory and filtering them yourself. Moreover, it limits your concerns to the database only. Link for reference: dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html
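
For illustration, here is a minimal Node.js sketch of a batched version of that upsert, assuming the mysql2 package and a hypothetical games table with a unique key on gameid (the column names are placeholders):

    const mysql = require('mysql2/promise');

    async function storeGames(datafromapi) {
      const conn = await mysql.createConnection({
        host: 'localhost', user: 'root', password: '', database: 'mydb',
      });

      // The nested-array form lets the single `VALUES ?` placeholder
      // expand into a multi-row insert.
      const rows = datafromapi.map(d => [d.gameid, d.title]);

      // Rows whose gameid already exists hit the no-op update and are skipped.
      await conn.query(
        'INSERT INTO games (gameid, title) VALUES ? ON DUPLICATE KEY UPDATE gameid = gameid',
        [rows]
      );

      await conn.end();
    }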



0

Let's go over a few requirements and observable facts that follow from that one line of code:

  • You're receiving something that eventually maps into an array of structures as follows:

    {
      "data": ["game_ids"]
    }
    
  • You want to filter based on the contents of another array, mysqldata.

Assuming you cannot change the return format of the API, you can leverage a property of your data to optimize at least some of it.

Your API returns a list of independent objects. You can use this to your advantage, as you only need to perform one operation on each item to filter it. If you can get your API call to return a Reader (a stream) as opposed to a readily-parsed JSON object, you can take advantage of this by using a streaming JSON parser instead of JSON.parse.

This type of parser emits elements one at a time as it works through your array, so you can filter each one as it arrives, as opposed to parsing everything into memory first and filtering it in one chunk.
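
As a minimal sketch of that approach, assuming the API response is available as a readable stream and using the stream-json package (the Set of already-stored ids is also an assumption here; Set lookups are constant-time, unlike an Array.includes() scan):

    const { parser } = require('stream-json');
    const { streamArray } = require('stream-json/streamers/StreamArray');

    // apiResponseStream: a readable stream of the raw JSON text
    // (assumes the top-level JSON value is the array of game objects)
    // knownIds: a Set of gameids already present in MySQL
    function filterNewRecords(apiResponseStream, knownIds) {
      return new Promise((resolve, reject) => {
        const fresh = [];
        apiResponseStream
          .pipe(parser())
          .pipe(streamArray())  // emits { key, value } for each array element
          .on('data', ({ value }) => {
            if (!knownIds.has(value.gameid)) fresh.push(value);
          })
          .on('end', () => resolve(fresh))
          .on('error', reject);
      });
    }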

This will not massively improve performance, since the largest part of your code's clock time is spent waiting for the network request to complete and on the filtering itself (30k membership checks, none of which can be avoided), so don't expect miracles.

The better way

The better way of doing it is to change the API endpoint, if at all possible, since that allows you to solve two problems in one go: less data transferred over the network, and fewer clock cycles spent filtering.

What you are effectively doing is a client-side (to the database, that is) WHERE gameid IN (...). If you are allowed to modify the API call, you should be taking this into account.
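
For illustration, here is the database-side version of that check: send only the incoming gameids to MySQL and let it report which ones already exist. A sketch, again assuming the mysql2 package and a hypothetical games table:

    // conn: an open mysql2/promise connection
    async function findNewRecords(conn, datafromapi) {
      const ids = datafromapi.map(d => d.gameid);
      if (ids.length === 0) return [];

      // mysql2 expands an array bound to `IN (?)` into a comma-separated list
      const [existing] = await conn.query(
        'SELECT gameid FROM games WHERE gameid IN (?)',
        [ids]
      );

      const known = new Set(existing.map(r => r.gameid));
      return datafromapi.filter(d => !known.has(d.gameid));
    }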

2 Comments

So preferably I have to look into the MySQL way.
If you want to avoid having to transfer a 30k-record object and doing the filtering, yes. If you cannot do that, however, look into the streaming parser to alleviate the parsing time a bit.
