
I have a MySQL database that contains data from an API. Suppose it has a field called gameid, which is unique. Every time new data comes in from the API, I run a query and select all gameid values, then with the Array.filter() method in JS (Node.js) I filter out the records that are present in the API response but not in the database, and store that unique data in the database.

Something like:

let filtered_data = datafromapi.filter(data => !mysqldata.includes(data.gameid));

But with almost 30k records this takes a lot of time. Any idea how to do such a process with MySQL and Node.js?

  • What is the original format of the data returned by the API? Commented Aug 31, 2019 at 14:22
  • JSON format, obviously. That's why I am using filter, and the data from MySQL is also in JSON. Commented Aug 31, 2019 at 14:23
  • That's really MySQL's job ... Commented Aug 31, 2019 at 14:23
  • I guess you want to upsert in a batch, but I never really used MySQL so I'm not sure if it's the way to go; that's how I'd do it in MongoDB. Commented Aug 31, 2019 at 14:28
  • @ambianBeing that would be the preferable way. It saves a ton of time due to less network data, along with more optimized paths at every step. The code is not fully visible so it's impossible to tell, but I'm ready to bet you can do the entire thing in a single SQL query (via INSERT ... WHERE NOT EXISTS). Commented Aug 31, 2019 at 14:41

2 Answers

0

As per Jonas's comment, an upsert is the most suitable approach: it lets you avoid inserting duplicate data (and the duplicate-key error you would otherwise run into). As an example, you can use a query like:

INSERT INTO table_name VALUES ('your column values') ON DUPLICATE KEY UPDATE gameid = gameid;

This is much easier and saves the hassle of holding 30k records in memory and filtering them yourself. Moreover, it limits your concerns to the database only. Link for reference: dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html
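
For illustration, here is a minimal Node.js sketch of a batched version of that upsert, assuming the mysql2 package and a hypothetical games table with a unique key on gameid (the column names are placeholders):

    const mysql = require('mysql2/promise');

    async function storeGames(datafromapi) {
      const conn = await mysql.createConnection({
        host: 'localhost', user: 'root', password: '', database: 'mydb',
      });

      // The nested-array form lets the single `VALUES ?` placeholder
      // expand into a multi-row insert.
      const rows = datafromapi.map(d => [d.gameid, d.title]);

      // Rows whose gameid already exists hit the no-op update and are skipped.
      await conn.query(
        'INSERT INTO games (gameid, title) VALUES ? ON DUPLICATE KEY UPDATE gameid = gameid',
        [rows]
      );

      await conn.end();
    }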



0

Let's go over a few requirements and observable facts that follow from that one line of code:

  • You're receiving something that eventually maps into an array of structures as follows:

    {
      "data": ["game_ids"]
    }
    
  • You want to filter based on the contents of another array, mysqldata.

Assuming you cannot change the return format of the API, you can leverage a property of your data to optimize at least some of it.

Your API returns a list of independent objects. You can use this to your advantage, as you only need to perform one operation on each item to filter it. If you can get your API call to return a Reader (a stream) as opposed to a readily-parsed JSON object, you can take advantage of this by using a streaming JSON parser instead of JSON.parse.

This type of parser emits elements one at a time as it works through your array, so you can filter each one as it arrives, as opposed to parsing everything into memory first and filtering it in one chunk.
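
As a minimal sketch of that approach, assuming the API response is available as a readable stream and using the stream-json package (the Set of already-stored ids is also an assumption here; Set lookups are constant-time, unlike an Array.includes() scan):

    const { parser } = require('stream-json');
    const { streamArray } = require('stream-json/streamers/StreamArray');

    // apiResponseStream: a readable stream of the raw JSON text
    // (assumes the top-level JSON value is the array of game objects)
    // knownIds: a Set of gameids already present in MySQL
    function filterNewRecords(apiResponseStream, knownIds) {
      return new Promise((resolve, reject) => {
        const fresh = [];
        apiResponseStream
          .pipe(parser())
          .pipe(streamArray())  // emits { key, value } for each array element
          .on('data', ({ value }) => {
            if (!knownIds.has(value.gameid)) fresh.push(value);
          })
          .on('end', () => resolve(fresh))
          .on('error', reject);
      });
    }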

This will not massively improve performance, since the largest part of your code's clock time is spent waiting for the network request to complete and on the filtering itself (30k membership checks, none of which can be avoided), so don't expect miracles.

The better way

The better way of doing it is to change the API endpoint, if at all possible, since that allows you to solve two problems in one go: less data transferred over the network, and fewer clock cycles spent filtering.

What you are effectively doing is a client-side (to the database, that is) WHERE gameid IN (...). If you are allowed to modify the API call, you should be taking this into account.
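
For illustration, here is the database-side version of that check: send only the incoming gameids to MySQL and let it report which ones already exist. A sketch, again assuming the mysql2 package and a hypothetical games table:

    // conn: an open mysql2/promise connection
    async function findNewRecords(conn, datafromapi) {
      const ids = datafromapi.map(d => d.gameid);
      if (ids.length === 0) return [];

      // mysql2 expands an array bound to `IN (?)` into a comma-separated list
      const [existing] = await conn.query(
        'SELECT gameid FROM games WHERE gameid IN (?)',
        [ids]
      );

      const known = new Set(existing.map(r => r.gameid));
      return datafromapi.filter(d => !known.has(d.gameid));
    }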

2 Comments

So preferably I have to look into the MySQL way.
If you want to avoid having to transfer a 30k-record object and doing the filtering, yes. If you cannot do that, however, look into the streaming parser to alleviate the parsing time a bit.
