
I have a large table in my MySQL database (about 10 million rows), and I need to get all of this data into JSON format. For smaller tables, I would use the basic connection.query("SELECT * FROM TABLE", function(err, results) {}); syntax. However, I don't want to load the whole table into memory.

I noticed that the mysql module has the ability to "stream" rows (https://github.com/felixge/node-mysql/#streaming-query-rows), so I was wondering whether that still loads the entire table into memory and then just gives us each row one by one, or whether it actually loads only one row at a time, so the whole table is never held in memory at once.
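For reference, the event-based streaming interface described in that README looks roughly like this (a sketch only; processRow stands in for whatever per-row handling is needed):

var mysql = require('mysql');
var connection = mysql.createConnection({ /* host, user, password, database */ });

// The query object is an event emitter; rows arrive one at a time via
// 'result' events instead of being buffered into a single results array.
var query = connection.query('SELECT * FROM tbl');

query
    .on('error', function (err) {
        // Handle the error; an error also ends the stream of rows.
        console.error(err);
    })
    .on('result', function (row) {
        // Pause the connection while this row is being processed so that
        // rows don't pile up in memory, then resume to get the next one.
        connection.pause();
        processRow(row, function () {
            connection.resume();
        });
    })
    .on('end', function () {
        // All rows have been received (or an error occurred).
        connection.end();
    });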

3 Answers


Load your data in chunks. Here is a working example.

var mysql = require('mysql');
var settings = {};

settings.host = "localhost";
settings.user = "root";
settings.password = "root";
settings.database = "dbname";

var pool = mysql.createPool(settings);

var countQuery = "SELECT count(*) as total FROM tbl";

var chunkSize = 1000;

pool.getConnection(function(err, connection) {
    if (err) {
        // No connection was obtained, so there is nothing to release.
        console.log("Error on getConnection:", err);
        return;
    }

    connection.query(countQuery, function(err, result) {
        if (err) {
            connection.release();
            console.log("Error on countQuery:", err);
            return;
        }

        if (result && result[0]) {
            var totalRows = result[0]['total'];
            console.log("Total rows in db:", totalRows);
            var periods = Math.ceil(totalRows / chunkSize);
            console.log("Total periods", periods);

            var selectQuery = "SELECT * FROM tbl ORDER BY id DESC LIMIT ";

            for(var i = 0; i < periods; i++) {
                var offset = i*chunkSize;

                var runQuery = selectQuery + offset + "," + chunkSize;

                console.log(runQuery);

                connection.query(runQuery, function (err, results) {
                    if (err) {
                        console.log("Error on runQuery:", err);
                        return;
                    }

                    console.log("Data:", results);
                });
            }

            connection.release();
        }
    });
});

5 Comments

This method could potentially result in wrong data being transferred, since it is split into several different SQL statements that are queried at different times with no write lock being put onto the table.
You are right. The better way is to do it with recursion or use some locks. But the idea is the same.
Aye, I'd much rather hand this problem to the designers of the database engine; I think they have already thought long and hard about it. Because of this, I'd want my application to stream the data from the db instead of downloading it in chunks. github.com/felixge/node-mysql/#streaming-query-rows, as mentioned in the initial question, seems to do just that. I tried it out on a 40 million row table and it worked well with low memory consumption in my node app.
@Lilleman I am building a node app that brings Oracle data to MySQL. I have a concern about how to bring over a large amount of data (ex.: 1000k rows). My first approach was to save the data as CSV, using the Oracle SQLPlus command line to write a CSV file and then the MySQL command line to import it. I am wondering if your stream approach would be better. Could you give me a clue about that? What is the best way to transport big data to another table (Oracle => MySQL or MySQL => MySQL)?
@calebeaires I know hardly anything about Oracle, but for moving data from MySQL to MySQL I'd use the shell tools instead. Something like: mysqldump -h source_host -u root -p --hex-blob source_db_name table_name | mysql -h target_host -u root -p target_db_name. When importing from CSV you can also stream the CSV through node if you want to alter the data; I'm using the fast-csv module for that and it works great for me. Note that you have to pause the CSV stream and let the database catch up, or you'll run out of memory, since the CSV stream is a lot faster than the database writes.
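A rough sketch of that pause/resume pattern, assuming a recent fast-csv (which exposes parseFile; older versions used fromPath) and the mysql module; the file name and table name are placeholders:

var mysql = require('mysql');
var csv = require('fast-csv');

var connection = mysql.createConnection({ /* host, user, password, database */ });

var stream = csv.parseFile('data.csv', { headers: true });

stream.on('data', function (row) {
    // Stop the CSV rows while the (slower) insert runs.
    stream.pause();
    connection.query('INSERT INTO tbl SET ?', row, function (err) {
        if (err) {
            console.error('Insert failed:', err);
        }
        // Let the next CSV row through once the database has caught up.
        stream.resume();
    });
});

stream.on('end', function () {
    connection.end();
});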

What comes to mind first is dynamic pagination. I'm sure you're familiar with OFFSET and LIMIT in MySQL; with those, you can control your query:

  1. First query: get 1000 rows.
  2. If successful, query the next 1000 rows.
  3. Repeat recursively (a sketch follows below this list).
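A minimal sketch of that recursive pagination with the mysql module; tbl, the id column, and the chunk size are placeholders:

var mysql = require('mysql');
var connection = mysql.createConnection({ /* host, user, password, database */ });

var chunkSize = 1000;

function fetchChunk(offset) {
    connection.query(
        'SELECT * FROM tbl ORDER BY id LIMIT ? OFFSET ?',
        [chunkSize, offset],
        function (err, rows) {
            if (err) {
                console.error('Error fetching chunk:', err);
                connection.end();
                return;
            }

            // Handle this chunk (append it to a JSON stream, write it to a file, etc.).
            console.log('Got', rows.length, 'rows at offset', offset);

            if (rows.length < chunkSize) {
                // Fewer rows than the chunk size means we've reached the end.
                connection.end();
                return;
            }

            // Only request the next chunk once this one has been handled.
            fetchChunk(offset + chunkSize);
        }
    );
}

fetchChunk(0);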

1 Comment

But as the comments on the other answer say, make sure you write-lock the table, or you may get unexpected results.
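One alternative to an explicit write lock, assuming an InnoDB table, is to run all of the chunked SELECTs inside a single transaction opened with a consistent snapshot, so every chunk reads the same version of the data (a sketch only):

var mysql = require('mysql');
var connection = mysql.createConnection({ /* host, user, password, database */ });

// With InnoDB's default REPEATABLE READ isolation, every SELECT issued inside
// this transaction reads the same snapshot, so the chunks stay consistent even
// if other clients write to the table while the export is running.
connection.query('START TRANSACTION WITH CONSISTENT SNAPSHOT', function (err) {
    if (err) throw err;

    // ... run the chunked SELECTs here, as in the examples above ...

    connection.query('COMMIT', function (err) {
        if (err) throw err;
        connection.end();
    });
});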

Too late to answer, but for anyone who needs an optimized, less time-consuming solution in 2021:

All of the above solutions are good, but they have:

  1. Time complexity of O(n)
  2. High storage complexity of O(n), or high memory usage; sometimes the app may crash because of too many requests

Solution: maintain a synchronized JSON file that is updated whenever a user performs a CRUD operation against the DB. For example, in a PUT request:

app.put('/product/:id', (req, res) => {
    // step 1: do the update operation in the db
    // step 2: do the same update operation in the JSON file

    return res.send('OK 200');
});
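Step 2 might look roughly like this (a sketch only; products.json and the field layout are hypothetical, and the file is assumed to hold an array of product objects):

const fs = require('fs').promises;

// Hypothetical helper for step 2: patch one product inside the cached JSON file.
async function updateProductInJson(id, changes) {
    const raw = await fs.readFile('products.json', 'utf8');
    const products = JSON.parse(raw); // assumed to be an array of product objects

    const index = products.findIndex((p) => String(p.id) === String(id));
    if (index !== -1) {
        Object.assign(products[index], changes);
    }

    // Rewriting the whole file keeps it in sync, but for very large datasets
    // this rewrite itself becomes the bottleneck.
    await fs.writeFile('products.json', JSON.stringify(products));
}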

So the next time a user requests the JSON, they can get the JSON file instantly.

Happy coding :)

