
I have a large table in my MySQL database (about 10 million rows), and I need to get all of this data into JSON format. For smaller tables, I would use the basic connection.query("SELECT * FROM TABLE", function(err, results) {}); syntax. However, I don't want to load the whole table into memory.

I noticed that the mysql module has the ability to "stream" rows (https://github.com/felixge/node-mysql/#streaming-query-rows), so I was wondering whether that still loads the entire table into memory and then just gives us each row one by one, or whether it actually loads only one row at a time, so the whole table is never held in memory at once.
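For reference, the event-based streaming interface described in that README looks roughly like this (a sketch only; processRow stands in for whatever per-row handling is needed):

var mysql = require('mysql');
var connection = mysql.createConnection({ /* host, user, password, database */ });

// The query object is an event emitter; rows arrive one at a time via
// 'result' events instead of being buffered into a single results array.
var query = connection.query('SELECT * FROM tbl');

query
    .on('error', function (err) {
        // Handle the error; an error also ends the stream of rows.
        console.error(err);
    })
    .on('result', function (row) {
        // Pause the connection while this row is being processed so that
        // rows don't pile up in memory, then resume to get the next one.
        connection.pause();
        processRow(row, function () {
            connection.resume();
        });
    })
    .on('end', function () {
        // All rows have been received (or an error occurred).
        connection.end();
    });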

3 Answers


Load your data in chunks. Here is a working example.

var mysql = require('mysql');
var settings = {};

settings.host = "localhost";
settings.user = "root";
settings.password = "root";
settings.database = "dbname";

var pool = mysql.createPool(settings);

var countQuery = "SELECT count(*) as total FROM tbl";

var chunkSize = 1000;

pool.getConnection(function(err, connection) {
    if (err) {
        // No connection was obtained, so there is nothing to release.
        console.log("Error on getConnection:", err);
        return;
    }

    connection.query(countQuery, function(err, result) {
        if (err) {
            connection.release();
            console.log("Error on countQuery:", err);
            return;
        }

        if (result && result[0]) {
            var totalRows = result[0]['total'];
            console.log("Total rows in db:", totalRows);
            var periods = Math.ceil(totalRows / chunkSize);
            console.log("Total periods", periods);

            var selectQuery = "SELECT * FROM tbl ORDER BY id DESC LIMIT ";

            for(var i = 0; i < periods; i++) {
                var offset = i*chunkSize;

                var runQuery = selectQuery + offset + "," + chunkSize;

                console.log(runQuery);

                connection.query(runQuery, function (err, results) {
                    if (err) {
                        console.log("Error on runQuery:", err);
                        return;
                    }

                    console.log("Data:", results);
                });
            }

            connection.release();
        }
    });
});

5 Comments

This method could potentially result in wrong data being transferred, since it is split into several different SQL statements that are queried at different times with no write lock being put onto the table.
You are right. The better way is to do it with recursion or use some locks. But the idea is the same.
Aye, I'd much rather hand this problem to the designers of the database engine; I think they have already thought long and hard about it. Because of this, I'd want my application to stream the data from the db instead of downloading it in chunks. github.com/felixge/node-mysql/#streaming-query-rows, as mentioned in the initial question, seems to do just that. I tried it out on a 40 million row table and it worked well with low memory consumption in my node app.
@Lilleman I am building a node app that brings Oracle data to MySQL. I have a concern about how to bring over a large amount of data (ex.: 1000k rows). My first approach was to save the data as CSV, using the Oracle SQLPlus command line to write a CSV file and then the MySQL command line to import it. I am wondering if your stream approach would be better. Could you give me a clue about that? What is the best way to transport big data to another table (Oracle => MySQL or MySQL => MySQL)?
@calebeaires I know hardly anything about Oracle, but for moving data from MySQL to MySQL I'd use the shell tools instead. Something like: mysqldump -h source_host -u root -p --hex-blob source_db_name table_name | mysql -h target_host -u root -p target_db_name. When importing from CSV you can also stream the CSV through node if you want to alter the data; I'm using the fast-csv module for that and it works great for me. Note that you have to pause the CSV stream and let the database catch up, or you'll run out of memory, since the CSV stream is a lot faster than the database writes.
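A rough sketch of that pause/resume pattern, assuming a recent fast-csv (which exposes parseFile; older versions used fromPath) and the mysql module; the file name and table name are placeholders:

var mysql = require('mysql');
var csv = require('fast-csv');

var connection = mysql.createConnection({ /* host, user, password, database */ });

var stream = csv.parseFile('data.csv', { headers: true });

stream.on('data', function (row) {
    // Stop the CSV rows while the (slower) insert runs.
    stream.pause();
    connection.query('INSERT INTO tbl SET ?', row, function (err) {
        if (err) {
            console.error('Insert failed:', err);
        }
        // Let the next CSV row through once the database has caught up.
        stream.resume();
    });
});

stream.on('end', function () {
    connection.end();
});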

What comes to mind first is dynamic pagination. I'm sure you're familiar with OFFSET and LIMIT in MySQL; with those, you can control your query:

  1. First query: get 1000 rows.
  2. If successful, query the next 1000 rows.
  3. Repeat recursively (a sketch follows below this list).
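A minimal sketch of that recursive pagination with the mysql module; tbl, the id column, and the chunk size are placeholders:

var mysql = require('mysql');
var connection = mysql.createConnection({ /* host, user, password, database */ });

var chunkSize = 1000;

function fetchChunk(offset) {
    connection.query(
        'SELECT * FROM tbl ORDER BY id LIMIT ? OFFSET ?',
        [chunkSize, offset],
        function (err, rows) {
            if (err) {
                console.error('Error fetching chunk:', err);
                connection.end();
                return;
            }

            // Handle this chunk (append it to a JSON stream, write it to a file, etc.).
            console.log('Got', rows.length, 'rows at offset', offset);

            if (rows.length < chunkSize) {
                // Fewer rows than the chunk size means we've reached the end.
                connection.end();
                return;
            }

            // Only request the next chunk once this one has been handled.
            fetchChunk(offset + chunkSize);
        }
    );
}

fetchChunk(0);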

1 Comment

But as the comments on the other answer say, make sure you write-lock the table, or you may get unexpected results.
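One alternative to an explicit write lock, assuming an InnoDB table, is to run all of the chunked SELECTs inside a single transaction opened with a consistent snapshot, so every chunk reads the same version of the data (a sketch only):

var mysql = require('mysql');
var connection = mysql.createConnection({ /* host, user, password, database */ });

// With InnoDB's default REPEATABLE READ isolation, every SELECT issued inside
// this transaction reads the same snapshot, so the chunks stay consistent even
// if other clients write to the table while the export is running.
connection.query('START TRANSACTION WITH CONSISTENT SNAPSHOT', function (err) {
    if (err) throw err;

    // ... run the chunked SELECTs here, as in the examples above ...

    connection.query('COMMIT', function (err) {
        if (err) throw err;
        connection.end();
    });
});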

Too late to answer, but for anyone who needs an optimized, less time-consuming solution in 2021:

All of the above solutions are good, but they have:

  1. Time complexity of O(n)
  2. High storage complexity of O(n), or high memory usage; sometimes the app may crash because of too many requests

Solution: maintain a synchronized JSON file that is updated whenever a user performs a CRUD operation against the DB. For example, in a PUT request:

app.put('/product/:id', (req, res) => {
    // step 1: do the update operation in the db
    // step 2: do the same update operation in the JSON file

    return res.send('OK 200');
});
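Step 2 might look roughly like this (a sketch only; products.json and the field layout are hypothetical, and the file is assumed to hold an array of product objects):

const fs = require('fs').promises;

// Hypothetical helper for step 2: patch one product inside the cached JSON file.
async function updateProductInJson(id, changes) {
    const raw = await fs.readFile('products.json', 'utf8');
    const products = JSON.parse(raw); // assumed to be an array of product objects

    const index = products.findIndex((p) => String(p.id) === String(id));
    if (index !== -1) {
        Object.assign(products[index], changes);
    }

    // Rewriting the whole file keeps it in sync, but for very large datasets
    // this rewrite itself becomes the bottleneck.
    await fs.writeFile('products.json', JSON.stringify(products));
}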

So the next time a user requests the JSON, they can get the JSON file instantly.

Happy coding :)

