
I've searched a lot, and this may be a duplicate question.

I'm trying to bulk insert in a table.

My approach was like this:

// knex instance and Bluebird are assumed to be initialized elsewhere
knex('test_table').where({
  user: '[email protected]',
})
.then(result => {
  knex.transaction(trx => {
    // insert each fetched row into main_table, at most 3 inserts in flight
    Bluebird.map(result, data => {
      return trx('main_table')
        .insert(data.insert_row);
    }, { concurrency: 3 })
    .then(trx.commit)
    .catch(trx.rollback);
  })
  .then(() => {
    console.log("done bulk insert");
  })
  .catch(err => console.error('bulk insert error: ', err));
});

This would work if the columns were text or numeric; however, I have jsonb columns.

But I got this error:

invalid input syntax for type json

How can I solve this problem?

1 Answer


It sounds like some of the JSON columns don't have their data stringified when sent to the DB.

Also, that is pretty much the slowest way to insert multiple rows, because you are doing one query for each inserted row and using a single connection for all of the inserts.

The concurrency: 3 option only causes the pg driver to buffer the extra queries before they are sent to the DB, since they all go through the same transaction connection as the others.

Something like this should be pretty efficient (I didn't test running the code, so there might be errors):

const rows = await knex('test_table').where({ user: '[email protected]' });
rows.forEach(row => {
  // make sure that json columns are actually json strings
  row.someColumnWithJson = JSON.stringify(row.someColumnWithJson);
});

await knex.transaction(async trx => {
  const chunk = 200;

  // insert rows in 200 row batches
  for (let i = 0; i < rows.length; i += chunk) {
    const rowsToInsert = rows.slice(i, i + chunk);
    await trx('main_table').insert(rowsToInsert);
  }
});

Also, knex.batchInsert might work for you.
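As a rough sketch of that route (the someColumnWithJson column name and the chunk size of 200 are just illustrative, and the knex instance is assumed to be configured already):

const rows = await knex('test_table').where({ user: '[email protected]' });
rows.forEach(row => {
  // jsonb columns still need to be stringified first
  row.someColumnWithJson = JSON.stringify(row.someColumnWithJson);
});

// batchInsert chunks the rows (here 200 per insert) and runs the inserts inside a transaction
await knex.batchInsert('main_table', rows, 200);

If you need it to participate in an existing transaction, you can chain .transacting(trx) on the batchInsert call.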


2 Comments

Can you give me an example for knex.batchInsert? And how is your approach different from mine? Are there performance issues with either approach, since you map and then commit? What if it was 10,000 rows to insert?
Performance for 10k rows should be fine as long as you don't insert them one by one. In this example the rows are inserted in 200 row batches. Inserting them in a transaction should also be the most performant solution overall (using less DB CPU / IO resources), since all writes to that table are done inside the same transaction, which prevents the locking problems etc. that may happen if multiple connections are adding those rows concurrently. An example for batch insert is in the docs: knexjs.org/#Utility-BatchInsert
