1

I programmed a C++ executable with Qt to calculate data based on a SQLite database and store it within the database again. With Qt I am executing a SELECT, calculating data and storing it again via UPDATE. Initially I tested this with chunks of about 5000 lines and it worked quite fast (ca. 1ms per line). Now that I implemented everything I need to apply the same calculation on databases with around 600000 lines. Now the average time per line is around the factor 200 slower. While improving my code, I first verified, that its about the table size and not about calculation specifics and I therefore truncated data from the long table to 100000 and that is only the factor of 20 slower than the small table with 5000 lines.

I already since the beginning have pragmas in the code to improve overall performance:

query.exec("PRAGMA page_size = 16384");
    query.exec("PRAGMA cache_size = 131072");
    query.exec("PRAGMA temp_store = MEMORY");
    query.exec("PRAGMA journal_mode = OFF");
    query.exec("PRAGMA locking_mode = EXCLUSIVE");
    query.exec("PRAGMA synchronous = OFF");

And I also tried to make use of transactions in bunches of 100 or 1000 lines, but that did not help:

database.transaction();
...
database.commit();

Can anybody suggest what to do? I was already thinking about splitting database files, but is there an easy way to do?

EDIT: As requested, here is a simplified minimal example. The table layout is more complex in my real example (17 columns) and has around 600000 entries, nevertheless this introduces the problem quite well:

widget.h

#ifndef WIDGET_H
#define WIDGET_H

#include <QtCore>
#include <QtGui>
#include <QtWidgets>
#include <QtSql>

class Widget : public QWidget
{
    Q_OBJECT

public:
    Widget(QWidget *parent = 0)
        : QWidget(parent)
    {
        QPushButton *createSmall = new QPushButton("Create Small");
        connect(createSmall, SIGNAL(clicked()), this, SLOT(createSmallDataBase()));
        QPushButton *createBig = new QPushButton("Create Big");
        connect(createBig, SIGNAL(clicked()), this, SLOT(createBigDataBase()));
        QPushButton *calculateSmall = new QPushButton("Calculate Small");
        connect(calculateSmall, SIGNAL(clicked()), this, SLOT(calculateSmallDataBase()));
        QPushButton *calculateBig = new QPushButton("Calculate Big");
        connect(calculateBig, SIGNAL(clicked()), this, SLOT(calculateBigDataBase()));
        QVBoxLayout *layout = new QVBoxLayout();
        layout->addWidget(createSmall);
        layout->addWidget(createBig);
        layout->addWidget(calculateSmall);
        layout->addWidget(calculateBig);
        this->setLayout(layout);
    }

    ~Widget()
    {
    }

    void createDataBase(quint32 size, QString name)
    {
        QSqlDatabase database;
        database = QSqlDatabase::addDatabase("QSQLITE");
        database.setDatabaseName(name);
        if(database.open())
        {
            QSqlQuery query(database);
            query.exec("PRAGMA page_size = 4096");
            query.exec("PRAGMA cache_size = 16384");
            query.exec("PRAGMA temp_store = MEMORY");
            query.exec("PRAGMA journal_mode = OFF");
            query.exec("PRAGMA locking_mode = EXCLUSIVE");
            query.exec("PRAGMA synchronous = OFF");
            qDebug() << "DROP" << query.exec("DROP TABLE Scenario");
            qDebug() << "CREATE" << query.exec("CREATE TABLE IF NOT EXISTS Scenario(id INTEGER, time REAL, prob REAL)");
            for(quint32 i = 0; i < size; i++)
            {
                query.exec(QString("INSERT INTO Scenario (id, time, prob) VALUES(%1, %2, %3)").arg(i).arg(i).arg(-1));
            }
        }
        database.close();
        database.removeDatabase("QSQLITE");
    }

    void calculateDataBase(QString name)
    {
        QSqlDatabase database;
        database = QSqlDatabase::addDatabase("QSQLITE");
        database.setDatabaseName(name);
        if(database.open())
        {
            QSqlQuery query(database);
            query.exec("PRAGMA page_size = 4096");
            query.exec("PRAGMA cache_size = 16384");
            query.exec("PRAGMA temp_store = MEMORY");
            query.exec("PRAGMA journal_mode = OFF");
            query.exec("PRAGMA locking_mode = EXCLUSIVE");
            query.exec("PRAGMA synchronous = OFF");

            query.exec("SELECT MAX(id) FROM Scenario");
            quint32 maxID = 0;

            if(query.next())
            {
                maxID = query.value(0).toUInt();
            }

            for(quint32 id = 0; id <= maxID; id++)
            {
                query.exec(QString("SELECT id, time, prob FROM Scenario WHERE id = %1").arg(QString::number(id)));
                if(query.first())
                {
                    double prob = query.value(0).toDouble();
                    query.exec(QString("UPDATE Scenario SET prob = %1 WHERE id = %2").arg(qSqrt(prob)).arg(QString::number(id)));
                }
            }
        }
        database.close();
        database.removeDatabase("QSQLITE");
    }

public slots:
    void createSmallDataBase()
    {
        QTime time;
        time.start();
        createDataBase(1000, "small.sqlite");
        qDebug() << "Create Small" << time.elapsed()/1000.0;
    }
    void createBigDataBase()
    {
        QTime time;
        time.start();
        createDataBase(10000, "big.sqlite");
        qDebug() << "Create Big" << time.elapsed()/10000.0;
    }
    void calculateSmallDataBase()
    {
        QTime time;
        time.start();
        calculateDataBase("small.sqlite");
        qDebug() << "Calculate Small" << time.elapsed()/1000.0;
    }
    void calculateBigDataBase()
    {
        QTime time;
        time.start();
        calculateDataBase("big.sqlite");
        qDebug() << "Calculate Big" << time.elapsed()/10000.0;
    }
};

#endif // WIDGET_H

main.cpp

#include <QApplication>

#include "widget.h"

int main(int argc, char *argv[])
{
    QApplication a(argc, argv);
    Widget w;
    w.show();

    return a.exec();
}

On my machine the difference between calling calculateSmallDataBase() and calculateBigDataBase() is from 0.518ms/line to 3.0417ms/line and this is only from 1000 to 10000 lines! So I am already reaching factor 6 in between these.

Looking forward to your suggestions.

7
  • 1
    1) Turn on compiler optimization. 2) Post a minimal reproducible example. Commented Apr 22, 2018 at 15:13
  • 1
    Show the database schema, and the queries that are slow. Commented Apr 22, 2018 at 15:40
  • Get a query plan from the database engine look at optimising it's access paths. See: sqlite.org/eqp.html Usual cause of DB slowdowns when the data size changes from small test-data sets to larger real data-sets is non-optimised access to the table data e.g. missing indexes; not using indexes because the query list the keys in the opposite order; too many index for optimal inserts etc. Commented Apr 22, 2018 at 16:23
  • Hi @Richard Critten, could you formulate that as an answer, because I just tried it at my data and it worked like rocket fuel! Commented Apr 22, 2018 at 16:37
  • @samcarter_is_at_topanswers.xyz Well for sure I can post them here as well, but why are you suggesting this here and not at my question? ;-) And why is beamer more a topic for StackOverflow, then TeXexchange? Commented Jul 4, 2020 at 17:46

1 Answer 1

2

Get a query plan from the database engine look at optimising it's access paths. See: sqlite.org/eqp.html Usual cause of DB slowdowns when the data size changes from small test-data sets to larger real data-sets is non-optimised access to the table data e.g. missing indexes; not using indexes because the query list the keys in the opposite order; too many index for optimal inserts etc.

Sign up to request clarification or add additional context in comments.

1 Comment

In this case specifically CREATE UNIQUE INDEX idx_id_scen ON Scenario(id) was the line that did the trick.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.