I have a 100-megabyte SQLite database file that I would like to load into memory before performing SQL queries. Is it possible to do that in Python?

  • That's what happens before too long anyway -- it all winds up in memory. The only way to have an "all-in-memory" database is to open a database named ":memory:" and create and load the tables from external sources. What problem are you trying to solve? Is it too slow? How do you know it's the database and not your code? Commented Sep 29, 2010 at 23:16
  • How do I load the tables from an external db to a memory db? Commented Sep 30, 2010 at 21:59

4 Answers

apsw is an alternative Python wrapper for SQLite that lets you back up an on-disk database into memory before running your queries.

From the documentation:

###
### Backup to memory
###

import apsw

# We will copy the disk database into a memory database
connection = apsw.Connection("mydb.sqlite")  # your on-disk database (placeholder filename)
memcon = apsw.Connection(":memory:")

# Copy into memory
with memcon.backup("main", connection, "main") as backup:
    backup.step() # copy whole database in one go

# There will be no disk accesses for this query
for row in memcon.cursor().execute("select * from s"):
    pass

connection above is your on-disk database (opened here with a placeholder filename).
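Note: if you'd rather not add a dependency, Python 3.7+ exposes the same SQLite online backup API in the standard library's sqlite3 module as Connection.backup. A minimal sketch, assuming a placeholder filename mydb.sqlite and the same table s as above:

import sqlite3

disk = sqlite3.connect("mydb.sqlite")  # on-disk database (placeholder filename)
mem = sqlite3.connect(":memory:")

# Copy the whole on-disk database into the in-memory one
disk.backup(mem)
disk.close()

# This query now runs entirely from memory
for row in mem.execute("select * from s"):
    pass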


3 Comments

I like your solution, but there is one problem: I use the row_factory feature of pysqlite a lot, and it seems that apsw does not have this feature.
This has really solved my problem. My queries are MUCH faster now.
import apsw
mem_db_loader = apsw.Connection(file_sqlite_db)
connection = apsw.Connection(":memory:")
connection.backup("main", mem_db_loader, "main").step()
cursor = connection.cursor()
The general approach, using nothing but standard SQLite:
  1. Get an in-memory database running (standard stuff).
  2. Attach the disk database (file).
  3. Recreate tables / indexes and copy over the contents.
  4. Detach the disk database (file).

Here's an example (taken from here) in Tcl, which could be useful for getting the general idea across:

proc loadDB {dbhandle filename} {

    if {$filename != ""} {
        #attach persistent DB to target DB
        $dbhandle eval "ATTACH DATABASE '$filename' AS loadfrom"
        #copy each table to the target DB
        foreach {tablename} [$dbhandle eval "SELECT name FROM loadfrom.sqlite_master WHERE type = 'table'"] {
            $dbhandle eval "CREATE TABLE '$tablename' AS SELECT * FROM loadfrom.'$tablename'"
        }
        #create indexes in the loaded tables
        foreach {sql_exp} [$dbhandle eval "SELECT sql FROM loadfrom.sqlite_master WHERE type = 'index'"] {
            $dbhandle eval $sql_exp
        }
        #detach the source DB
        $dbhandle eval {DETACH loadfrom}
    }
}
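For completeness, here is a rough Python translation of the same recipe using only the standard sqlite3 module. This is a sketch, not a drop-in solution: CREATE TABLE ... AS SELECT does not preserve constraints, and the sql IS NOT NULL filter skips SQLite's implicit indexes, which have no SQL to replay:

import sqlite3

def load_db(mem_conn, filename):
    # 1. Attach the on-disk database to the in-memory connection
    mem_conn.execute("ATTACH DATABASE ? AS loadfrom", (filename,))

    # 2. Copy each table (skipping SQLite's internal sqlite_* tables)
    tables = mem_conn.execute(
        "SELECT name FROM loadfrom.sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (name,) in tables:
        mem_conn.execute(
            'CREATE TABLE "%s" AS SELECT * FROM loadfrom."%s"' % (name, name))

    # 3. Recreate the indexes (implicit indexes have sql = NULL and are skipped)
    index_sql = mem_conn.execute(
        "SELECT sql FROM loadfrom.sqlite_master "
        "WHERE type = 'index' AND sql IS NOT NULL"
    ).fetchall()
    for (sql,) in index_sql:
        mem_conn.execute(sql)

    # 4. Detach the on-disk database
    mem_conn.commit()
    mem_conn.execute("DETACH DATABASE loadfrom")

mem = sqlite3.connect(":memory:")
load_db(mem, "file.db")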


If you are using Linux, you can try tmpfs, a memory-backed file system.

It's very easy to use:

  1. Mount tmpfs on a directory.
  2. Copy the SQLite database file into that directory.
  3. Open it as a normal SQLite database file.

Remember, anything in tmpfs is lost after a reboot, so copy the database file back to disk if it has changed.
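In Python this can be as simple as copying the file to /dev/shm, a tmpfs mount that most Linux distributions set up by default (a sketch; file.db is a placeholder, and it assumes /dev/shm exists and has enough free space):

import shutil
import sqlite3

# Copy the database onto the tmpfs mount, i.e. into RAM
ram_path = shutil.copy("file.db", "/dev/shm/file.db")

# Open it like a normal on-disk database; reads now come from memory
conn = sqlite3.connect(ram_path)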


Note that you may not need to load the database into SQLite's memory explicitly at all. You can simply prime the operating system's disk cache by copying the file to the null device:

Windows:  copy file.db nul:
Unix/Mac: cp file.db /dev/null

This has the advantage that the operating system takes care of the memory management, and in particular it can discard the cached data if something more important comes along.
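From Python, the same priming can be done with a plain sequential read of the file (a minimal sketch; file.db is a placeholder name):

# Read the file once so the OS pulls it into its page cache
with open("file.db", "rb") as f:
    while f.read(1024 * 1024):  # 1 MiB chunks
        pass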

4 Comments

It may be only my computer, but this technique didn't really improve my performance (Win 7 x64, 8 GB RAM).
It has worked for many other people on the SQLite mailing list in the past, especially right after a machine has booted, since it primes the file system cache. In your case it is most likely that the file didn't end up in the file system cache. (Some copy tools tell the OS to bypass the cache so that they don't throw out existing "good" content.)
The "nul:" trick didn't work for me on Win7, but a real copy (to temp.db) does. It's a little annoying because I have to delete the temp file afterwards to avoid wasting disk space, but it gets the file into the disk cache (making the first query just as fast as subsequent ones).
Since you're in a programming language (Python) anyway, you could just dummy-read the whole file before doing any work. Does anyone have experience with how this cache priming performs vs. the :memory: backup method?
