
I have a large (long) table of data stored in SQLite, potentially 5 million+ rows. I am using the System.Data.SQLite package to execute my query and read the data into a bespoke in-memory collection structure in the regular ADO.NET way.

CODE (F#)

open System.Data.SQLite

// Record matching the four columns read below (field types inferred from the getters)
type Data = { X: int; Y: int; Z: int; AAA: float }

use cnxn = new SQLiteConnection(@"Data Source=C:\Temp\test.db;Version=3;Read Only=True;")
cnxn.Open()

let data = ResizeArray<Data>()

use cmd = new SQLiteCommand(@"SELECT X, Y, Z, AAA FROM Data", cnxn)
use reader = cmd.ExecuteReader()

// Materialise every row into the in-memory collection
while reader.Read() do
    let d = { X = reader.GetInt32(0); Y = reader.GetInt32(1)
              Z = reader.GetInt32(2); AAA = reader.GetDouble(3) }
    data.Add(d)

cnxn.Close()

Questions

  1. Is System.Data.SQLite the most performant library for the job here? I am only using it because it appears to be the standard choice.

  2. Is there a better way to code this up?

  3. Are there any settings/configurations on the database itself that would help this scenario?

Why do I think this should be able to go faster?

My computer's SSD has a theoretical read speed of 725 MB/s. Reading the SQLite table above, I read 40 MB in 1 s, which is an effective speed of only 40 MB/s.

Profiling also shows that about 35% of the time is spent in reader.Read() [not surprising] and the remainder in GetInt32() and GetDouble() [very surprising].
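
To double-check that split, one option is to time the two halves of the loop separately. This is only a rough sketch against the same reader, data and Data record as above; Stopwatch.Start/Stop adds its own per-iteration overhead, so the numbers are indicative only.

open System.Diagnostics

let readTimer = Stopwatch()
let getTimer = Stopwatch()

readTimer.Start()
let mutable more = reader.Read()
readTimer.Stop()

while more do
    getTimer.Start()
    // Same typed getters as the original loop
    let d = { X = reader.GetInt32(0); Y = reader.GetInt32(1)
              Z = reader.GetInt32(2); AAA = reader.GetDouble(3) }
    getTimer.Stop()
    data.Add(d)
    readTimer.Start()
    more <- reader.Read()
    readTimer.Stop()

printfn "Read(): %d ms, getters: %d ms" readTimer.ElapsedMilliseconds getTimer.ElapsedMilliseconds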

  • Why would you want to bring 5 million rows to memory all at once? Nothing can speed that up beyond a certain point. Might want to rethink that design. Commented Nov 5, 2015 at 8:24
  • @Hanky웃Panky Thanks, but I really do want to bring them ALL into memory and keep them there for an indefinite period, with full knowledge of why that may not appear to be good design. Commented Nov 5, 2015 at 8:28
  • Measure your loop: check whether the Read() or the Add() takes the most time. Commented Nov 5, 2015 at 8:31
  • @CL. From profiling: most time is taken in Read(), then in GetInt32() and then GetDouble(). Commented Nov 5, 2015 at 8:43
  • @Hanky웃Panky Fewer columns in an index result in fewer leaf pages stored on disc. Fewer pages means less disc I/O to load the table into memory. If the desired columns are 1/10 of the columns in the clustered index, then having an index covering only those columns could be a 90% reduction in data loaded from disc into memory and thus 1/10 the query time. Commented Nov 6, 2015 at 0:44
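
The covering-index suggestion in the last comment could be tried with something like the following. The index name is made up, the connection would have to be opened without Read Only=True to create it, and whether it actually speeds up the load depends on how wide the real table is, so it needs measuring.

// Hypothetical covering index over just the four queried columns, so SQLite can
// scan the (narrower) index instead of the full table. Benefit must be measured.
use idxCmd = new SQLiteCommand(@"CREATE INDEX IF NOT EXISTS idx_data_covering ON Data (X, Y, Z, AAA)", cnxn)
idxCmd.ExecuteNonQuery() |> ignore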
