I have a table with a few hundred thousand rows. (It is a precomputed table expressing the relation between lemmas of words and other big tables.) I need to do multiple selections to find combinations of different entries, i.e. I have to use “AS” and write select … from ltc as l0, ltc as l1, ltc as l2 … order by ... The speed of the query depends on the sorting: without sorting it takes a few milliseconds, with sorting it can take a few minutes. As far as I can tell, this is due to the temporary B-tree that SQLite builds for sorting, even though I have an index on the sorted column “nr”. I don’t understand why SQLite does not use this index.
CREATE TABLE ltc (
  nr INTEGER, lemId INTEGER, cId INTEGER, bId INTEGER,
  -- UNIQUE (lemId, cId, bId),
  -- If I add this uniqueness constraint, strangely enough it doesn't use my index at all, not even for a simple ORDER BY.
  PRIMARY KEY(nr, lemId, cId),
  FOREIGN KEY(lemId) REFERENCES lems(rowid),
  FOREIGN KEY(cId) REFERENCES cs(rowid),
  FOREIGN KEY(bId) REFERENCES bs(rowid)
);

CREATE INDEX nri ON ltc(nr);
Here is the stripped-down version of my SELECT statement:
SELECT l0.nr,l1.nr,l2.nr
FROM ltc as l0, ltc as l1, ltc as l2
WHERE
l0.lemId IN (1001) -- in the real query, 1001 is replaced by a simple sub-select.
AND l1.lemId IN (1002,1003)
AND l2.lemId IN (1004 )
ORDER BY
l0.nr,
l1.nr,
l2.nr
LIMIT 10;
The EXPLAIN QUERY PLAN output is:
(0, 0, 0, u'SCAN TABLE ltc AS l0')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 1')
(1, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 1, 1, u'SCAN TABLE ltc AS l1')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 2')
(2, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 2, 2, u'SCAN TABLE ltc AS l2')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 3')
(3, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 0, 0, u'USE TEMP B-TREE FOR ORDER BY')
And this is the output with the ORDER BY removed (or reduced to a single column, ORDER BY l0.nr):
(0, 0, 0, u'SCAN TABLE ltc AS l0 USING COVERING INDEX sqlite_autoindex_ltc_1')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 1')
(1, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 1, 1, u'SCAN TABLE ltc AS l1 USING COVERING INDEX sqlite_autoindex_ltc_1')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 2')
(2, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 2, 2, u'SCAN TABLE ltc AS l2 USING COVERING INDEX sqlite_autoindex_ltc_1')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 3')
(3, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
I have tried all sorts of single and combined indices, but it doesn’t seem to make any difference.
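To give an idea, the combined indices I tried were variations along these lines (the index names are just placeholders, and I tried both column orders):
CREATE INDEX idx_lem    ON ltc(lemId);      -- filter column alone
CREATE INDEX idx_lem_nr ON ltc(lemId, nr);  -- filter column first, sort column second
CREATE INDEX idx_nr_lem ON ltc(nr, lemId);  -- sort column first, filter column second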
The problem seems to be the double ordering itself, not the double selection: even a redundant double ORDER BY creates a temp B-tree (even though in this case the result comes back immediately):
EXPLAIN QUERY PLAN SELECT ltc.nr
FROM ltc
WHERE
ltc.lemId = 716
ORDER BY
ltc.nr,
ltc.nr
LIMIT 10;
(0, 0, 0, u'SCAN TABLE ltc')
(0, 0, 0, u'USE TEMP B-TREE FOR ORDER BY')
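For comparison, this is the single-column variant I compared against (the same query with the duplicate sort term removed):
EXPLAIN QUERY PLAN SELECT ltc.nr
FROM ltc
WHERE
ltc.lemId = 716
ORDER BY
ltc.nr
LIMIT 10;
As far as I can tell, the temp B-tree only shows up once the second, redundant sort term is added.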
In SQLite ORDER BY performance issue it is said that queries cannot be ordered by indices from different tables. Is this the problem here? Is there a way around it? Is this an SQLite-specific restriction, or do all SQL systems behave like this?
EDIT:
After adding the index, as suggested by CL, the performance problem remains. As an example, take a more complete query with four search terms:
SELECT l0.nr, l1.nr, l2.nr, l3.nr
FROM ltc AS l0, ltc AS l1, ltc AS l2, ltc AS l3
WHERE
l0.lemId IN (SELECT rowid FROM lems WHERE lems.lem = 'catch')
AND l1.lemId IN (SELECT rowid FROM lems WHERE lems.lem = 'cause')
AND l2.lemId IN (SELECT rowid FROM lems WHERE lems.lem = 'score')
AND l3.lemId IN (SELECT rowid FROM lems WHERE lems.lem = 'guest')
ORDER BY
l0.nr ASC
LIMIT 10;
This gives the following plan:
(0, 0, 0, u'SEARCH TABLE ltc AS l0 USING INDEX lid (lemId=?)')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 1')
(1, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 1, 1, u'SEARCH TABLE ltc AS l1 USING INDEX lid (lemId=?)')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 2')
(2, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 2, 2, u'SEARCH TABLE ltc AS l2 USING INDEX lid (lemId=?)')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 3')
(3, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 3, 3, u'SEARCH TABLE ltc AS l3 USING INDEX lid (lemId=?)')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 4')
(4, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 0, 0, u'USE TEMP B-TREE FOR ORDER BY')
(No more full table scans.)
But the time is 388 seconds!
When I remove the ORDER BY, I get exactly the same plan minus the last temp B-tree line, and the time is 0.00025 seconds!
This query is effectively a join; I can also write it as an explicit (inner) join with conditions, as in the sketch below. This may be why the time seems to grow exponentially with the number of search terms: {1 search term: 0.08 seconds, 2: 0.5, 3: 3, 4: 9, 5: 116, ...}. But I still don’t quite understand why the database can’t simply use the index on the nr column for the sort: after all, it’s just a lot of result rows, each containing an nr value, that have to be ordered.
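The explicit join version I have in mind looks roughly like this (a sketch of the four-term query from above; any additional conditions relating the aliases would go into the ON clauses, which here carry only the lemId filters):
SELECT l0.nr, l1.nr, l2.nr, l3.nr
FROM ltc AS l0
INNER JOIN ltc AS l1 ON l1.lemId IN (SELECT rowid FROM lems WHERE lems.lem = 'cause')
INNER JOIN ltc AS l2 ON l2.lemId IN (SELECT rowid FROM lems WHERE lems.lem = 'score')
INNER JOIN ltc AS l3 ON l3.lemId IN (SELECT rowid FROM lems WHERE lems.lem = 'guest')
WHERE l0.lemId IN (SELECT rowid FROM lems WHERE lems.lem = 'catch')
ORDER BY l0.nr ASC
LIMIT 10;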
As suggested by CL, I've put the underlying problem in a new question: Selecting tuples of lines from an Sqlite table and sorting the tuples efficiently