I have these three tables:
- create table words (id integer, word text, freq integer);
- create table sentences (id integer, sentence text);
- create table index (wordId integer, sentenceId integer, position integer);
The index table is an inverted index and denotes which word occurs in which sentence. Furthermore, I have an index on id for both the words table and the sentences table.
This query determines in which sentences a given word occurs and returns the first match:
select S.sentence from sentences S, words W, index I
where W.word = '#erhoehungen' and W.id = I.wordId and S.id = I.sentenceId
limit 1;
But when I want to retrieve a sentence where two words occur together like:
select S.sentence from sentences S, words W, index I
where W.word = '#dreikampf' and I.wordId = W.id and S.id = I.sentenceId and
S.id in (
select S.id from sentences S, words W, index I
where W.word = 'bruederle' and W.id = I.wordId and S.id = I.sentenceId
)
limit 1;
This query is much slower. Is there any trick to speed it up? Here is what I have done so far:
- increased shared_buffers to 32MB
- increased work_mem to 15MB
- ran analyze on all tables
- as mentioned, created indexes on words.id and sentences.id
Regards.
Edit:
Here is the output of the explain analyze query statement: http://pastebin.com/t2M5w4na
These three create statements are actually my original create statements. Should I add primary keys to the tables sentences and words and reference them as foreign keys in the index table? But what primary key should I use for the index table? sentenceId and wordId together are not unique, and even if I add position, which denotes the position of the word in the sentence, the combination is not unique.
Updated to:
- create table words (id integer, word text, freq integer, primary key(id));
- create table sentences (id integer, sentence text, primary key(id));
- create table index (wordId integer, sentenceId integer, position integer, foreign key(wordId) references words(id), foreign key(sentenceId) references sentences(id));
explain analyze your_query, where "your_query" represents your troublesome SELECT statement. Also, actual CREATE TABLE statements can help a lot.
index (terrible name, BTW) needs at least a primary key. {sentenceid, position} is the obvious choice. Having one or two compound indexes on {sentenceid, wordid} and/or {wordid, sentenceid} would probably help, too.
word itself.
Off-record: RDBMS and NLP are a bad match. You could take a look at other storage methods (for Postgres: hstore, or GiST indexes for full-text search).
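For what it's worth, the indexing suggestions from the comments could be sketched roughly like this. This is only a sketch under assumptions: the index names (index_word_sentence, words_word) are made up for illustration, the primary key assumes that (sentenceId, position) really is unique, and the index on words(word) is my own guess at what would help the lookup by word text. The last statement is one possible join-based rewrite of the two-word query that replaces the IN subquery with a second pass over the index table. Whether any of this is actually faster depends on the data and the planner, so compare the plans with explain analyze before and after.

```sql
-- "index" is quoted because it is an SQL keyword; the quoted lowercase
-- name refers to the same table as the unquoted one in the question.

-- Assumption: each position in a sentence holds exactly one word,
-- so (sentenceId, position) is unique.
alter table "index" add primary key (sentenceId, position);

-- Compound index for finding the sentences that contain a given word.
-- The index name is hypothetical.
create index index_word_sentence on "index" (wordId, sentenceId);

-- Assumption: looking up a word by its text is part of the cost,
-- so index the text column as well. The index name is hypothetical.
create index words_word on words (word);

-- Refresh planner statistics after adding the indexes.
analyze words;
analyze "index";

-- Join-based rewrite of the two-word query: I1 finds sentences containing
-- the first word, I2 requires the second word in the same sentence.
select S.sentence
from words W1
join "index" I1 on I1.wordId = W1.id
join words W2 on W2.word = 'bruederle'
join "index" I2 on I2.wordId = W2.id
               and I2.sentenceId = I1.sentenceId
join sentences S on S.id = I1.sentenceId
where W1.word = '#dreikampf'
limit 1;
```

Because of the limit 1, it does not matter that the join can produce the same sentence more than once when a word occurs several times in it. Without the limit you would want select distinct.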