Using PostgreSQL 9.4, we have a simple contacts table with (id text not null (as pk), blob json) to experiment with porting a CouchDB CRM database. We will eventually split the data out into more columns and handle it more idiomatically for an RDBMS, but that's beside the point for the time being.
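For reference, the schema is roughly (a sketch of what is described above):

CREATE TABLE couchcontacts (
    id   text NOT NULL PRIMARY KEY,  -- CouchDB document id
    blob json                        -- raw CouchDB document
);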
There are approximately 100k rows.
I am aware that hardcore PostgreSQL performance experts advise against using OFFSET; however, I can accept a small performance penalty (happy with anything under 100 ms).
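(For context, the alternative those experts usually recommend is keyset pagination, which filters on the last key seen rather than discarding rows; a sketch, with 'abc123' as a placeholder for the last id of the previous page:)

SELECT id FROM couchcontacts
WHERE id > 'abc123'   -- last id seen on the previous page (placeholder)
ORDER BY id
LIMIT 10;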
SELECT id FROM couchcontacts OFFSET 10000 LIMIT 10
As expected, this takes < 10 ms.
SELECT blob->>'firstName' FROM couchcontacts LIMIT 10
Also takes < 10 ms (presumably 10 json-decode ops on the blob column here).
SELECT blob->>'firstName' FROM couchcontacts OFFSET 10000 LIMIT 10
Takes upwards of 10 seconds! Noted inefficiencies of OFFSET aside, why is this presumably causing 10,010 json-decode ops? Since the projection has no side effects, I don't understand why this can't be fast.
Is this a limitation of json functionality being relatively new to Postgres, making it unable to determine that the ->> operator doesn't yield side effects?
Interestingly, rewriting the query as follows brings it back under 10 ms:
SELECT blob->>'firstName' FROM couchcontacts WHERE id IN (SELECT id FROM couchcontacts OFFSET 10000 LIMIT 10)
Is there a way to ensure OFFSET doesn't json-decode the skipped records (i.e. doesn't execute the SELECT projection for them)?
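One pattern that seems to achieve this, per the rewrite above, is to keep the wide blob column out of the subquery that does the skipping, so the projection is only evaluated for the surviving rows. A sketch, adding ORDER BY id on the assumption that deterministic pages are wanted (OFFSET without ORDER BY returns an unpredictable slice):

SELECT blob->>'firstName'
FROM couchcontacts
WHERE id IN (
    SELECT id FROM couchcontacts
    ORDER BY id           -- makes the skipped window deterministic
    OFFSET 10000 LIMIT 10
);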
"Limit (cost=1680.31..1681.99 rows=10 width=32) (actual time=12634.674..12634.842 rows=10 loops=1)"
" -> Seq Scan on couchcontacts (cost=0.00..17186.53 rows=102282 width=32) (actual time=0.088..12629.401 rows=10010 loops=1)"
"Planning time: 0.194 ms"
"Execution time: 12634.895 ms"
Comments:

EXPLAIN ANALYZE please? I'm not totally convinced by the explanation of the discrepancy. Did you profile (perf top, etc.) to see if your hypothesized explanation fits the observed behaviour? Though on second thought... I think that if you request a result set with an offset, PostgreSQL should evaluate expressions in the discarded rows unless it can prove they have no side effects. So maybe it is evaluating the json expressions... and arguably it should, unless it can prove they can't abort the query with an ERROR or change database state.

It's slow even with an empty json document ('{}'), slower when the field actually exists ('{"bar":0}'), and increasingly slower if you make the json larger. It's basically behaving as if it's unserializing the json for each row when the operator is used.
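A minimal way to reproduce that last observation (a hypothetical test setup, not from the original post; the 'bar' key matches the comment above):

-- Build a throwaway table of 100k single-key json rows
CREATE TABLE jsontest AS
SELECT i::text AS id, '{"bar":0}'::json AS blob
FROM generate_series(1, 100000) AS i;

-- The ->> projection is evaluated for every row the scan emits,
-- including the 10,000 rows that OFFSET discards
EXPLAIN ANALYZE SELECT blob->>'bar' FROM jsontest OFFSET 10000 LIMIT 10;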