In 9.4beta2, postgres_fdw doesn't know how to "push down" aggregate queries on remote tables, e.g.:
=> explain verbose select max(col1) from remote_tables.table1;
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
 Aggregate  (cost=605587.30..605587.31 rows=1 width=4)
   Output: max(col1)
   ->  Foreign Scan on remote_tables.table1  (cost=100.00..565653.20 rows=15973640 width=4)
         Output: col1, col2, col3
         Remote SQL: SELECT col1 FROM public.table1
It would obviously be much more efficient to send SELECT max(col1) FROM public.table1 to the remote server and just pull the one row back.
Is there a way to perform this optimization manually? I would be satisfied with something as low-level as (hypothetically speaking)
EXECUTE 'SELECT max(col1) FROM public.table1' ON remote RETURNING (col1 INTEGER);
although of course a higher-level construct would be preferred.
I'm aware that I could do something like this with dblink, but that would involve rewriting a large body of code that already uses foreign tables, so I'd prefer not to.
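For concreteness, the dblink version would look something like this (the connection string and column names are illustrative; dblink can also take the name of a foreign server in place of the connection string):

CREATE EXTENSION IF NOT EXISTS dblink;

-- Run the aggregate on the remote server and pull back a single row:
SELECT max_col1
FROM dblink('host=remotehost dbname=remotedb',
            'SELECT max(col1) FROM public.table1')
     AS t(max_col1 integer);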
EDIT: Here's the query plan for Erwin Brandstetter's suggestion:
=> explain verbose select col1 from remote_tables.table1
-> order by col1 desc nulls last limit 1;
                                              QUERY PLAN
---------------------------------------------------------------------------------------------------
 Limit  (cost=645521.40..645521.40 rows=1 width=4)
   Output: col1
   ->  Sort  (cost=645521.40..685455.50 rows=15973640 width=4)
         Output: col1
         Sort Key: table1.col1
         ->  Foreign Scan on remote_tables.table1  (cost=100.00..565653.20 rows=15973640 width=4)
               Output: col1
               Remote SQL: SELECT col1 FROM public.table1
This is better, in that it fetches only col1, but it's still dragging 16 million rows over the network and now it's also sorting them. By way of comparison, the original query, applied on the remote server, doesn't even have to scan, because that column has an index. (The core query planner isn't clever enough to do that for the modified query applied on the remote server, but that's minor.)
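For context, what makes max(col1) nearly free on the remote side is the min/max index optimization: given a plain btree index on the column, PostgreSQL rewrites the aggregate into an equivalent ORDER BY ... LIMIT 1 that walks the index backwards. A sketch, with an illustrative index name:

CREATE INDEX table1_col1_idx ON public.table1 (col1);

-- On the remote server, max(col1) is effectively planned as:
SELECT col1 FROM public.table1
WHERE col1 IS NOT NULL
ORDER BY col1 DESC LIMIT 1;

The DESC NULLS LAST variant above can't be satisfied by scanning a default (ASC NULLS LAST) index backwards, since that yields NULLS FIRST order, which is presumably why the planner doesn't give the rewritten query the same treatment.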