In 9.4beta2, postgres_fdw doesn't know how to "push down" aggregate queries on remote tables, e.g.:
=> explain verbose select max(col1) from remote_tables.table1;
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
 Aggregate  (cost=605587.30..605587.31 rows=1 width=4)
   Output: max(col1)
   ->  Foreign Scan on remote_tables.table1  (cost=100.00..565653.20 rows=15973640 width=4)
         Output: col1, col2, col3
         Remote SQL: SELECT col1 FROM public.table1
It would obviously be much more efficient to send SELECT max(col1) FROM public.table1 to the remote server and just pull the one row back.
Is there a way to perform this optimization manually? I would be satisfied with something as low-level as (hypothetically speaking)
EXECUTE 'SELECT max(col1) FROM public.table1' ON remote RETURNING (col1 INTEGER);
although of course a higher-level construct would be preferred.
I'm aware that I could do something like this with dblink, but that would involve rewriting a large body of code that already uses foreign tables, so I'd prefer not to.
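For concreteness, the dblink version would look something like this (the connection string and column names are illustrative; dblink can also take the name of a foreign server in place of the connection string):

CREATE EXTENSION IF NOT EXISTS dblink;

-- Run the aggregate on the remote server and pull back a single row:
SELECT max_col1
FROM dblink('host=remotehost dbname=remotedb',
            'SELECT max(col1) FROM public.table1')
     AS t(max_col1 integer);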
EDIT: Here's the query plan for Erwin Brandstetter's suggestion:
=> explain verbose select col1 from remote_tables.table1
-> order by col1 desc nulls last limit 1;
                                              QUERY PLAN
---------------------------------------------------------------------------------------------------
 Limit  (cost=645521.40..645521.40 rows=1 width=4)
   Output: col1
   ->  Sort  (cost=645521.40..685455.50 rows=15973640 width=4)
         Output: col1
         Sort Key: table1.col1
         ->  Foreign Scan on remote_tables.table1  (cost=100.00..565653.20 rows=15973640 width=4)
               Output: col1
               Remote SQL: SELECT col1 FROM public.table1
This is better, in that it fetches only col1, but it's still dragging 16 million rows over the network and now it's also sorting them. By way of comparison, the original query, applied on the remote server, doesn't even have to scan, because that column has an index. (The core query planner isn't clever enough to do that for the modified query applied on the remote server, but that's minor.)
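For context, what makes max(col1) nearly free on the remote side is the min/max index optimization: given a plain btree index on the column, PostgreSQL rewrites the aggregate into an equivalent ORDER BY ... LIMIT 1 that walks the index backwards. A sketch, with an illustrative index name:

CREATE INDEX table1_col1_idx ON public.table1 (col1);

-- On the remote server, max(col1) is effectively planned as:
SELECT col1 FROM public.table1
WHERE col1 IS NOT NULL
ORDER BY col1 DESC LIMIT 1;

The DESC NULLS LAST variant above can't be satisfied by scanning a default (ASC NULLS LAST) index backwards, since that yields NULLS FIRST order, which is presumably why the planner doesn't give the rewritten query the same treatment.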