I have a DuckDB table objects with an int32 column named type, and a custom (Python) function type_str that converts the enum value to a human-readable string.
This query is very fast:
select type_str(type) as name, type, count(*) as count from objects
group by type
having count > 1000
order by count desc;
which suggests that type_str is called once per group rather than once per row.
However, this query is very slow:
select type_str(type) as name, type, count(*) as count from objects
group by type
having count > 1000 and name[0:3] = 'CAN'
order by count desc;
The documentation of HAVING says:
"The HAVING clause can be used after the GROUP BY clause to provide filter criteria after the grouping has been completed."
So I don't understand why the second query is so much slower. If the filter runs after grouping, it shouldn't have to call type_str on more rows than the first query does.
EXPLAIN suggests that it is running the UDF on all rows first: FILTER (array_slice(type_str(..)) -> FILTER (count

There seem to be a couple of similar issues on GitHub (github.com/duckdb/duckdb/issues?q=udf+filter), but not this specific case. You may need to ask DuckDB directly if you don't get a reply here.
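
To make the suspected difference concrete, here is a plain-Python sketch (not DuckDB itself) of the two evaluation orders, counting how often a stand-in for type_str gets called. The NAMES mapping, the row data and the two plan functions are all invented for illustration. Whether DuckDB's optimizer really moves the name filter below the aggregation is only what the EXPLAIN output above suggests, not something this sketch proves.

```python
from collections import Counter

calls = 0  # number of times the stand-in UDF has been invoked

# Made-up enum mapping; the real type_str is not shown in the question.
NAMES = {0: "CAN_FRAME", 1: "CAN_ERROR", 2: "LIN_FRAME"}


def type_str(t):
    """Stand-in for the Python UDF; counts its own invocations."""
    global calls
    calls += 1
    return NAMES[t]


# Made-up data: 5000 rows spread over three type values.
rows = [0] * 3000 + [1] * 500 + [2] * 1500


def filter_after_grouping(rows):
    """Group first, then evaluate the name predicate once per surviving group."""
    result = []
    for t, n in Counter(rows).most_common():  # ordered by count, descending
        if n > 1000:
            name = type_str(t)
            if name[0:3] == "CAN":
                result.append((name, t, n))
    return result


def filter_before_grouping(rows):
    """Name predicate applied below the aggregation: the UDF runs per input row."""
    kept = [t for t in rows if type_str(t)[0:3] == "CAN"]
    result = []
    for t, n in Counter(kept).most_common():
        if n > 1000:
            result.append((type_str(t), t, n))
    return result


def run(plan):
    """Reset the call counter, execute one plan, return (result, UDF call count)."""
    global calls
    calls = 0
    result = plan(rows)
    return result, calls


if __name__ == "__main__":
    for plan in (filter_after_grouping, filter_before_grouping):
        result, n_calls = run(plan)
        print(plan.__name__, result, "UDF calls:", n_calls)
```

Both functions return the same rows, but with this made-up data the first makes 2 UDF calls and the second makes 5001. That matches the shape of the slowdown described in the question, if the name filter is indeed evaluated before the grouping.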