We have a Postgres trigger that runs when eastings and northings values are inserted or updated on a row. The trigger function uses PostGIS to transform the eastings and northings, which are floats, into a point value that is stored and cached so that we can perform geospatial queries.
In our CI tests we experience flaky failures related to that trigger, and we have not been able to recreate them locally. Sometimes a CI run fails and sometimes it doesn't, and it isn't always the same test that fails. I am looking to see if anyone has experienced this problem before or has ideas for investigation.
Our local dev Postgres:
SELECT version();
'PostgreSQL 15.8 (Debian 15.8-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit'
SELECT PostGIS_full_version();
'POSTGIS="3.4.2 c19ce56" [EXTENSION] PGSQL="150" GEOS="3.11.1-CAPI-1.17.1" PROJ="9.1.1 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/var/lib/postgresql/.local/share/proj DATABASE_PATH=/usr/share/proj/proj.db" GDAL="GDAL 3.6.2, released 2023/01/02" LIBXML="2.9.14" LIBJSON="0.16" LIBPROTOBUF="1.4.1" WAGYU="0.5.0 (Internal)" TOPOLOGY RASTER'
CI Postgres:
SELECT version();
'PostgreSQL 15.8 (Debian 15.8-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit',
SELECT PostGIS_full_version();
'POSTGIS="3.4.3 e365945" [EXTENSION] PGSQL="150" GEOS="3.11.1-CAPI-1.17.1" PROJ="9.1.1 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/var/lib/postgresql/.local/share/proj DATABASE_PATH=/usr/share/proj/proj.db" LIBXML="2.9.14" LIBJSON="0.16" LIBPROTOBUF="1.4.1" WAGYU="0.5.0 (Internal)" TOPOLOGY',
The trigger function, which has been modified to include more verbose exception messages:
DECLARE
SRID INT;
LOCATION GEOMETRY(POINT, 4326);
v_state TEXT;
v_msg TEXT;
v_detail TEXT;
v_hint TEXT;
v_context TEXT;
BEGIN
SELECT project.utm_zone INTO SRID FROM projects_project project WHERE project.id = NEW.project_id;
LOCATION := (
CASE
WHEN
NEW.planned_location_point_x IS NOT NULL AND
NEW.planned_location_point_y IS NOT NULL
THEN
CASE
WHEN
SRID = 28348
THEN
ST_InverseTransformPipeline( ST_POINT( NEW.planned_location_point_x, NEW.planned_location_point_y, SRID),
'+proj=pipeline
+step +proj=unitconvert +xy_in=deg +xy_out=rad
+step +proj=utm +zone=48 +south +ellps=GRS80',
4326 )
ELSE
ST_Transform( ST_POINT( NEW.planned_location_point_x, NEW.planned_location_point_y, SRID ), 4326 )
END
ELSE
NULL
END
);
NEW.planned_location_point = LOCATION;
RETURN NEW;
EXCEPTION
WHEN OTHERS THEN
get stacked diagnostics
v_state = returned_sqlstate,
v_msg = message_text,
v_detail = pg_exception_detail,
v_hint = pg_exception_hint,
v_context = pg_exception_context;
raise log E'Got exception:
state : %
message: %
detail : %
hint : %
context: %', v_state, v_msg, v_detail, v_hint, v_context;
raise log E'Got exception:
SQLSTATE: %
SQLERRM: %', SQLSTATE, SQLERRM;
RAISE EXCEPTION 'SRID: %, X: %, Y: %, SQLERRM: %, SQLSTATE %', SRID, NEW.planned_location_point_x, NEW.planned_location_point_y, SQLERRM, SQLSTATE;
END;
Error messages when it fails on CI:
Got exception:
state: XX000
message: transform: Unknown error (code 4096) (4096)
detail:
hint:
context: PL/pgSQL function pgtrigger_set_planned_location_point_gps_cd099() line 24 at assignment
psycopg2.errors.RaiseException: SRID: SRID: 20349, X: 485077.7489552063, Y: 6416413.62531117, SQLERRM: transform: Unknown error (code 4096)
CONTEXT: PL/pgSQL function pgtrigger_set_planned_location_point_gps_cd099() line 162 at RAISE (4096), SQLSTATE XX000
I have modified the trigger function to try to provide more context, as seen in the code above, but since it is a flaky test that only fails on CI, it is really hard to figure out the issue.
Before adding the exception capture to the trigger, the error message was as shown below, which means the error occurred on the assignment to LOCATION. Following the code path, that suggests the error is caused by ST_Transform.
psycopg2.errors.InternalError_: transform: Unknown error (code 4096) (4096)
CONTEXT: PL/pgSQL function pgtrigger_set_planned_location_point_gps_cd099() line 19 at assignment
But running the same call on dev, with the inputs from the failing CI run, doesn't raise an error:
SELECT ST_Transform( ST_POINT( 485077.7489552063, 6416413.62531117, 20349 ), 4326 );
ST_Transform could be (or rather could have been; this particular bug is fixed) affected by a prior malformed proj string. So you may want to record the parameters of ALL the tests that you run; if you are running tests in parallel, an unrelated test playing with the proj string could be what makes this one fail.
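One way to record those parameters on the client side is sketched below. It is only a sketch: it assumes the tests reach the database through a DB-API cursor (as psycopg2 provides), and `RecordingCursor` and the log path are hypothetical names, not part of psycopg2 or of the project in the question. Each statement and its parameters are appended to a JSON Lines file before the statement runs, so the record survives even when the statement raises.

```python
import json
import os
import time


class RecordingCursor:
    """Hypothetical wrapper: log every statement before passing it on."""

    def __init__(self, cursor, log_path):
        self._cursor = cursor      # any DB-API cursor, e.g. from psycopg2
        self._log_path = log_path  # JSON Lines file, one record per statement

    def execute(self, sql, params=None):
        if isinstance(params, dict):
            logged = dict(params)      # named parameters
        elif params is not None:
            logged = list(params)      # positional parameters
        else:
            logged = None
        entry = {
            "ts": time.time(),         # when the statement was issued
            "pid": os.getpid(),        # which test worker issued it
            "sql": sql,
            "params": logged,
        }
        # Write and close the file first, so the record is kept on failure.
        with open(self._log_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry, default=str) + "\n")
        return self._cursor.execute(sql, params)

    def __getattr__(self, name):
        # Everything else (fetchone, close, ...) goes to the real cursor.
        return getattr(self._cursor, name)
```

After a failed CI run, sorting the log by `ts` shows which statements, from which worker process, ran shortly before the failing transform. Note that this only sees what the client sends; the SRID that the trigger looks up itself would still have to come from the trigger's own `RAISE LOG` output.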