We have a Postgres trigger that runs when eastings and northings values are inserted or updated on a row. The trigger function uses PostGIS to transform the eastings and northings, which are floats, into a point value, which is stored and cached so that we can perform geospatial queries.

In our CI tests we see flaky failures related to that trigger, and we have not been able to reproduce them locally. A CI run sometimes fails and sometimes passes, and the failing test is not the same each time. Has anyone experienced this problem before, or does anyone have ideas for investigating it?

Our local dev Postgres:

SELECT version();
'PostgreSQL 15.8 (Debian 15.8-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit'

SELECT PostGIS_full_version();
'POSTGIS="3.4.2 c19ce56" [EXTENSION] PGSQL="150" GEOS="3.11.1-CAPI-1.17.1" PROJ="9.1.1 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/var/lib/postgresql/.local/share/proj DATABASE_PATH=/usr/share/proj/proj.db" GDAL="GDAL 3.6.2, released 2023/01/02" LIBXML="2.9.14" LIBJSON="0.16" LIBPROTOBUF="1.4.1" WAGYU="0.5.0 (Internal)" TOPOLOGY RASTER'

CI Postgres:

SELECT version();
'PostgreSQL 15.8 (Debian 15.8-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit',

SELECT PostGIS_full_version();
'POSTGIS="3.4.3 e365945" [EXTENSION] PGSQL="150" GEOS="3.11.1-CAPI-1.17.1" PROJ="9.1.1 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/var/lib/postgresql/.local/share/proj DATABASE_PATH=/usr/share/proj/proj.db" LIBXML="2.9.14" LIBJSON="0.16" LIBPROTOBUF="1.4.1" WAGYU="0.5.0 (Internal)" TOPOLOGY',
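The two PostGIS_full_version() outputs above are long enough that differences are easy to miss. As a minimal sketch, the Python below splits each string into its KEY="value" components and bare flags and reports what differs. The function names parse_full_version and diff_versions are hypothetical, made up for this illustration. The script only compares the strings; it does not show that any difference is the cause of the failure.

```python
import re

# Matches components of the form KEY="value" in the version string.
KEY_VALUE = re.compile(r'(\w+)="([^"]*)"')

# Version strings copied from the local and CI outputs in this question.
LOCAL = 'POSTGIS="3.4.2 c19ce56" [EXTENSION] PGSQL="150" GEOS="3.11.1-CAPI-1.17.1" PROJ="9.1.1 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/var/lib/postgresql/.local/share/proj DATABASE_PATH=/usr/share/proj/proj.db" GDAL="GDAL 3.6.2, released 2023/01/02" LIBXML="2.9.14" LIBJSON="0.16" LIBPROTOBUF="1.4.1" WAGYU="0.5.0 (Internal)" TOPOLOGY RASTER'
CI = 'POSTGIS="3.4.3 e365945" [EXTENSION] PGSQL="150" GEOS="3.11.1-CAPI-1.17.1" PROJ="9.1.1 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/var/lib/postgresql/.local/share/proj DATABASE_PATH=/usr/share/proj/proj.db" LIBXML="2.9.14" LIBJSON="0.16" LIBPROTOBUF="1.4.1" WAGYU="0.5.0 (Internal)" TOPOLOGY'


def parse_full_version(text):
    """Return (components, flags) parsed from a PostGIS_full_version() string.

    Hypothetical helper: components maps KEY -> value for KEY="value" pairs,
    flags is the set of bare tokens such as TOPOLOGY or RASTER.
    """
    components = dict(KEY_VALUE.findall(text))
    # Remove the KEY="value" pairs; whatever is left are bare flags.
    flags = set(KEY_VALUE.sub(" ", text).split())
    return components, flags


def diff_versions(local, ci):
    """Return (component_differences, flag_differences) between two strings."""
    local_components, local_flags = parse_full_version(local)
    ci_components, ci_flags = parse_full_version(ci)
    differences = {}
    for key in sorted(set(local_components) | set(ci_components)):
        local_value = local_components.get(key)
        ci_value = ci_components.get(key)
        if local_value != ci_value:
            differences[key] = (local_value, ci_value)
    # Flags present in exactly one of the two environments.
    return differences, local_flags ^ ci_flags


component_diffs, flag_diffs = diff_versions(LOCAL, CI)
for name, (local_value, ci_value) in component_diffs.items():
    print(f"{name}: local={local_value!r} ci={ci_value!r}")
print("flags in only one environment:", sorted(flag_diffs))
```

For the strings in this question, it reports a different POSTGIS build (3.4.2 locally, 3.4.3 on CI) and shows that GDAL and the RASTER flag appear only locally. The version() outputs additionally differ in architecture (aarch64 locally, x86_64 on CI).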

Trigger function, which has been modified to include more verbose exception messages:

DECLARE
SRID INT;
LOCATION GEOMETRY(POINT, 4326);
v_state   TEXT;
v_msg     TEXT;
v_detail  TEXT;
v_hint    TEXT;
v_context TEXT;

BEGIN
    SELECT project.utm_zone INTO SRID FROM projects_project project WHERE project.id = NEW.project_id;
    
    LOCATION := (
        CASE
            WHEN
                NEW.planned_location_point_x IS NOT NULL AND
                NEW.planned_location_point_y IS NOT NULL
            THEN
                CASE
                    WHEN
                        SRID = 28348
                    THEN
                        ST_InverseTransformPipeline( ST_POINT( NEW.planned_location_point_x, NEW.planned_location_point_y, SRID),
                        '+proj=pipeline
                        +step +proj=unitconvert +xy_in=deg +xy_out=rad
                        +step +proj=utm +zone=48 +south +ellps=GRS80',
                        4326 )
                ELSE
                    ST_Transform( ST_POINT( NEW.planned_location_point_x, NEW.planned_location_point_y, SRID ), 4326 )
                END
            ELSE
                NULL
        END
    );
    
    NEW.planned_location_point = LOCATION;
    
    RETURN NEW;
    
EXCEPTION
    WHEN OTHERS THEN
        get stacked diagnostics
            v_state   = returned_sqlstate,
            v_msg     = message_text,
            v_detail  = pg_exception_detail,
            v_hint    = pg_exception_hint,
            v_context = pg_exception_context;

        raise log E'Got exception:
            state  : %
            message: %
            detail : %
            hint   : %
            context: %', v_state, v_msg, v_detail, v_hint, v_context;

        raise log E'Got exception:
            SQLSTATE: %
            SQLERRM: %', SQLSTATE, SQLERRM;

        RAISE EXCEPTION 'SRID: %, X: %, Y: %, SQLERRM: %, SQLSTATE %', SRID, NEW.planned_location_point_x, NEW.planned_location_point_y, SQLERRM, SQLSTATE;
END;

Error messages when it fails on CI

Got exception:

state: XX000
message: transform: Unknown error (code 4096) (4096)
detail:
hint:
context: PL/pgSQL function pgtrigger_set_planned_location_point_gps_cd099() line 24 at assignment

psycopg2.errors.RaiseException: SRID: SRID: 20349, X: 485077.7489552063, Y: 6416413.62531117, SQLERRM: transform: Unknown error (code 4096)
CONTEXT: PL/pgSQL function pgtrigger_set_planned_location_point_gps_cd099() line 162 at RAISE (4096), SQLSTATE XX000

I have modified the trigger function to provide more context, as seen in the code above, but since this is a flaky test that only fails on CI, it is really hard to pin down the issue.

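Before I added the exception capture to the trigger, the error message was as shown below, which means the error occurred on the assignment to LOCATION. For an SRID other than 28348, such as the 20349 in the failure above, the code path goes through the ELSE branch, which suggests the error is caused by ST_Transform.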

psycopg2.errors.InternalError_: transform: Unknown error (code 4096) (4096)
CONTEXT:  PL/pgSQL function pgtrigger_set_planned_location_point_gps_cd099() line 19 at assignment

But running the same function call on dev, with the values from the failure, doesn't raise an error:

SELECT ST_Transform( ST_POINT( 485077.7489552063, 6416413.62531117, 20349 ), 4326 );
  • See this bug report, which demonstrates that a call to ST_Transform could be affected by a prior malformed proj string (this particular bug has since been fixed). So you may want to record the parameters of ALL the tests that you run; if you are running tests in parallel, an unrelated test playing with the proj string could be what makes this one fail. Commented Oct 18, 2024 at 13:00
  • @JGH I did see that bug report as well and tried to recreate it, but it looks like our version has already been patched. I will try to add more logging, but the challenge is that it is hard to isolate (even running just the few tests before the failing test, in order, doesn't recreate the issue) and the full test suite takes ~40 minutes to run. Hoping to hear from someone else who might have already encountered and solved it. Commented Oct 21, 2024 at 1:43
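The suggestion to record the parameters of every test could be sketched as a small bounded log that the test suite fills in and dumps when the trigger error appears. This is only an illustration under stated assumptions: the class TransformCallLog, its methods, and the test names are hypothetical and not part of psycopg2, PostGIS, or any test framework, and the first two recorded inputs are made-up values. Only the last entry uses the SRID and coordinates from the failure in the question.

```python
from collections import deque


class TransformCallLog:
    """Bounded record of the most recent trigger inputs (hypothetical helper)."""

    def __init__(self, max_entries=500):
        # deque with maxlen silently drops the oldest entry when full.
        self._entries = deque(maxlen=max_entries)

    def record(self, test_name, srid, x, y):
        """Remember the inputs of one insert/update that fires the trigger."""
        self._entries.append({"test": test_name, "srid": srid, "x": x, "y": y})

    def recent(self, count=20):
        """Return up to `count` of the latest entries, oldest first."""
        if count <= 0:
            return []
        return list(self._entries)[-count:]

    def format_recent(self, count=20):
        """Render the latest entries as one line each, for a failure report."""
        return "\n".join(
            f"{entry['test']}: SRID={entry['srid']} X={entry['x']!r} Y={entry['y']!r}"
            for entry in self.recent(count)
        )


# Example: the first two entries are invented, the third is from the question.
call_log = TransformCallLog(max_entries=2)
call_log.record("test_made_up_one", 28348, 500000.0, 6400000.0)
call_log.record("test_made_up_two", 28348, 510000.0, 6410000.0)
call_log.record("test_failing", 20349, 485077.7489552063, 6416413.62531117)

# Only the two most recent entries survive because max_entries=2.
print(call_log.format_recent())
```

Keeping the log bounded means it can stay enabled for the full ~40 minute run, and printing it from the test's failure handler shows which inputs reached the database shortly before the failing call.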
