
I wrote a Python script using SQLAlchemy to fetch and update the last activity of my active users.

But the number of users is growing rapidly, and now I'm getting the following error:

psycopg2.ProgrammingError: Statement is too large. Statement Size: 16840277 bytes. Maximum Allowed: 16777216 bytes

I thought updating postgresql.conf might fix it, so I tuned it with the help of pgtune, but that didn't work. I then updated the kernel settings in /etc/sysctl.conf with the following parameters:

kern.sysv.shmmax=4194304
kern.sysv.shmmin=1
kern.sysv.shmmni=32
kern.sysv.shmseg=8
kern.sysv.shmall=1024

and again it didn't work.

After that I divided my query into slices to reduce the size, but I got the same error.

How can I find out which parameter I need to change to increase the maximum statement size?

Workflow

query = "SELECT id FROM {}.{} WHERE status=TRUE".format(schema, customer_table)
ids = [str(i) for i in pd.read_sql(query, insert_uri).id.tolist()]

read_query = """
SELECT id,
 MAX(CONVERT_TIMEZONE('America/Mexico_City', last_activity)) lastactivity
FROM activity WHERE
DATE_TRUNC('d', CONVERT_TIMEZONE('America/Mexico_City', last_activity)) =
DATE_TRUNC('d', CONVERT_TIMEZONE('America/Mexico_City', CURRENT_DATE))-{} and
 id in ({})
GROUP BY id
""".format(day, ",".join(ids))

last_activity = pd.read_sql(read_query, read_engine, parse_dates=True)
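As a sketch of the slicing idea mentioned above (the helper names `chunked` and `build_in_clause` and the chunk size are illustrative, not from the original code), the id list could be split into batches and bound through driver placeholders instead of being formatted into the SQL string:

```python
def chunked(seq, size):
    """Yield successive slices of `seq` of at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def build_in_clause(ids):
    """Build a parameterized IN clause: a placeholder string plus the
    parameter tuple, to be passed to the driver separately."""
    placeholders = ", ".join(["%s"] * len(ids))
    return "id IN ({})".format(placeholders), tuple(ids)


ids = list(range(1, 8))
for chunk in chunked(ids, 3):
    clause, params = build_in_clause(chunk)
    # clause is e.g. "id IN (%s, %s, %s)" and params e.g. (1, 2, 3);
    # both would be passed to pd.read_sql / cursor.execute, keeping each
    # statement small and letting the driver handle quoting.
```

Each chunk yields a statement far below the 16 MiB limit, though the subquery approach in the answer below avoids shipping the ids at all.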
  • Do you really need a statement 16840277 bytes long? Commented Mar 12, 2016 at 7:46
  • Indeed, I'd have thought this limit is like 167772 % larger than the maximum statement that we'd conceivably need in our application that does some really badass analytics. Where's the SQLAlchemy code? Commented Mar 12, 2016 at 7:51
  • Yes, this workflow considerably reduced the time of my process; isn't that normal? I have ~800k users, but some of them are inactive, so first I have to determine which users are active, and by computing only the active users with this workflow I can reduce the time. Commented Mar 12, 2016 at 7:59
  • @AnttiHaapala I updated my workflow; I'm using pandas to read the database and return a DataFrame for further transformations. Commented Mar 12, 2016 at 8:13
  • You must not format your parameters into the query; you must use placeholders instead. Commented Mar 12, 2016 at 8:17

1 Answer

If you are only fetching the IDs from the database and not filtering them in any other way, there is no need to fetch them at all; you can embed the first SQL statement as a subquery in the second:

SELECT id,
 MAX(CONVERT_TIMEZONE('America/Mexico_City', last_activity)) lastactivity
FROM activity WHERE
 DATE_TRUNC('d', CONVERT_TIMEZONE('America/Mexico_City', last_activity)) =
 DATE_TRUNC('d', CONVERT_TIMEZONE('America/Mexico_City', CURRENT_DATE))-%s and
 id in (
    SELECT id FROM customerschema.customer WHERE status=TRUE
 )
GROUP BY id

Also, as Antti Haapala said, don't use string formatting for SQL parameters: it is insecure, and if any parameter contains suitably placed quotes, Postgres will interpret it as commands instead of data.
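A minimal sketch of the combined, parameterized call (assuming a psycopg2-backed engine, where pandas forwards `%(name)s`-style parameters to the driver; `read_engine` and `day` are the variables from the question's workflow):

```python
# The whole query is a constant string: the day offset is a bound
# parameter, and the active ids come from a subquery, so the statement
# stays small no matter how many users exist.
read_query = """
SELECT id,
       MAX(CONVERT_TIMEZONE('America/Mexico_City', last_activity)) AS lastactivity
FROM activity
WHERE DATE_TRUNC('d', CONVERT_TIMEZONE('America/Mexico_City', last_activity)) =
      DATE_TRUNC('d', CONVERT_TIMEZONE('America/Mexico_City', CURRENT_DATE)) - %(day)s
  AND id IN (SELECT id FROM customerschema.customer WHERE status = TRUE)
GROUP BY id
"""

# The driver binds the value; nothing is formatted into the SQL text:
# last_activity = pd.read_sql(read_query, read_engine, params={"day": day})
```

Because the statement no longer contains ~800k literal ids, it stays a few hundred bytes regardless of the user count.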


1 Comment

That's the way to do it. You have a database, so you should use its power.
