I need to run a huge number of SQL queries that update or insert rows using Psycopg2. No other queries are being run in between. Here's an example with a table A that has columns name and value:
-- Basically models a list of strings and how many times they "appear"
-- 'foo' is some random value each time, sometimes repeating
insert into A (name, value)
select 'foo', 0
where not exists (select 1 from A where name = 'foo' limit 1);
update A set value = value + 1 where name = 'foo';
-- ... and many more just like this
This is just one example of the kind of query I'm running; I'm doing other things too. I'm not looking for a solution that involves reworking my SQL queries.
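For concreteness, the Python side looks roughly like this (a minimal sketch: the connection details and the `record()` helper are placeholders, not my real code):

```python
import psycopg2

# Placeholder connection details; the real server and database differ.
conn = psycopg2.connect(host="db.example.com", dbname="mydb", user="me")
cur = conn.cursor()

def record(name):
    # One "insert if missing" plus one increment per observed string,
    # parameterized through psycopg2 instead of string formatting.
    cur.execute(
        "insert into A (name, value) "
        "select %s, 0 "
        "where not exists (select 1 from A where name = %s limit 1)",
        (name, name),
    )
    cur.execute("update A set value = value + 1 where name = %s", (name,))
```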
It's really slow, with Postgres (which is running on another server) bottlenecking it. I've tried various things to make it faster.
- It was unbearably slow if I committed after every query.
- It was a bit faster if I didn't `connection.commit()` until the end. This seems to be what the Psycopg2 documentation suggests I do. Postgres was still bottlenecking horribly on disk access.
- It was much faster if I used `cursor.mogrify()` instead of `cursor.execute()`, stored all the queries in a big list, joined them at the end into one massive query (literally `";".join(qs)`), and ran it. Postgres was using 100% CPU, a good sign because it means there's essentially no disk bottleneck. But that sometimes caused the `postgres` process to use up all my RAM and start page faulting, and then it got bottlenecked on disk access forever, a disaster. I've set all the memory limits for Postgres to reasonable values using pgtune, but I'm guessing Postgres is allocating a bunch of work buffers with no limit and going over.
- I've tried the above solution, except committing every 100,000 or so queries to avoid overloading the server (sketched below). That's not going to be a perfect solution, but it's what I've got for now. It still feels like a ridiculous hack and is still slower than I'd like.
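Here's roughly what that last workaround looks like (again a sketch: `names` stands in for my real data source, and the batch size is just what I happen to be using):

```python
BATCH = 100000  # commit every ~100k statements; picked by trial and error

qs = []
for name in names:  # `names` is a stand-in for the real stream of strings
    qs.append(cur.mogrify(
        "insert into A (name, value) select %s, 0 "
        "where not exists (select 1 from A where name = %s limit 1)",
        (name, name)).decode())
    qs.append(cur.mogrify(
        "update A set value = value + 1 where name = %s",
        (name,)).decode())

    if len(qs) >= BATCH:
        cur.execute(";".join(qs))  # one giant multi-statement query
        conn.commit()
        qs = []

# Flush whatever is left over at the end.
if qs:
    cur.execute(";".join(qs))
conn.commit()
```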
Is there some other way I should try involving Psycopg2?