Postgres gets out of memory errors despite having plenty of free memory

Question

I have a server running Postgres 9.1.15. The server has 2GB of RAM and no swap. Intermittently Postgres will start getting "out of memory" errors on some SELECTs, and will continue doing so until I restart Postgres or some of the clients that are connected to it. What's weird is that when this happens, free still reports over 500MB of free memory.

select version();:

PostgreSQL 9.1.15 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

uname -a:

Linux db 3.2.0-23-virtual #36-Ubuntu SMP Tue Apr 10 22:29:03 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

postgresql.conf (everything else is commented out/default):

max_connections = 100
shared_buffers = 500MB
work_mem = 2000kB
maintenance_work_mem = 128MB
wal_buffers = 16MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
random_page_cost = 2.0
effective_cache_size = 1000MB
default_statistics_target = 100
log_temp_files = 0

I got these values from pgtune (I chose "mixed type of applications") and have been fiddling with them based on what I've read, without making much real progress. At the moment there are 68 connections, which is a typical number (I'm not using pgbouncer or any other connection pooler yet).
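To put the settings above in proportion to the 2GB of RAM, here is a rough budget sketch. It assumes each connection uses work_mem once (a single query can use it several times, once per sort or hash step), that up to 3 autovacuum workers (the default autovacuum_max_workers) may each use maintenance_work_mem, and it ignores per-backend overhead. The function and parameter names are illustrative, not part of Postgres.

```python
# Coarse estimate only: real usage depends on query plans and per-backend overhead.
KB = 1024
MB = 1024 * KB


def estimate_memory_bytes(shared_buffers, work_mem, maintenance_work_mem,
                          connections, autovacuum_workers=3, work_mem_uses=1):
    """Return a rough memory demand in bytes for the given settings.

    work_mem_uses is an assumed number of sort/hash steps per connection.
    """
    backend_mem = connections * work_mem * work_mem_uses
    maintenance_mem = autovacuum_workers * maintenance_work_mem
    return shared_buffers + backend_mem + maintenance_mem


# Values copied from the postgresql.conf shown above.
SETTINGS = {
    "shared_buffers": 500 * MB,
    "work_mem": 2000 * KB,
    "maintenance_work_mem": 128 * MB,
}

typical = estimate_memory_bytes(connections=68, **SETTINGS)   # current connection count
ceiling = estimate_memory_bytes(connections=100, **SETTINGS)  # max_connections

print(f"68 connections:  {typical / MB:.0f} MB")
print(f"100 connections: {ceiling / MB:.0f} MB")
```

Under these assumptions the estimate already lands at a little over 1000 MB before counting anything else on the machine, and most of the variable part comes from maintenance_work_mem rather than work_mem.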

/etc/sysctl.conf:

kernel.shmmax=1050451968
kernel.shmall=256458

vm.overcommit_ratio=100
vm.overcommit_memory=2
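As a quick sanity check on the shared memory settings above: kernel.shmall is expressed in pages while kernel.shmmax is in bytes, so the two can be compared once the page size is known. The sketch below assumes a 4096-byte page, which is typical on x86_64 but should be confirmed with `getconf PAGE_SIZE`.

```python
# Assumption: 4 KiB pages (confirm with `getconf PAGE_SIZE` on the server).
PAGE_SIZE = 4096

# Values copied from /etc/sysctl.conf above.
shmmax_bytes = 1050451968   # largest single shared memory segment, in bytes
shmall_pages = 256458       # total shared memory allowed, in pages

shmall_bytes = shmall_pages * PAGE_SIZE
print(f"shmall = {shmall_bytes} bytes, shmmax = {shmmax_bytes} bytes")
print("consistent" if shmall_bytes == shmmax_bytes else "mismatch")
```

With a 4096-byte page the two values describe the same amount of memory, about 1002 MB, which is roughly twice shared_buffers.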

I first changed overcommit_memory to 2 about a fortnight ago after the OOM killer killed the Postgres server. Prior to that the server had been running fine for a long time. The errors I get now are less catastrophic but much more annoying because they are much more frequent.

I haven't had much luck pinpointing the first event that causes postgres to run "out of memory" - it seems to be different each time. The most recent time it crashed, the first three lines logged were:

2015-04-07 05:32:39 UTC ERROR:  out of memory
2015-04-07 05:32:39 UTC DETAIL:  Failed on request of size 125.
2015-04-07 05:32:39 UTC CONTEXT:  automatic analyze of table "xxx.public.delayed_jobs"
TopMemoryContext: 68688 total in 10 blocks; 4560 free (4 chunks); 64128 used
[... snipped heaps of lines which I can provide if they are useful ...]

---

2015-04-07 05:33:58 UTC ERROR:  out of memory
2015-04-07 05:33:58 UTC DETAIL:  Failed on request of size 16.
2015-04-07 05:33:58 UTC STATEMENT:  SELECT oid, typname, typelem, typdelim, typinput FROM pg_type
2015-04-07 05:33:59 UTC LOG:  could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG:  could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG:  could not fork new process for connection: Cannot allocate memory
TopMemoryContext: 396368 total in 50 blocks; 10160 free (28 chunks); 386208 used
[... snipped heaps of lines which I can provide if they are useful ...]

---

2015-04-07 05:33:59 UTC ERROR:  out of memory
2015-04-07 05:33:59 UTC DETAIL:  Failed on request of size 1840.
2015-04-07 05:33:59 UTC STATEMENT:  SELECT... [nested select with 4 joins, 19 ands, and 2 order bys]
TopMemoryContext: 388176 total in 49 blocks; 17264 free (55 chunks); 370912 used
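To see which requests failed, and in what order, the DETAIL lines can be pulled out of the log. This is a minimal sketch that assumes the log line prefix shown in the excerpts above (timestamp followed by "UTC"); the function name is illustrative and the regular expression would need adjusting for a different log_line_prefix.

```python
import re

# A few lines copied from the log excerpts above.
LOG_EXCERPT = """\
2015-04-07 05:32:39 UTC ERROR:  out of memory
2015-04-07 05:32:39 UTC DETAIL:  Failed on request of size 125.
2015-04-07 05:33:58 UTC ERROR:  out of memory
2015-04-07 05:33:58 UTC DETAIL:  Failed on request of size 16.
2015-04-07 05:33:59 UTC LOG:  could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC ERROR:  out of memory
2015-04-07 05:33:59 UTC DETAIL:  Failed on request of size 1840.
"""

# Matches only the DETAIL lines that report the size of the failed allocation.
FAILED_REQUEST = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC DETAIL:\s+"
    r"Failed on request of size (?P<size>\d+)\.",
    re.MULTILINE,
)


def failed_requests(log_text):
    """Return (timestamp, size_in_bytes) for each failed allocation, in log order."""
    return [(m.group("ts"), int(m.group("size")))
            for m in FAILED_REQUEST.finditer(log_text)]


for timestamp, size in failed_requests(LOG_EXCERPT):
    print(timestamp, size)
```

The failing requests here are all tiny (16 to 1840 bytes), which is consistent with a system-wide limit being reached rather than one query asking for a very large allocation.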

The crash before that, a few hours earlier, just had three instances of that last query as the first three lines of the crash. That query gets run very often, so I'm not sure if the issues are because of this query, or if it just comes up in the error log because it's a reasonably complex SELECT getting run all the time. That said, here's an EXPLAIN ANALYZE of it: http://explain.depesz.com/s/r00

This is what ulimit -a for the postgres user looks like:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 15956
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 15956
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I'll try to get the exact numbers from free next time there's a crash; in the meantime, this is a braindump of all the info I have.

Any ideas on where to go from here?

AFAIK the kernel has to promise memory to processes if it's allocated, even if they're never going to use it. This includes the copy-on-write memory regions fork()ed from the postmaster. So even though the memory is free in the sense of not currently being mapped into a process's address space, it is committed. I think. I might be totally hand-waving here. It's also possible that some of the kernel's weirdness around accounting for shared memory is at issue.
– Craig Ringer, Commented Apr 7, 2015 at 7:17
maintenance_work_mem = 128MB seems quite big given your memory constraints. The fact that the problem occurs when an automatic analyze kicks in also suggests that this parameter is too large.
– user330315, Commented Apr 7, 2015 at 7:28
@a_horse_with_no_name I've tried setting it to 16MB. Will report results. Anything else to look out for in the meantime?
– Alex Ghiculescu, Commented Apr 7, 2015 at 7:43
2 gigs is not a lot of memory with no swap. Without swap, there's nothing the OS can do if it runs out of memory even for a moment. And it's harder for the kernel to defragment memory. You might have plenty of free memory, but it might all be fragmented. Consider adding some swap, at least 4 gigs.
– Schwern, Commented Apr 7, 2015 at 8:55
[what the others have said] + lower max_connections, unless you really need them. Lowering shared_buffers (and relying on the OS cache instead) is also an option (shared memory is locked in core and unswappable in Linux, IIRC). And yes: add some swap.
– joop, Commented Apr 7, 2015 at 11:45

user183240 · Accepted Answer · 2015-09-24 07:18:20Z

3

I just ran into this same issue with a ~2.5 GB plain-text SQL file I was trying to restore. I scaled my Digital Ocean server up to 64 GB RAM, created a 10 GB swap file, and tried again. I got an out-of-memory error with 50 GB free, and no swap in use.

I scaled back my server to the small 1 GB instance I was using (requiring a reboot) and figured I'd give it another shot for no other reason than I was frustrated. I started the import and realized I forgot to create my temporary swap file again.

I created it in the middle of the import. psql made it a lot further before crashing. It made it through 5 additional tables.

I think there must be a bug allocating memory in psql.

mnencia · Accepted Answer · 2015-10-29 23:01:45Z

1

It is a bit suspicious that you report the same free memory size as your shared_buffers size. Are you sure you are looking at the right values?

The output of the free command at the time of the crash would be useful, as would the contents of /proc/meminfo.
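When collecting that information, the two /proc/meminfo fields most relevant under strict overcommit are CommitLimit and Committed_AS. Below is a small sketch of how they might be compared. The sample numbers are made up for illustration (they are not from the asker's server), and the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most values are in kB; the unit suffix is dropped.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def commit_headroom_kb(meminfo):
    """kB that can still be committed before allocations start to fail."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]


# Illustrative sample only; on the server, read the real file instead:
#   meminfo = parse_meminfo(open("/proc/meminfo").read())
SAMPLE = """\
MemTotal:        2048000 kB
MemFree:          512000 kB
CommitLimit:     2048000 kB
Committed_AS:    1990000 kB
"""

meminfo = parse_meminfo(SAMPLE)
print("MemFree (kB):        ", meminfo["MemFree"])
print("Commit headroom (kB):", commit_headroom_kb(meminfo))
```

In a sample like this one, free would show plenty of unused memory while the commit headroom is nearly gone, which is the kind of situation described in the question.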

Beware that setting overcommit_memory to 2 is not so effective if you set overcommit_ratio to 100. It basically limits memory allocation to the size of swap (0 in this case) + 100% of physical RAM, which doesn't leave any room for shared memory and disk caches.

You should probably set overcommit_ratio to 50.
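The arithmetic behind that limit can be sketched as follows. With vm.overcommit_memory=2 the kernel's CommitLimit is swap plus overcommit_ratio percent of RAM (huge pages are ignored here). The 2 GiB figure is an assumption based on the question; the RAM actually visible to the kernel is usually slightly lower.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def commit_limit_bytes(ram_bytes, swap_bytes, overcommit_ratio):
    """CommitLimit under vm.overcommit_memory=2, ignoring huge pages."""
    return swap_bytes + ram_bytes * overcommit_ratio // 100


ram = 2 * GIB   # assumed from the question: 2GB of RAM
swap = 0        # the server has no swap

for ratio in (100, 50):
    limit = commit_limit_bytes(ram, swap, ratio)
    print(f"overcommit_ratio={ratio}: CommitLimit = {limit // MIB} MiB")
```

Note that with no swap, a ratio of 50 caps committed memory at half of RAM, so it is worth comparing the resulting limit against Committed_AS in /proc/meminfo before and after changing it.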

1 Comment

Alex Ghiculescu Over a year ago

I think that's just a coincidence in how I wrote the question. It was more than 500 MB free (as in, "heaps of memory!") rather than a hard value. That said, I think that's a good point on the overcommit_ratio - will tinker with that.
