
Two days ago, I upgraded our Heroku Postgres server from Kappa to Ronin. Our DB had grown to several GB, and I figured the extra RAM would help with the cache. I used the standard fast swapping technique (create a follower, allow the transfer to complete, promote the follower). I know that the cache can take time to warm up, but it's been several days and the DB has been SLOWING down.

Our smaller DB was running at around 5 ms response times. The new DB jumped to about 10 ms right after the transfer (cold cache). It has since fluctuated between 10 ms and 20 ms.

  • The new DB is running the exact same version (9.2.4).
  • I have noticed there is more logging occurring (checkpoints).
  • The DB cache hit ratio on the old DB was ~0.91, hence the upgrade. The new DB has already reached a similar hit ratio, so I would expect that cache warmth is no longer the issue.
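
For readers who want to reproduce a figure like the ~0.91 above, here is a minimal sketch. It assumes the number means hits / (hits + reads) and that the counters come from PostgreSQL's `pg_statio_user_tables` view (`heap_blks_hit`, `heap_blks_read`). The query string and the helper name `cache_hit_ratio` are illustrative, not something the poster showed.

```python
# Illustrative query: sums the table block counters across all user tables.
# Run it with any PostgreSQL client; it is shown here only as a string.
HIT_RATIO_SQL = """
SELECT sum(heap_blks_hit)  AS blocks_hit,
       sum(heap_blks_read) AS blocks_read
FROM pg_statio_user_tables;
"""


def cache_hit_ratio(blocks_hit, blocks_read):
    """Return hits / (hits + reads), or 0.0 when no blocks were requested.

    Assumption: "reads" are blocks PostgreSQL had to request from the OS,
    which may still have been served from the OS page cache.
    """
    total = blocks_hit + blocks_read
    if total == 0:
        return 0.0
    return blocks_hit / total


# Example with made-up counters: 910 hits and 90 reads.
example_ratio = cache_hit_ratio(910, 90)
```

A ratio that matches the old server only says the buffer cache is equally warm; it says nothing about how much CPU or disk time each query actually gets.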

Is there some config that could be different? I know that every app is different, but shouldn't the cache have warmed up by now? Are there any undocumented differences between Kappa and Ronin?

Thanks

  • Is it possible an EXTENSION or similar config wouldn't transfer? Commented Jul 15, 2013 at 22:30
  • @CraigRinger thanks for the tip. If Heroku hasn't responded with something useful by tomorrow, I'll try this. Why don't you submit this as an answer, so I can accept it if it works? Commented Jul 16, 2013 at 3:13

1 Answer


I've seen this before with a client who called me for some emergency help.

After doing some poking around with heroku bash we eventually concluded that the new instance was on a particularly busy underlying server. We did a failover via follower promotion to another machine, at which point performance greatly improved, though the failover itself was challenging due to the problems with the master.

As far as I know, Heroku's instances are Amazon EC2 nodes (Xen VMs) that run an LXC container to isolate each Heroku user's database cluster. LXC offers rather less isolation than a full VM does; instances can contend for RAM, disk I/O, CPU, etc., depending on the exact policy configured for the container, any control group policies, etc.

If you're on an instance where the other users aren't doing much, and if the container permits your DB to use resources that aren't currently required by other users, you could easily see consistently higher-than-guaranteed performance.

I suspect that people on larger Heroku plans are more likely to actually be using the resources of the system you're sharing with them.

If you do a promotion failover to a bigger instance where all the users are there because they really need the resources offered by the bigger machine, you could actually get fewer resources overall, because everyone's actually using their shares.

It's frustrating that Heroku offer so little visibility into the systems that run their DBs. It's hard to tell how/if they load balance between container hosts, what the underlying load on the system is, etc.

In a comment, @Forrest pointed out that Heroku have a useful page on their server details, showing that only the lower tiers are multi-tenant, but higher tiers are not. This would easily explain the performance loss observed here, and would fit in with my comments above that the lower plan was allowing Forrest to borrow unused resources from other users.


4 Comments

This was the problem. Seems like our Kappa server-mates weren't using it much, and we were seeing lots of "steal" on the Ronin server. We won't get that kind of CPU performance until the Ika plan. devcenter.heroku.com/articles/…
It should also be noted that even the new Ronin server is performing slower than our original Kappa. If anybody else is thinking of upgrading from Kappa, make sure your performance is truly deteriorating, as you may be losing a CPU.
@Forrest that's an exceedingly useful article you linked to. Thank you. I was not aware that only the lower-end Heroku instances used multi-tenant hosting with OpenVZ; that explains a lot of the anomalous behaviour I have seen in the past.
@Forrest I've corrected my answer to note that Heroku's multi-tenant hosts use LXC not OpenVZ, though.
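
The "steal" figure mentioned in the comments above is CPU time that the hypervisor gave to other guests while this one wanted to run. On a Linux host where `/proc/stat` is readable (which may not be the case on a managed database server), it can be estimated from two samples of the aggregate `cpu` line. This is a rough sketch under that assumption; the function names are hypothetical and this is not how the commenters necessarily obtained their numbers.

```python
import time


def parse_cpu_line(stat_text):
    """Return the integer counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Percentage of CPU time reported as 'steal' between two /proc/stat samples.

    Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    Only the first eight fields are summed, because guest time is already
    included in user and nice.
    """
    start = parse_cpu_line(before)
    end = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(start, end)]
    total = sum(deltas[:8])
    if total <= 0 or len(deltas) < 8:
        return 0.0
    return 100.0 * deltas[7] / total


def measure_steal(interval_s=1.0, path="/proc/stat"):
    """Sample the given stat file twice, interval_s seconds apart (Linux only)."""
    with open(path) as handle:
        before = handle.read()
    time.sleep(interval_s)
    with open(path) as handle:
        after = handle.read()
    return steal_percent(before, after)
```

A steal percentage that stays well above zero under load suggests the guest is competing with its neighbours for physical CPU, which is consistent with the explanation in the answer.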
