Heroku Postgres DB slower after upgrade

2018-06-28 07:10:52

2 days ago, I upgraded our Heroku Postgres server from Kappa to Ronin. Our DB was up to several GB and I figured the extra ram would help with the cache. I used the standard fast swapping technique (create follower, allow transfer, promote follower). I know that the cache can take time to warm up, but it's been several days and it's been SLOWING down.

Our smaller DB was running around 5ms response times. The new DB jumped to about 10ms after the transfer (cold cache). It has since fluctuated between 10ms and 20ms.

The new DB is running the exact same version (9.2.4).

I have noticed there is more logging occurring (checkpoints).

The db cache hit/miss from the old DB was ~0.91, hence the update. The new DB is already up to a similar hit/miss so I would expect that the warmness of the cache is no longer the issue.

Is there some config which could be different? I know that every app is different, but shouldn't the cache have warmed by now? Is there any undocumented differences between Kappa & Ronin?

Thanks

I've seen this before with a client who called me for some emergency help.

After doing some poking around with heroku bash we eventually concluded that the new instance was on particularly busy underlying server. We did a failover via follower promotion to another machine, at which point performance greatly improved - though the failover its self was challenging due to the problems with the master.

As far as I know Heroku's instances are Amazon EC2 nodes (Xen VMs) that run an LXC container to isolate each Heroku user's database clusters. LXC offers rather less isolation than a full VM does; instances can contend for RAM, disk I/O, CPU, etc, depending on the exact policy configured with OpenCZ, any control group policies, etc.

If you're on an instance where the other users aren't doing much and if the container permits your DB to use resources that aren't currently required by other users, you could easily see steadily higher than guaranteed performance.

I suspect that people on larger heroku plans are more likely to actually be using the resources of the system you're sharing a container with.

If you do a promotion failover to a bigger instance where all the users are there because they really need the resources offered by the bigger machine you could actually get less resources overall, because everyone's actually using their shares.

It's frustrating that Heroku offer so little visibility into the systems that run their DBs. It's hard to tell how/if they load balance between container hosts, what the underlying load on the system is, etc.

In a comment, @Forrest pointed out that Heroku have a useful page on their server details, showing that only the lower tiers are multi-tenant, but higher tiers are not. This would easily explain the performance loss observed here, and would fit in with my comments above that the lower plan was allowing Forrest to borrow unused resources from other users.

We have just migrated and noticed the same slow. However I didnt get a official answer on support. My users are complaining and I am so disappointed with Heroku at this point.

链接地址: http://www.djcxy.com/p/79060.html

上一篇: Postgres数据库大小命令

下一篇: 升级后Heroku Postgres DB较慢