Understanding postgres caching

2018-06-28 07:17:07

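I know that postgres uses an LRU/clock sweep algorithm to evict data from the cache, but I am having a hard time understanding how data gets into shared_buffers in the first place.

Please note that my intention is not to make this naive query faster; an index is always the best option for that. I want to understand how the cache works in the absence of indexes.

Let's take the query execution plan below as an example (I have purposely not created an index).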

performance_test=# explain (analyze,buffers) select count(*) from users;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48214.95..48214.96 rows=1 width=0) (actual time=3874.445..3874.445 rows=1 loops=1)
   Buffers: shared read=35715
   ->  Seq Scan on users  (cost=0.00..45714.96 rows=999996 width=0) (actual time=6.024..3526.606 rows=1000000 loops=1)
         Buffers: shared read=35715
 Planning time: 0.114 ms
 Execution time: 3874.509 ms

We can see that all the data was fetched from disk, i.e. shared read=35715.

Now we execute the same query again.

performance_test=# explain (analyze,buffers) select count(*) from users;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48214.95..48214.96 rows=1 width=0) (actual time=426.385..426.385 rows=1 loops=1)
   Buffers: shared hit=32 read=35683
   ->  Seq Scan on users  (cost=0.00..45714.96 rows=999996 width=0) (actual time=0.036..285.363 rows=1000000 loops=1)
         Buffers: shared hit=32 read=35683
 Planning time: 0.048 ms
 Execution time: 426.431 ms

Only 32 pages/blocks were served from memory. When the query is repeated, shared hit keeps increasing by 32.

performance_test=# explain (analyze,buffers) select count(*) from users;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48214.95..48214.96 rows=1 width=0) (actual time=416.829..416.829 rows=1 loops=1)
   Buffers: shared hit=64 read=35651
   ->  Seq Scan on users  (cost=0.00..45714.96 rows=999996 width=0) (actual time=0.034..273.417 rows=1000000 loops=1)
         Buffers: shared hit=64 read=35651
 Planning time: 0.050 ms
 Execution time: 416.874 ms

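My shared_buffers is 1GB and the table size is 279MB, so the entire table could be cached in memory. That is not what happens, though, so the cache evidently works a bit differently. Can someone explain how postgres plans and moves data from disk into shared_buffers?

Is there a mechanism that controls how many pages each query can move into shared_buffers?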

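There is a mechanism that prevents the whole buffer cache from being blown away by a sequential scan. It is explained in src/backend/storage/buffer/README:

When running a query that needs to access a large number of pages just once, such as VACUUM or a large sequential scan, a different strategy is used. A page that has been touched only by such a scan is unlikely to be needed again soon, so instead of running the normal clock sweep algorithm and blowing out the entire buffer cache, a small ring of buffers is allocated using the normal clock sweep algorithm and those buffers are reused for the whole scan. This also implies that much of the write traffic caused by such a statement will be done by the backend itself and not pushed off onto other processes.

For sequential scans, a 256KB ring is used. ...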
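To make the ring idea concrete, here is a toy Python model of a scan that reuses a fixed set of buffer slots. It is only a sketch: the function name `scan_with_ring` and the round-robin slot choice are assumptions made for illustration, and it ignores everything else PostgreSQL does (clock sweep allocation of the ring buffers, pinning, dirty pages, concurrent backends).

```python
def scan_with_ring(num_pages, ring_size=32):
    """Toy model: read pages 0..num_pages-1 through a fixed ring of slots.

    ring_size=32 corresponds to a 256 kB ring with 8 kB pages.
    Returns the page numbers still resident when the scan ends.
    """
    ring = [None] * ring_size              # each slot holds one page number
    for page in range(num_pages):
        ring[page % ring_size] = page      # overwrite a slot instead of claiming a new buffer
    return sorted(p for p in ring if p is not None)


resident = scan_with_ring(35715)           # page count taken from the plans in the question
print(len(resident), resident[0], resident[-1])   # 32 35683 35714
```

However many pages the scan reads, it never occupies more than `ring_size` slots, which is why the rest of the buffer cache is left alone.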

Note that 32 ✕ 8kB = 256kB, so that is what you are seeing.
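The same arithmetic can be checked in a few lines of Python. This is a small sketch that assumes the default 8 kB block size; the variable names are made up for the example, and the page count is the "shared read" figure from the first plan.

```python
BLOCK_SIZE = 8 * 1024      # assumed: default PostgreSQL block size in bytes
RING_BYTES = 256 * 1024    # ring size for sequential scans, from the README quote

ring_pages = RING_BYTES // BLOCK_SIZE
table_pages = 35715        # "shared read" reported by the first EXPLAIN
table_mib = table_pages * BLOCK_SIZE / (1024 * 1024)

print(ring_pages)          # 32
print(round(table_mib))    # 279
```

So the 32 buffers reported as hits match the size of the ring, and 35715 pages match the 279MB table size given in the question.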
