Understanding postgres caching

I know that Postgres uses an LRU/clock sweep algorithm to evict data from the cache, but I'm having a hard time understanding how data gets into shared_buffers in the first place.

Please note that my intention is not to make this naive query faster; an index is always the best option. But I want to understand how the cache works in the absence of indexes.

Let's take the query execution plan below as an example (I have purposely not created an index):

performance_test=# explain (analyze,buffers) select count(*) from users;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48214.95..48214.96 rows=1 width=0) (actual time=3874.445..3874.445 rows=1 loops=1)
   Buffers: shared read=35715
   ->  Seq Scan on users  (cost=0.00..45714.96 rows=999996 width=0) (actual time=6.024..3526.606 rows=1000000 loops=1)
         Buffers: shared read=35715
 Planning time: 0.114 ms
 Execution time: 3874.509 ms

We can see that all of the data was fetched from disk, i.e. shared read=35715 (35715 pages × 8 kB ≈ 279 MB, the size of the table).

Now we execute the same query again:

performance_test=# explain (analyze,buffers) select count(*) from users;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48214.95..48214.96 rows=1 width=0) (actual time=426.385..426.385 rows=1 loops=1)
   Buffers: shared hit=32 read=35683
   ->  Seq Scan on users  (cost=0.00..45714.96 rows=999996 width=0) (actual time=0.036..285.363 rows=1000000 loops=1)
         Buffers: shared hit=32 read=35683
 Planning time: 0.048 ms
 Execution time: 426.431 ms

Only 32 pages/blocks were found in memory (shared hit=32). When repeating the query, shared hit keeps increasing by 32:
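One way to watch this from the outside is to count how many pages of users are actually sitting in shared_buffers between runs. This is only a sketch, assuming the pg_buffercache contrib extension is available; the table and database names are the ones used above:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Count the pages of "users" currently held in shared_buffers
-- and show their total size.
SELECT count(*) AS cached_pages,
       pg_size_pretty(count(*) * current_setting('block_size')::bigint) AS cached_size
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE c.relname = 'users'
  AND b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database());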

performance_test=# explain (analyze,buffers) select count(*) from users;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48214.95..48214.96 rows=1 width=0) (actual time=416.829..416.829 rows=1 loops=1)
   Buffers: shared hit=64 read=35651
   ->  Seq Scan on users  (cost=0.00..45714.96 rows=999996 width=0) (actual time=0.034..273.417 rows=1000000 loops=1)
         Buffers: shared hit=64 read=35651
 Planning time: 0.050 ms
 Execution time: 416.874 ms

My shared_buffers = 1 GB and the table size is 279 MB, so the whole table could be cached in memory, but this is not the case and the cache works a bit differently. Can someone explain how Postgres plans and moves the data from disk to shared_buffers?
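For reference, this is roughly how I checked those numbers (a sketch; table name as above):

-- Current shared_buffers setting
SHOW shared_buffers;

-- On-disk size of the table and the number of 8 kB pages it occupies
SELECT pg_size_pretty(pg_relation_size('users')) AS table_size,
       pg_relation_size('users') / current_setting('block_size')::bigint AS pages;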

Is there a mechanism that controls how many pages can move into shared_buffers with each query?


There is a mechanism that prevents the entire buffer cache from being blown away by a sequential scan. It is explained in src/backend/storage/buffer/README:

When running a query that needs to access a large number of pages just once, such as VACUUM or a large sequential scan, a different strategy is used. A page that has been touched only by such a scan is unlikely to be needed again soon, so instead of running the normal clock sweep algorithm and blowing out the entire buffer cache, a small ring of buffers is allocated using the normal clock sweep algorithm and those buffers are reused for the whole scan. This also implies that much of the write traffic caused by such a statement will be done by the backend itself and not pushed off onto other processes.

For sequential scans, a 256KB ring is used. ...

Note that 32 ✕ 8kB = 256kB, so that is what you are seeing.
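If you do want the whole table cached for an experiment like this, you can bypass the ring by loading the table explicitly. A minimal sketch, assuming the pg_prewarm contrib extension is installed:

CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load every block of "users" into shared_buffers; returns the number of blocks read.
SELECT pg_prewarm('users');

After that, running the same EXPLAIN (ANALYZE, BUFFERS) should report the scan almost entirely as shared hit, because pages that are already resident in shared_buffers are used there even when a ring buffer strategy is active.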
