PostgreSQL tuning best practices for data warehousing
I have found plenty of online and print guides on how to tune and optimize performance for Postgres for OLTP applications, but I haven't found anything of the sort specific to Data Warehousing applications. Since there are so many differences in the types of workload, I'm sure there has to be some differences in how the databases are managed and tuned.
Some of my own:
I have found from the DDL side that I use indexes a lot more liberally, since I usually only worry about inserts once a day and can do batch inserts with index rebuilds.
I will typically use integer surrogate keys to data that typically has more than one natural key for faster joins
I will usually define and maintain a very comprehensive date table that has prebuilt date manipulations (fiscal date as opposed to calendar date, fiscal year-month, starting day of the week, etc) and use it liberally as opposed to using functions in select statements and where statements. This usually helps during CPU-bound aggregate queries.
I was hoping that I would find some information on memory management and other database settings, but I would be happy to hear any useful best practices specific to Postgres-based Data Warehousing.
My experience (admittedly on a pretty small scale when it comes to data warehouses):
From a memory management perspective one of your largest differences is that you can often hope to keep the working OLTP set in memory while this is not the case with OLAP environments. Additionally very often your joined sets are bigger. This means higher work_mem settings can be very helpful and to the extent tables are denormalized this means one can push work_mem a bit higher than it might be otherwise. I am not sure my advice on shared_buffers would change (I prefer to start low and increase, testing performance at each step) but work_mem certainly would need to increase if you are doing reporting on sets of any size.
链接地址: http://www.djcxy.com/p/11182.html