Improving query performance in PostgreSQL with nested NOT IN
I'm attempting to port an application from MySQL 5.6 to PostgreSQL 9.2. The original application uses a view that I've managed to get to at least run, but the query time is horrible.
I want to know the best approach in PostgreSQL to optimize "not in" queries.
My first thought was to create a temp table, but as this is a view, I don't think it's an option.
create VIEW ready_ports AS
SELECT ports.id AS id,
ports.run AS run,
ports.name AS name,
ports.pkgname AS pkgname,
ports.version AS version,
ports.description AS description,
ports.license AS license,
ports.www AS www,
ports.status AS status,
ports.updated AS updated,
(SELECT count(0) AS COUNT
FROM depends
WHERE depends.dependency = ports.id) AS priority
FROM ports
WHERE (ports.status = 'untested' and
(not(ports.id in
(SELECT locks.port AS port
FROM locks
WHERE locks.port = ports.id)
)
) and
(
(not(ports.id in (SELECT depends.port AS port
FROM depends
WHERE depends.port = ports.id))) or
(not(ports.id in
(SELECT depends.port AS port
FROM depends
WHERE ((not(depends.dependency in
(SELECT ports.id AS dep_id
FROM ports
WHERE (ports.id = depends.dependency
and (ports.status = 'pass'
or ports.status = 'warn')
)
))) or
depends.dependency in
(SELECT locks.port AS port
FROM locks
WHERE locks.port = ports.id)))))))
ORDER BY priority desc
                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Sort  (cost=367498265655.68..367498265763.29 rows=43047 width=136)
   Sort Key: ((SubPlan 1))
   ->  Index Scan using ports_1_idx on ports  (cost=0.00..367498259398.93 rows=43047 width=136)
         Index Cond: ((status)::text = 'untested'::text)
         Filter: ((NOT (SubPlan 2)) AND ((NOT (SubPlan 3)) OR (NOT (SubPlan 6))))
         SubPlan 1
           ->  Aggregate  (cost=9.62..9.63 rows=1 width=0)
                 ->  Index Only Scan using depends_dependency_idx on depends  (cost=0.00..9.47 rows=60 width=0)
                       Index Cond: (dependency = public.ports.id)
         SubPlan 2
           ->  Index Only Scan using locks_port_key on locks  (cost=0.00..8.27 rows=1 width=4)
                 Index Cond: (port = public.ports.id)
         SubPlan 3
           ->  Index Only Scan using depends_pkey on depends  (cost=0.00..8.72 rows=14 width=4)
                 Index Cond: (port = public.ports.id)
         SubPlan 6
           ->  Seq Scan on depends  (cost=8.27..6399946.81 rows=1150079 width=4)
                 Filter: ((NOT (SubPlan 4)) OR (hashed SubPlan 5))
                 SubPlan 4
                   ->  Index Scan using ports_pkey on ports  (cost=0.00..8.31 rows=1 width=4)
                         Index Cond: (id = public.depends.dependency)
                         Filter: (((status)::text = 'pass'::text) OR ((status)::text = 'warn'::text))
                 SubPlan 5
                   ->  Index Only Scan using locks_port_key on locks  (cost=0.00..8.27 rows=1 width=4)
                         Index Cond: (port = public.ports.id)
You can try the NOT EXISTS and anti-join versions, because NOT IN cannot make use of many indexes (due to the way it has to handle NULLs):
SELECT *
FROM table1
WHERE table1.id NOT IN (SELECT id FROM table2)
-- vs NOT EXISTS
SELECT *
FROM table1
WHERE NOT EXISTS (SELECT * FROM table2 WHERE table1.id = table2.id)
-- vs anti-join
SELECT *
FROM table1
LEFT JOIN table2 ON table1.id = table2.id
WHERE table2.id IS NULL
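The NULL handling mentioned above is also why these forms are not always interchangeable. A minimal sketch of the difference, assuming two throwaway tables (t1 and t2 are hypothetical names, not part of the original schema):

```sql
-- Hypothetical demo tables, used only to illustrate NULL semantics.
CREATE TEMP TABLE t1 (id integer);
CREATE TEMP TABLE t2 (id integer);
INSERT INTO t1 VALUES (1), (2), (3);
INSERT INTO t2 VALUES (1), (NULL);

-- NOT IN: "id NOT IN (1, NULL)" is never true (it is false or NULL),
-- so this returns no rows at all.
SELECT id FROM t1 WHERE id NOT IN (SELECT id FROM t2);

-- NOT EXISTS: the NULL row in t2 never matches, so this returns 2 and 3.
SELECT id FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id);
```

If the subquery column can never be NULL (for example, it is declared NOT NULL), the two forms return the same rows, and the rewrite should be safe from a correctness point of view.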
I ended up using a combination of joins and NOT EXISTS queries to get the final working query.
create VIEW ready_ports AS
SELECT ports.id AS id,
ports.run AS run,
ports.name AS name,
ports.pkgname AS pkgname,
ports.version AS version,
ports.description AS description,
ports.license AS license,
ports.www AS www,
ports.status AS status,
ports.updated AS updated,
(SELECT count(0) AS COUNT
FROM depends
WHERE depends.dependency = ports.id) AS priority
FROM ports
LEFT JOIN locks on locks.port = ports.id
LEFT JOIN depends on depends.port = ports.id
WHERE ports.status = 'untested' and locks.id is null and
(depends.port is null or
not exists
(SELECT depends.port AS port
FROM depends WHERE ports.id = depends.port and not exists
(SELECT ports.id as dep_id
FROM ports
WHERE ports.id = depends.dependency and
(ports.status = 'pass' or ports.status = 'warn'))
or
depends.dependency = locks.port))
ORDER BY priority desc, ports.name asc
You need to rewrite the query using joins:
select ...
from ports
left join locks ...
left join depends ...
where criteria
That way, you'll be working on one big set, which is the result of three sets, instead of working on a half dozen sets.
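One way the skeleton above might be filled in, as a rough sketch only. It assumes the ports, locks and depends tables from the question, and it assumes the intent is "untested, unlocked ports whose dependencies have all reached status 'pass' or 'warn'". The aliases p, l, d and dp are introduced here for illustration, and the question's extra check for locked dependencies is deliberately left out:

```sql
SELECT p.*
FROM ports p
-- Anti-join: keep only ports that have no row in locks.
LEFT JOIN locks l ON l.port = p.id
WHERE p.status = 'untested'
  AND l.port IS NULL
  -- No "blocking" dependency: a depends row whose dependency
  -- has no matching port with status 'pass' or 'warn'.
  AND NOT EXISTS (
      SELECT 1
      FROM depends d
      LEFT JOIN ports dp
             ON dp.id = d.dependency
            AND dp.status IN ('pass', 'warn')
      WHERE d.port = p.id
        AND dp.id IS NULL
  );
```

Ports with no rows in depends pass the NOT EXISTS test automatically, so the separate "has no dependencies" branch from the original query is not needed in this form. Comparing the EXPLAIN output of this against the original view is the only reliable way to tell whether it helps on the real data.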
Moving the count out of your view will be a plus as well. Use a separate query or join your view to get that part. (An aggregate in a view is rarely a good idea, except perhaps in reports.)
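As a sketch of what joining the view to get the count could look like: this assumes the view has been redefined without the priority subquery and without its ORDER BY. The name ready_ports_base is hypothetical, as are the aliases rp and d.

```sql
-- ready_ports_base is a hypothetical view: the same filter as ready_ports,
-- but without the correlated count(...) column and without ORDER BY.
SELECT rp.*,
       COALESCE(d.cnt, 0) AS priority   -- ports nobody depends on get 0
FROM ready_ports_base rp
LEFT JOIN (
    -- Count dependents once per port instead of once per output row.
    SELECT dependency, count(*) AS cnt
    FROM depends
    GROUP BY dependency
) d ON d.dependency = rp.id
ORDER BY priority DESC, rp.name ASC;
```

Whether the single grouped pass over depends beats the per-row index lookups depends on how many rows the view returns, so it is worth checking both with EXPLAIN ANALYZE.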