Improving query performance in PostgreSQL with nested NOT IN

2018-06-30 20:36:16

I'm attempting to port an application from MySQL 5.6 to PostgreSQL 9.2. The original application uses a view that I've managed to get to at least run on PostgreSQL, but the query time is horrible.

I want to know the best approach in PostgreSQL to optimize "not in" queries.

My first thought was to create a temp table, but as this is a view, I don't think it's an option.

create VIEW ready_ports AS 
    SELECT ports.id AS id, 
           ports.run AS run,
           ports.name AS name, 
           ports.pkgname AS pkgname,
           ports.version AS version,
           ports.description AS description,
           ports.license AS license,
           ports.www AS www, 
           ports.status AS status, 
           ports.updated AS updated,
          (SELECT count(0) AS COUNT 
           FROM depends 
           WHERE depends.dependency = ports.id) AS priority 
    FROM ports 
    WHERE (ports.status = 'untested' and 
          (not(ports.id in 
                 (SELECT locks.port AS port 
                  FROM locks 
                  WHERE locks.port = ports.id)
               )
          ) and 
          (
             (not(ports.id in (SELECT depends.port AS port 
                               FROM depends 
                               WHERE depends.port = ports.id))) or 
             (not(ports.id in 
                  (SELECT depends.port AS port 
                   FROM depends
                   WHERE ((not(depends.dependency in 
                     (SELECT ports.id AS dep_id 
                      FROM ports   
                      WHERE (ports.id = depends.dependency 
                             and (ports.status = 'pass' 
                                  or ports.status = 'warn')
                             )
                     ))) or 
                           depends.dependency in 
                             (SELECT locks.port AS port 
                              FROM locks 
                              WHERE locks.port = ports.id)))))))
ORDER BY priority desc

                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Sort  (cost=367498265655.68..367498265763.29 rows=43047 width=136)
   Sort Key: ((SubPlan 1))
   ->  Index Scan using ports_1_idx on ports  (cost=0.00..367498259398.93 rows=43047 width=136)
         Index Cond: ((status)::text = 'untested'::text)
         Filter: ((NOT (SubPlan 2)) AND ((NOT (SubPlan 3)) OR (NOT (SubPlan 6))))
         SubPlan 1
           ->  Aggregate  (cost=9.62..9.63 rows=1 width=0)
                 ->  Index Only Scan using depends_dependency_idx on depends  (cost=0.00..9.47 rows=60 width=0)
                       Index Cond: (dependency = public.ports.id)
         SubPlan 2
           ->  Index Only Scan using locks_port_key on locks  (cost=0.00..8.27 rows=1 width=4)
                 Index Cond: (port = public.ports.id)
         SubPlan 3
           ->  Index Only Scan using depends_pkey on depends  (cost=0.00..8.72 rows=14 width=4)
                 Index Cond: (port = public.ports.id)
         SubPlan 6
           ->  Seq Scan on depends  (cost=8.27..6399946.81 rows=1150079 width=4)
                 Filter: ((NOT (SubPlan 4)) OR (hashed SubPlan 5))
                 SubPlan 4
                   ->  Index Scan using ports_pkey on ports  (cost=0.00..8.31 rows=1 width=4)
                         Index Cond: (id = public.depends.dependency)
                         Filter: (((status)::text = 'pass'::text) OR ((status)::text = 'warn'::text))
                 SubPlan 5
                   ->  Index Only Scan using locks_port_key on locks  (cost=0.00..8.27 rows=1 width=4)
                         Index Cond: (port = public.ports.id)

You can try the NOT EXISTS and anti-join versions instead, because NOT IN cannot make use of many indexes (due to the way it has to handle NULLs):

SELECT *
FROM table1
WHERE table1.id NOT IN (SELECT id FROM table2)

-- vs NOT EXISTS

SELECT *
FROM table1
WHERE NOT EXISTS (SELECT * FROM table2 WHERE table1.id = table2.id)

-- vs anti-join

SELECT table1.*
FROM table1
LEFT JOIN table2 ON table1.id = table2.id
WHERE table2.id IS NULL
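
To make the NULL handling concrete, here is a minimal sketch that runs the three forms above against a tiny data set. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example is self-contained; the NULL semantics of NOT IN shown here are standard SQL), and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1), (NULL);  -- note the NULL
""")

def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# NOT IN: a NULL in the subquery makes "id NOT IN (...)" evaluate to NULL
# for every non-matching row, so no row passes the filter.
not_in = ids("SELECT id FROM table1 WHERE id NOT IN (SELECT id FROM table2)")

# NOT EXISTS: only asks whether a matching row exists, so NULLs are harmless.
not_exists = ids("""
    SELECT id FROM table1
    WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE table1.id = table2.id)
""")

# Anti-join: keep table1 rows for which the LEFT JOIN found no partner.
anti_join = ids("""
    SELECT table1.id FROM table1
    LEFT JOIN table2 ON table1.id = table2.id
    WHERE table2.id IS NULL
""")

print(not_in, not_exists, anti_join)
```

With a NULL in table2, the NOT IN form returns nothing while the other two return ids 2 and 3. So the forms are only interchangeable when the subquery column cannot be NULL, and the planner has to allow for that difference when it sees NOT IN.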

I ended up using a combination of joins and NOT EXISTS queries to get the final working query.

create VIEW ready_ports AS 
SELECT ports.id AS id, 
       ports.run AS run,
       ports.name AS name, 
       ports.pkgname AS pkgname,
       ports.version AS version,
       ports.description AS description,
       ports.license AS license,
       ports.www AS www, 
       ports.status AS status, 
       ports.updated AS updated,
      (SELECT count(0) AS COUNT 
       FROM depends 
       WHERE depends.dependency = ports.id) AS priority 
FROM ports 
LEFT JOIN locks on locks.port = ports.id
LEFT JOIN depends on depends.port = ports.id
WHERE ports.status = 'untested' and locks.id is null and 
      (depends.port is null or 
         not exists
              (SELECT depends.port AS port 
               FROM depends WHERE ports.id = depends.port and not exists              
                 (SELECT ports.id as dep_id
                  FROM ports   
                  WHERE ports.id = depends.dependency and 
                  (ports.status = 'pass' or ports.status = 'warn'))
                  or depends.dependency = locks.port))
ORDER BY priority desc, ports.name asc

You need to rewrite the query using joins:

select ...
from ports
left join locks ...
left join depends ...
where criteria

That way, you'll be working on one big set, which is the result of three sets, instead of working on a half dozen sets.
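
As a rough illustration of that shape, here is a sketch on a toy schema, run through Python's sqlite3 module as a stand-in for PostgreSQL. The table and column names come from the question, but the sample rows and the exact readiness rule (untested, not locked, and no dependency that is either locked or not yet 'pass'/'warn') are my reading of the original view, not a verified port of it. It also assumes locks.port is unique, as the locks_port_key index in the plan suggests.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ports   (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
    CREATE TABLE locks   (port INTEGER UNIQUE);
    CREATE TABLE depends (port INTEGER, dependency INTEGER);

    INSERT INTO ports VALUES
        (1, 'libfoo', 'pass'),
        (2, 'libbar', 'fail'),
        (3, 'app-a',  'untested'),  -- depends on libfoo only
        (4, 'app-b',  'untested'),  -- depends on libbar, which failed
        (5, 'app-c',  'untested'),  -- no dependencies, but locked
        (6, 'app-d',  'untested');  -- no dependencies, not locked
    INSERT INTO locks VALUES (5);
    INSERT INTO depends VALUES (3, 1), (4, 2);
""")

READY_SQL = """
    SELECT p.id, p.name
    FROM ports p
    LEFT JOIN locks l ON l.port = p.id            -- anti-join: port not locked
    WHERE p.status = 'untested'
      AND l.port IS NULL
      AND NOT EXISTS (                            -- no blocking dependency
            SELECT 1
            FROM depends d
            LEFT JOIN ports dep ON dep.id = d.dependency
                               AND dep.status IN ('pass', 'warn')
            LEFT JOIN locks dl ON dl.port = d.dependency
            WHERE d.port = p.id
              AND (dep.id IS NULL OR dl.port IS NOT NULL)
      )
    ORDER BY p.name
"""

ready = [name for _id, name in conn.execute(READY_SQL)]
print(ready)
```

Only app-a and app-d come back: app-b is held up by a failed dependency and app-c is locked. The point is the structure (joins plus a single correlated NOT EXISTS) rather than the exact predicate, which you would need to check against the behaviour of the MySQL view.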

Moving the count out of your view will be a plus as well. Use a separate query, or join against your view, to get that part. (An aggregate in a view is rarely a good idea, except perhaps in reports.)
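
One way to do that, sketched below with Python's sqlite3 module standing in for PostgreSQL: keep the view free of aggregates and join it to a grouped count when the priority is actually needed. The view name ready_ports_base, its simplified definition, and the sample rows are hypothetical and only serve the illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ports   (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
    CREATE TABLE depends (port INTEGER, dependency INTEGER);

    INSERT INTO ports VALUES
        (1, 'libfoo', 'untested'),
        (2, 'libbar', 'untested'),
        (3, 'app-a',  'untested'),
        (4, 'app-b',  'pass');
    INSERT INTO depends VALUES (3, 1), (4, 1), (3, 2);

    -- Hypothetical, simplified view with no per-row aggregate in it.
    CREATE VIEW ready_ports_base AS
        SELECT id, name FROM ports WHERE status = 'untested';
""")

# The count is computed once per dependency in a grouped subquery and then
# joined to the view, instead of running a correlated count for every row.
PRIORITY_SQL = """
    SELECT r.name, COALESCE(c.dependents, 0) AS priority
    FROM ready_ports_base r
    LEFT JOIN (
        SELECT dependency, COUNT(*) AS dependents
        FROM depends
        GROUP BY dependency
    ) c ON c.dependency = r.id
    ORDER BY priority DESC, r.name ASC
"""

rows = conn.execute(PRIORITY_SQL).fetchall()
print(rows)
```

Here libfoo sorts first because two ports depend on it, and app-a gets a priority of 0 through COALESCE since nothing depends on it.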
