Oracle ROWNUM performance

To query top-n rows in Oracle, it is general to use ROWNUM. So the following query seems ok (gets most recent 5 payments):

select a.paydate, a.amount
from (
  select t.paydate, t.amount
  from payments t
  where t.some_id = id
  order by t.paydate desc
) a
where rownum <= 5;

But for very big tables, it is inefficient - for me it run for ~10 minutes. So I tried other queries, and I ended up with this one which runs for less than a second:

select *
from (
  select  a.*, rownum
  from (select t.paydate, t.amount
        from payments t
        where t.some_id = id
        order by t.paydate desc) a
)
where rownum <= 5;

To find out what is happening, I looked execution plans for each query. For first query:

SELECT STATEMENT, GOAL = ALL_ROWS   7   5   175
COUNT STOPKEY           
VIEW    7   5   175
TABLE ACCESS BY INDEX ROWID 7   316576866   6331537320
INDEX FULL SCAN DESCENDING  4   6   

And for second:

SELECT STATEMENT, GOAL = ALL_ROWS   86  5   175
COUNT STOPKEY           
VIEW    86  81  2835
COUNT           
VIEW    86  81  1782
SORT ORDER BY   86  81  1620
TABLE ACCESS BY INDEX ROWID 85  81  1620
INDEX RANGE SCAN    4   81  

Obviously, it is INDEX FULL SCAN DESCENDING that makes first query inefficient for big tables. But I can not really differentiate the logic of two queries by looking at them. Could anyone explain me the logical differences between two queries in human language?

Thanks in advance!


First of all, as mentioned in Alex's comment, I'm not sure that your second version is 100% guaranteed to give you the right rows -- since the "middle" block of the query does not have an explicit order by , Oracle is under no obligation to pass the rows up to the outer query block in any specific order. However, there doesn't seem to be any particular reason that it would change the order that the rows are passed up from the innermost block, so in practice it will probably work.

And this is why Oracle chooses a different plan for the second query -- it is logically not able to apply the STOPKEY operation to the innermost query block.

I think in the first case, the optimizer is assuming that id values are well-distributed and, for any given value, there are likely to be some very recent transactions. Since it can see that it only needs to find the 5 most recent matches, it calculates that it appears to be more efficient to scan the rows in descending order of paydate using an index, lookup the corresponding id and other data from the table, and stop when it's found the first 5 matches. I suspect that you would see very different performance for this query depending on the specific id value that you use -- if the id has a lot of recent activity, the rows should be found very quickly, but if it does not, the index scan may have to do a lot more work.

In the second case, I believe it's not able to apply the STOPKEY optimization to the innermost block due to the extra layer of nesting. In that case, the index full scan would become much less attractive, since it would always need to scan the entire index. Therefore it chooses to do an index lookup on id (I'm assuming) followed by an actual sort on the date. If the given id value matches a small subset of rows, this is likely to be more efficient -- but if you give an id that has lots of rows spread throughout the entire table, I would expect it to become slower, since it will have to access and sort many rows.

So, I would guess that your tests have used id value(s) that have relatively few rows which are not very recent. If this would be a typical use case, then the second query is probably better for you (again, with the caveat that I'm not sure it is technically guaranteed to produce the correct result set). But if typical values would be more likely to have many matching rows and/or more likely to have 5 very recent rows, then the first query and plan might be better.

链接地址: http://www.djcxy.com/p/10432.html

上一篇: 使用HTML5音频标签发送自定义HTTP请求标头

下一篇: Oracle ROWNUM性能