What kind of index for orderby/where query in SQL?

2018-06-22 02:45:47

I'd like to run a query on an SQLite database which looks like

SELECT a,b,c,d FROM data WHERE a IN (1,2,3) ORDER BY b,c

What kind of index, and with which column order, should I create so that SQLite (or maybe later MySQL) can run this quickly? How can I easily check whether the query actually uses an index (i.e. how do I interpret EXPLAIN)? Will SQLite be faster if I include d in the index?

EDIT: Here are the characteristics of the table:

10.000.000 rows

60 distinct a

6.000.000 distinct b

2.000 distinct c

no constraints

the table is my personal analytics data; it is written only once and then only read

PS: Is there a reference where I can learn when SQLite/MySQL can use indices?

If, and only if, IN (1,2,3) is a constant list (always the same values) you can use a partial index like so:

CREATE INDEX so ON data (b,c) WHERE a IN (1,2,3)

Then running your query gives this plan (EXPLAIN QUERY PLAN SELECT ...):

0|0|0|SCAN TABLE data USING INDEX so
0|0|0|EXECUTE LIST SUBQUERY 1

Note that the plan contains no separate sort step for the ORDER BY.

As a counter test, let's drop the index and replace it like so:

CREATE INDEX so ON data (a,b,c);

The new execution plan is:

0|0|0|SEARCH TABLE data USING INDEX so (a=?)
0|0|0|EXECUTE LIST SUBQUERY 1
0|0|0|USE TEMP B-TREE FOR ORDER BY

You see the sort operation now?
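To reproduce this comparison yourself, here is a minimal sketch using Python's built-in sqlite3 module. The integer column types and the helper names (query_plan, plan_with_index) are assumptions for illustration, not part of the question. It prints the "detail" column of EXPLAIN QUERY PLAN for each index; the exact wording of the plan lines differs between SQLite versions, so look for the presence or absence of a "TEMP B-TREE" line rather than an exact match.

```python
import sqlite3

QUERY = "SELECT a, b, c, d FROM data WHERE a IN (1,2,3) ORDER BY b, c"


def query_plan(conn, sql):
    """Return the human-readable 'detail' text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail text is the last column in both old and new SQLite versions.
    return [row[-1] for row in rows]


def plan_with_index(create_index_sql):
    """Create an empty table (assumed schema), add one index, return the plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
    conn.execute(create_index_sql)
    plan = query_plan(conn, QUERY)
    conn.close()
    return plan


partial_plan = plan_with_index("CREATE INDEX so ON data (b, c) WHERE a IN (1,2,3)")
composite_plan = plan_with_index("CREATE INDEX so ON data (a, b, c)")

for label, plan in (("partial (b,c)", partial_plan), ("composite (a,b,c)", composite_plan)):
    print(label)
    for line in plan:
        print("   ", line)
```

A plan line mentioning a temp b-tree means SQLite sorts the result itself; if no such line appears, the index already delivers the rows in ORDER BY order.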

I only ran this against an empty table, so I haven't verified the actual speed improvement, but I'd expect you to see it right after creating the index.

Also note that partial indexes are supported only since SQLite 3.8.0 (released 2013-08-26).

One small thing to consider: how many rows match the filter a IN (1, 2, 3)? If that is a large part of the table, and 15% or so can already count as large, using an index might even reduce performance.

Compare this with the index of a book. Assume the index is complete, meaning every word is indexed. If you look up all occurrences of "and" through this index, you will never finish hopping from the index to the text and back. Simply reading the book from cover to cover and scanning for "and" will definitely be the faster option.

It isn't clear where the break-even point lies, because it depends on many factors, but it is lower than most people think. (The 15% I mentioned is, in my experience, a good rule of thumb.)
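To see where your own data falls relative to that rule of thumb, you can simply count. The sketch below is only an illustration: the selectivity helper, the integer schema and the evenly distributed synthetic rows are all assumptions. With 60 distinct values of a spread evenly, three values match 5% of the table; real data may be heavily skewed, so measure it on the actual table.

```python
import sqlite3


def selectivity(conn, a_values):
    """Fraction of rows in data whose a is one of a_values (0.0 for an empty table)."""
    placeholders = ",".join("?" for _ in a_values)
    matched = conn.execute(
        f"SELECT COUNT(*) FROM data WHERE a IN ({placeholders})", list(a_values)
    ).fetchone()[0]
    total = conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
    return matched / total if total else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
# Synthetic stand-in for the real table: 6000 rows, 60 evenly spread values of a.
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    ((i % 60, i, i % 2000, 0) for i in range(6000)),
)

frac = selectivity(conn, (1, 2, 3))
print(f"a IN (1,2,3) matches {frac:.1%} of the rows")
```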

Using an index can still be an option if the sort can be omitted. The tree index would have columns (b, c, a) in that case. (A hash index wouldn't help there.) Depending on data types and update frequency, you could even consider (b, c, a, d) as an index; the DBMS would then only have to do an index scan, not a table scan. (If d is huge, it won't help much and will waste a lot of space; if d is updated very frequently, it might be a bad idea too, because it doubles the work of each update.)
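A quick way to try the (b, c, a, d) variant is sketched below with Python's sqlite3 module. The index name data_bcad, the integer schema and the synthetic rows are assumptions made for the example. The script prints whatever plan your SQLite version picks (I would expect a covering index scan without a temp b-tree, but check the output yourself) and runs the query on a small table.

```python
import sqlite3

QUERY = "SELECT a, b, c, d FROM data WHERE a IN (1,2,3) ORDER BY b, c"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
# Small synthetic data set: 600 rows, 60 distinct values of a, scrambled b.
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    ((i % 60, (i * 37) % 101, i % 20, i) for i in range(600)),
)
# Sort columns first, then the filter column, then d so the index covers the query.
conn.execute("CREATE INDEX data_bcad ON data (b, c, a, d)")

plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
rows = conn.execute(QUERY).fetchall()

for line in plan:
    print(line)
print(len(rows), "rows returned")
```

Because a comes after b and c in the index, the filter is applied while walking the index in (b, c) order, which is what lets the sort be skipped.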

Physical database design is often a matter of finding the right compromise.

OK, a lot of what I wrote is no longer applicable after your edit. Still, I think the answer might give you some things to think about.

The following index helps you get the records quickly, provided of course that the DBMS considers using an index faster than a full table scan. For example, if it thinks a IN (1,2,3) will match 90% of the records in the table, it should shy away from the index and simply scan the full table instead.

CREATE INDEX idx ON data(a);

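The following index helps you get the records quickly and can also help with sorting. Again, if the DBMS considers it wrong to use an index at all, this index won't be used. But it becomes more likely that the index is used, because within each value of a the entries are already ordered by b, c. Note that this yields fully sorted output only for a single value (a = 1); with several values in the IN list, SQLite still adds a sort step, as the "USE TEMP B-TREE FOR ORDER BY" line in the plan above shows.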

CREATE INDEX idx ON data(a,b,c);
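Whether this index also takes care of the ORDER BY depends on the shape of the filter, and that is easy to check. The following sketch (the needs_sort helper and the integer schema are assumptions for illustration) compares a single-value filter with the three-value IN list from the question by looking for a temp b-tree line in the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
conn.execute("CREATE INDEX idx ON data (a, b, c)")


def needs_sort(sql):
    """True if the plan for sql contains a separate sort step (temp b-tree)."""
    details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    return any("TEMP B-TREE" in d for d in details)


single = needs_sort("SELECT a, b, c, d FROM data WHERE a = 1 ORDER BY b, c")
multi = needs_sort("SELECT a, b, c, d FROM data WHERE a IN (1,2,3) ORDER BY b, c")

print("a = 1        needs sort:", single)
print("a IN (1,2,3) needs sort:", multi)
```

With one value of a the index entries are read in (b, c) order directly; with several values they are ordered only within each a, so the results have to be merged or sorted afterwards.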

The following index helps you get the records quickly, helps with the sort in the same way, and avoids accessing the table at all. Here all the data is present in the index, so the DBMS has little reason not to use it: the filter column, the sort columns, and even d itself are already there.

CREATE INDEX idx ON data(a,b,c,d);
