Creating an appropriate index for a frequently used query in SQL Server

2018-05-29 08:20:12

In my application I have two queries which will be quite frequently used. The Where clauses of these queries are the following:

WHERE FieldA = @P1 AND (FieldB = @P2 OR FieldC = @P2)

and

WHERE FieldA = @P1 AND FieldB = @P2

P1 and P2 are parameters entered in the UI or coming from external datasources.

FieldA is an int and highly non-unique, means: only two, three, four different values in a table with say 20000 rows

FieldB is a varchar(20) and is "almost" unique, there will be only very few rows where FieldB might have the same value

FieldC is a varchar(15) and also highly distinct, but not as much as FieldB

FieldA and FieldB together are unique (but do not form my primary key, which is a simple auto-incrementing identity column with a clustered index)

I'm wondering now what's the best way to define an index to speed up specifically these two queries. Shall I define one index with...

FieldB (or better FieldC here?)
FieldC (or better FieldB here?)
FieldA

... or better two indices:

FieldB
FieldA

and

FieldC
FieldA

Or are there even other and better options? What's the best way and why?

Thank you for suggestions in advance!

Edit:

Just as an info to other readers: Here was another answer which has been deleted now. Actually the answer seemed very useful to me. The recommendation was to create two indices (according to my second option above) and to reformulate the first query by using a UNION of two select statements (one with WHERE FieldA = @P1 AND FieldB = @P2 and one with WHERE FieldA = @P1 AND FieldC = @P2 ) instead of OR to benefit from both indices (which wouldn't be the case with the OR-operator).

Edit2:

The statement that with OR the indexes are not used and that a UNION is preferable seems to be wrong - at least according to my own tests (see my own answer below).

Extending Remus' (edit: now deleted) answer...

if @p2 is varchar(15) then you can't compare against FieldB reliably, it's varchar(20)

if @p2 is varchar(20) then FieldC will be converted to varchar(20) and not use an index (or at best scan it)

if @p1 only has 2, 3, 4 values then why not tinyint and reduce table/index size?

I wouldn't bother with indexes until you resolve this datatype precedence issue: this is on top of the OR clause issue.

Finally, a column is unique or non-unique: there is no in between. Statistics help here with selectivity, but it's irrelevant.

I would reverse the indexes from Remus' answer to be FieldB, FieldA (and unique) and FieldC, FieldA because of FieldA's selectivity

Edit, after comments: you can't compare the use of @p2 against the use of constant strings.

I add my own answer after some tests with a larger database (in SQL Server 2008):

First, I have decided for the second option, means, I have created two indexes:

CREATE UNIQUE NONCLUSTERED INDEX [IX_B] ON [dbo].[MyTable] 
(
    [FieldB] ASC,
    [FieldA] ASC
)
CREATE NONCLUSTERED INDEX [IX_C] ON [dbo].[MyTable] 
(
    [FieldC] ASC,
    [FieldA] ASC
)

I've tested then two queries:

declare @p1 int = 1;
declare @p2 varchar(20) = '12345678';

select * from MyTable
where FieldA=@p1 and (FieldB=@p2 or FieldC=@p2);

Performing this query I get the following query plan ( ID is the primary key of the table, PK_MyTable the clustered index on the primary key):

|--Nested Loops(Inner Join, OUTER REFERENCES:([MyDb].[dbo].[MyTable].[ID]))
   |--Stream Aggregate(GROUP BY:([MyDb].[dbo].[MyTable].[ID]) DEFINE:([MyDb].[dbo].[MyTable].[FieldA]=ANY([MyDb].[dbo].[MyTable].[FieldA])))
   |  |--Merge Join(Concatenation)
   |     |--Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[IX_B]), SEEK:([MyDb].[dbo].[MyTable].[FieldB]=[@p2] AND [MyDb].[dbo].[MyTable].[FieldA]=[@p1]) ORDERED FORWARD)
   |     |--Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[IX_C]), SEEK:([MyDb].[dbo].[MyTable].[FieldC]=[@p2] AND [MyDb].[dbo].[MyTable].[FieldA]=[@p1]) ORDERED FORWARD)
   |--Clustered Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[PK_MyTable]), SEEK:([MyDb].[dbo].[MyTable].[ID]=[MyDb].[dbo].[MyTable].[ID]) LOOKUP ORDERED FORWARD)

So it seems both indexes are used ("Index Seek").

Elapsed time for the query: 00:00:00.2220127

The second query I tested was using a JOIN to avoid the OR operator (see the "Edit" in my question):

declare @p1 int = 1;
declare @p2 varchar(20) = '12345678';

select * from MyTable where FieldA=@p1 and FieldB=@p2
union
select * from MyTable where FieldA=@p1 and FieldC=@p2;

This query has the following query plan:

|--Merge Join(Union)
   |--Nested Loops(Inner Join, OUTER REFERENCES:([MyDb].[dbo].[MyTable].[ID]))
   |  |--Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[IX_B]), SEEK:([MyDb].[dbo].[MyTable].[FieldB]=[@p2] AND [MyDb].[dbo].[MyTable].[FieldA]=[@p1]) ORDERED FORWARD)
   |  |--Clustered Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[PK_MyTable]), SEEK:([MyDb].[dbo].[MyTable].[ID]=[MyDb].[dbo].[MyTable].[ID]) LOOKUP ORDERED FORWARD)
   |--Nested Loops(Inner Join, OUTER REFERENCES:([MyDb].[dbo].[MyTable].[ID]))
      |--Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[IX_C]), SEEK:([MyDb].[dbo].[MyTable].[FieldC]=[@p2] AND [MyDb].[dbo].[MyTable].[FieldA]=[@p1]) ORDERED FORWARD)
      |--Clustered Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[PK_MyTable]), SEEK:([MyDb].[dbo].[MyTable].[ID]=[MyDb].[dbo].[MyTable].[ID]) LOOKUP ORDERED FORWARD)

Again both indexes are used ("Index seek").

Elapsed time for the query: 00:00:00.3710212

Note: For both queries it doesn't matter which length I declare @p2 with: Using varchar(8) or varchar(20) or varchar(30) gives the same results and query plans.

Following these results I will stay with using the OR operator instead of the UNION, since both queries use the indexes but the first one is faster.

链接地址: http://www.djcxy.com/p/1168.html

上一篇: 从头开始扩展Java Web应用程序

下一篇: 在SQL Server中为经常使用的查询创建适当的索引