|
Home > Archive > microsoft.public.sqlserver.server > June 2002 > Why is SQL Server choosing the wrong index?
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Why is SQL Server choosing the wrong index?
|
|
| Eric Moreau 2002-06-24, 9:25 am |
| Hi,
I'm using SQL Server 7, SP3. The database is for tracking production
data. We have a table called Collected_Data that has about 12
columns, the first 3 of which are
Job_ID
Machine_ID
Start_Time
The primary key of the table is Job_ID, Machine_ID, Start_Time. The
clustered index is on Start_Time.
For a particular set of jobs, I'm trying to sum all the quantity done
for each operation in each job. I'm joining to the Collected_Data
table on Job_ID and Machine_ID, so it should pick the primary key to
use for the index. It does choose that index on a smaller database,
however when the table gets large (over 1 million rows vs. 500000
rows) it decides to use the clustered index instead. This causes my
query to take 50 seconds instead of 10 seconds when supplied with the
right INDEX to use.
A stripped-down version of the query follows. Making the LEFT OUTER
JOIN just a JOIN doesn't make any difference in the query plan chosen.
SELECT J.Job_ID, Op.Operation_Type, COALESCE(SUM(JRD.Quantity), 0) AS
'CompQuantity'
FROM Job J JOIN Operation Op ON J.Job_ID = Op.Job_ID
JOIN Resource_Group_Details RGD ON Op.Resource_Group_ID =
RGD.Resource_Group_ID
LEFT OUTER JOIN Job_Run_Details JRD ON (JRD.Job_ID = J.Job_ID AND
JRD.Resource_ID = RGD.Resource_ID)
WHERE J.Due_Date BETWEEN 'Jun 1, 2002' AND 'Jun 10, 2002'
GROUP BY J.Job_ID, Op.Operation_Type
I can use a query hint to get it to choose the right index, however
I'm wondering in how many other places I'm going to need to do this.
I've rebuilt the statistics on the index with FULLSCAN, and even the
statistics on the entire table with FULLSCAN, but it still didn't
help. Any ideas? What's SQL Server's logic for doing this?
Eric Moreau
| |
| Gert-Jan Strik 2002-06-25, 3:25 pm |
| If the query optimizer chooses to use a non-covering non-clustered
index, then it will always need to perform bookmark lookups to retrieve
the actual rows. When choosing the clustered index, this is not an
issue, because the leaf-level of a clustered index is the actual data
level.
Sequential read are much faster than random reads. When using a
clustered index, sequential reading is done. Whole pages can be read.
When using a nonclustered index, each index entry has to be fetched
using random reads.
The optimizer is tuned for minimizing physical reads. Large amounts of
bookmark lookups (random reads) become very expensive very fast. So when
selecting just a few percent (let's say 3%) of all rows (let's say
1,000,000) from a very large table would yield many bookmark lookups
(let's say 30,000). If your row size is limited (let's say 24 bytes),
then a full table scan would require far less sequential reads (let's
say 335 rows per page which would make 2,985 reads).
In your case, the optimizer is assuming (or is misjudging) the buffer
cache hit ratio. When you are running your query, you probably don't see
any physical reads, just logical reads. IMO this part of the optimizer
can be improved. When all necessary pages are in cache, it is more
expensive to process a table scan than it is to process bookmark lookups
(because all logical reads in the cache are equally fast).
One way to greatly increase your query performance would be to change
your clustered index. If the query below were the only query you ever
fired at the database, then a clustered index on (Job_Id, Machine_Id)
would be ideal.
Hope this helps,
Gert-Jan
Eric Moreau wrote:
>
> Hi,
>
> I'm using SQL Server 7, SP3. The database is for tracking production
> data. We have a table called Collected_Data that has about 12
> columns, the first 3 of which are
>
> Job_ID
> Machine_ID
> Start_Time
>
> The primary key of the table is Job_ID, Machine_ID, Start_Time. The
> clustered index is on Start_Time.
>
> For a particular set of jobs, I'm trying to sum all the quantity done
> for each operation in each job. I'm joining to the Collected_Data
> table on Job_ID and Machine_ID, so it should pick the primary key to
> use for the index. It does choose that index on a smaller database,
> however when the table gets large (over 1 million rows vs. 500000
> rows) it decides to use the clustered index instead. This causes my
> query to take 50 seconds instead of 10 seconds when supplied with the
> right INDEX to use.
>
> A stripped-down version of the query follows. Making the LEFT OUTER
> JOIN just a JOIN doesn't make any difference in the query plan chosen.
>
> SELECT J.Job_ID, Op.Operation_Type, COALESCE(SUM(JRD.Quantity), 0) AS
> 'CompQuantity'
> FROM Job J JOIN Operation Op ON J.Job_ID = Op.Job_ID
> JOIN Resource_Group_Details RGD ON Op.Resource_Group_ID =
> RGD.Resource_Group_ID
> LEFT OUTER JOIN Job_Run_Details JRD ON (JRD.Job_ID = J.Job_ID AND
> JRD.Resource_ID = RGD.Resource_ID)
> WHERE J.Due_Date BETWEEN 'Jun 1, 2002' AND 'Jun 10, 2002'
> GROUP BY J.Job_ID, Op.Operation_Type
>
> I can use a query hint to get it to choose the right index, however
> I'm wondering in how many other places I'm going to need to do this.
> I've rebuilt the statistics on the index with FULLSCAN, and even the
> statistics on the entire table with FULLSCAN, but it still didn't
> help. Any ideas? What's SQL Server's logic for doing this?
>
> Eric Moreau
|
|
|
|
|