help with query (Network Steve Forum)

help with query

hi all, I'm having a problem tuning this query. I have tried refactoring it a few different ways and none of them has help. The best I can get was to realign the nonclustered index and adding a query hint of forceseek. But this query gets called 110,202 times a day, with a logical read of 8,694,249,658. This is on a "good day."

I have also tried to compress the index but that helped i/o only a little bit. I think the problem is with the RIGHT join and can't seem to work around it.

SELECT [side].[Id]
	,[base].[OrderId]
	,[base].[Branch]
	,[base].[ProductId]
	,[base].[OrderLineNumber]
	,[base].[OrderedQuantity]
	,[base].[AllocatedQuantity]
	,[base].[InvoicedQuantity]
	,[base].[OrderedWeight]
	,[base].[AllocatedWeight]
	,[base].[UnitPrice]
	,[base].[ExtendedPrice]
	,[base].[PriceMethod]
	,[base].[PriceMargin]
	,[base].[LineStatus]
	,[base].[LineJITStatus]
	,[base].[BasePrice]
	,[base].[AdagePriceNumberKey]
	,[base].[OrderDate]
	,[side].[sync_row_is_tombstone]
	,[side].[local_update_peer_timestamp] AS sync_row_timestamp
	,CASE 
		WHEN (
				[side].[update_scope_local_id] IS NULL
				OR [side].[update_scope_local_id] <> @sync_scope_local_id
				)
			THEN COALESCE([side].[restore_timestamp], [side].[local_update_peer_timestamp])
		ELSE [side].[scope_update_peer_timestamp]
		END AS sync_update_peer_timestamp
	,CASE 
		WHEN (
				[side].[update_scope_local_id] IS NULL
				OR [side].[update_scope_local_id] <> @sync_scope_local_id
				)
			THEN CASE 
					WHEN ([side].[local_update_peer_key] > @sync_scope_restore_count)
						THEN @sync_scope_restore_count
					ELSE [side].[local_update_peer_key]
					END
		ELSE [side].[scope_update_peer_key]
		END AS sync_update_peer_key
	,CASE 
		WHEN (
				[side].[create_scope_local_id] IS NULL
				OR [side].[create_scope_local_id] <> @sync_scope_local_id
				)
			THEN [side].[local_create_peer_timestamp]
		ELSE [side].[scope_create_peer_timestamp]
		END AS sync_create_peer_timestamp
	,CASE 
		WHEN (
				[side].[create_scope_local_id] IS NULL
				OR [side].[create_scope_local_id] <> @sync_scope_local_id
				)
			THEN CASE 
					WHEN ([side].[local_create_peer_key] > @sync_scope_restore_count)
						THEN @sync_scope_restore_count
					ELSE [side].[local_create_peer_key]
					END
		ELSE [side].[scope_create_peer_key]
		END AS sync_create_peer_key
FROM [core].[OrderProduct] [base](forceseek)
RIGHT JOIN [core].[OrderProduct_tracking] [side]
	ON [base].[Id] = [side].[Id]
WHERE (
		(
			[side].[OrderId] IN (
				SELECT [core].[Order].[Id]
				FROM [core].[Order]
				WHERE [core].[Order].[AccountId] IN (
						SELECT [express].[UserAccount].[AccountId]
						FROM [express].[UserAccount]
						WHERE [express].[UserAccount].[UserId] = @userid
						)
				)
			)
		OR (
			[side].[sync_row_is_tombstone] = 1
			AND (
				[side].[update_scope_local_id] = @sync_scope_local_id
				OR [side].[update_scope_local_id] IS NULL
				)
			AND [side].[OrderId] IS NULL
			)
		)
	AND (
		[side].[update_scope_local_id] IS NULL
		OR [side].[update_scope_local_id] <> @sync_scope_local_id
		OR (
			[side].[update_scope_local_id] = @sync_scope_local_id
			AND [side].[scope_update_peer_key] <> @sync_update_peer_key
			)
		)
	AND [side].[local_update_peer_timestamp] > @sync_min_timestamp

core.OrderProduct table has 25,159,177 rows with the following index:

clustered index on base.id

core.OrderProduct_Tracking table has 25,181,838 rows with the following index:

clustered index on side.id

nonclustered index on side.orderid, side.sync_row_is_tombstone, side.update_scope_local_id, side.local_update_peer_timestamp, side.id

Here's the execution plan:

I know the forceseek caused the clustered index seek and cost 88% but w/o that, the query runs super slow! Nightly, I have the index either reorg or rebuild and update its statistics. This query alone sucks up huge amount of memory and the wait for it is over 108 hrs in a 24 hr time frame.

if anyone can provide some guidance, that would be great!

Thanks,

Edited by Red8Rain 17 hours 33 minutes ago

February 4th, 2015 1:40pm

Why is it using the clustered index at all? rather than using a forceseek hint, I would be looking to see if there is an appropriate nonclustered index I could create where it would be natural for it to seek the index.

Free Windows Admin Tool Kit Click here and download it now

February 4th, 2015 1:54pm

Hi,

I would make a temp table for the information that nee to be filterd and then do the right outer join. Temp tables alsways improve the performance.

February 4th, 2015 1:55pm

How many records are returned? How many records are expected to match in the join? As the tables are joining on their clustered index keys, I would expect a MERGE join to be the fastest, that is, without parallelism being an option. In which case you then need to decide whether to filter the outer table before or after the join.

Free Windows Admin Tool Kit Click here and download it now

February 4th, 2015 1:58pm

hi robert,

The nonclustered index was originally not aligned with this query. I've adjusted it so that it would naturally line up but the plan still used index scan and ran longer, thus the use of the forceseek.

February 4th, 2015 1:59pm

Hi Soumen,

I have tried that as one of my refactoring work. but instead of storing it a temp table, I simply made it a subquery which generated a worker table and didn't provide the same results.

In any case, i will give the temp table a shot.

Edited by Red8Rain 17 hours 14 minutes ago

Free Windows Admin Tool Kit Click here and download it now

February 4th, 2015 2:00pm

How about changing u'r query from following:
[side].[OrderId] IN (
SELECT [core].[Order].[Id]
FROM [core].[Order]
WHERE [core].[Order].[AccountId] IN (
SELECT [express].[UserAccount].[AccountId]
FROM [express].[UserAccount]
WHERE [express].[UserAccount].[UserId] = @userid
)

to:
[side].[OrderId] IN (
SELECT [core].[Order].[Id]
FROM #temp1
)

where temp1 is:

SELECT [core].[Order].[Id]
into #temp1
FROM [core].[Order]
WHERE [core].[Order].[AccountId] IN (
SELECT [express].[UserAccount].[AccountId]
FROM [express].[UserAccount]
WHERE [express].[UserAccount].[UserId] = @userid

Maybe give it a try.

Also it maybe a good idea to use temp table OrderProduct_Tracking.
For example:
select *
into #side
from [core].[OrderProduct_tracking] [side]
WHERE (
(
[side].[OrderId] IN (
SELECT [core].[Order].[Id]
FROM #temp1
)
)
)
OR (
[side].[sync_row_is_tombstone] = 1
AND (
[side].[update_scope_local_id] = @sync_scope_local_id
OR [side].[update_scope_local_id] IS NULL
)
AND [side].[OrderId] IS NULL
)
)
AND (
[side].[update_scope_local_id] IS NULL
OR [side].[update_scope_local_id] <> @sync_scope_local_id
OR (
[side].[update_scope_local_id] = @sync_scope_local_id
AND [side].[scope_update_peer_key] <> @sync_update_peer_key
)
)
AND [side].[local_update_peer_timestamp] > @sync_min_timestamp

So the final query may look like:

...FROM [core].[OrderProduct] [base](forceseek)

RIGHT JOIN #side
	ON [base].[Id] = #side.[Id]
good luck Red.

Edited by Soumen Barua 16 hours 43 minutes ago

February 4th, 2015 2:11pm

Hi Mike,

the number of records will vary, sometimes it is 0, sometimes it is 62. Still it has to scan all 25m records each time to find out whether or not there are anything new.

here was the plan before i made the changes. it never used a merge join.

Free Windows Admin Tool Kit Click here and download it now

February 4th, 2015 2:24pm

The Merge join does appear in that plan! But unfortunately the reason you're forced into scanning the Order_tracking table is due to the "IS NULL" conditions in the WHERE clause. It can't use any index seeks as it has to check every record for the existence of NULL, as well as several OR conditions that stop index usage.

You need to try and filter the Order_tracking table first to reduce the record count before joining to the Order table. As mentioned above, a temp table is a good way to do this as it has its own statistics meaning the optimizer will recognise the low record count and most likely choose an index seek for the join without you needing to force it.

I'd recommend starting with "[side].[local_update_peer_timestamp] > @sync_min_timestamp" for the filtering as you know for sure this condition is required. Therefore make sure an index exists on [local_update_peer_timestamp]. Put the results in a temp table, then apply the other filtering to it and the join - you can experiment with the order of it all, but I'm making a guess here that this first condition will reduce the record count considerably.

February 5th, 2015 3:48am

Hi ,

Kindly perform two activities.

1. Update the statistics on referenced table with fullscan sampling.

Syntax - update statistics <table A> with fullscan

2. At the end of query use Hash join hint

Example - (option hash join)

This should improve the query performance.

Free Windows Admin Tool Kit Click here and download it now

February 5th, 2015 6:33am

Hi,

Fist of all remove this:

	[side].[OrderId] IN (
				SELECT [core].[Order].[Id]
				FROM [core].[Order]
				WHERE [core].[Order].[AccountId] IN (
						SELECT [express].[UserAccount].[AccountId]
						FROM [express].[UserAccount]
						WHERE [express].[UserAccount].[UserId] = @userid
						)
				)

from your code.

Replace it on the two joins. And it will be much more better.

(join to [express].[UserAccount] and join to [core].[Order])

Here is code:

select ....
FROM [core].[OrderProduct] [base](forceseek)
RIGHT JOIN [core].[OrderProduct_tracking] [side] ON [base].[Id] = [side].[Id]
left join [core].[Order] [Order] on [Order].[Id] = [side].[OrderId]
left join [express].[UserAccount] [UserAccount] on [UserAccount].[AccountId] = [Order].[AccountId] and [UserAccount].[UserId] = @userid
Where
(
    [side].[update_scope_local_id] IS NULL
 OR [side].[update_scope_local_id] <> @sync_scope_local_id
 OR (
    [side].[update_scope_local_id] = @sync_scope_local_id
      AND [side].[scope_update_peer_key] <> @sync_update_peer_key
    )
)
AND 
...

Edited by xjomanx 7 minutes ago Add SQL

February 5th, 2015 6:47am

Hi ,

Kindly perform two activities.

1. Update the statistics on referenced table with fullscan sampling.

Syntax - update statistics <table A> with fullscan

2. At the end of query use Hash join hint

Example - (option hash join)

This should improve the query performance.

Yes, update statistics, but avoid join hints where possible. A Hash join is NOT faster than a Merge join as a rule - it's only a better option if parallelism kicks in as the Hash join is the only join that can run in parallel.

In general, the query optimizer knows best. It's better to change your query to produce a better plan than it is to use hints, otherwise you're limiting the flexibility of the optimizer as the data changes over time.

The reason forcing an index seek worked in the original post was because the optimizer didn't know how many records would result from the filtering as the statistics maintained aren't sufficient for this. That's why putting the filtered dataset in a temp table would work as the optimizer would then know how many base records it's dealing with and can respond accordingly.

Free Windows Admin Tool Kit Click here and download it now

February 5th, 2015 6:49am

This topic is archived. No further replies will be accepted.