Looping is very slow on large data set

Hi All,

I need suggestion in optimizing the below looping scenario .

Sample data in the table , For easy understanding I kept only 4 columns actual source has  56 columns

Input :

#Final

 

 

 

RwNum

JobSource

RuleName

PackageType

1

Presubmission

GameRating

Xap

2

PostSubmission

GameRating

Xap

3

Presubmission

GameRating

NULL

4

Presubmission

TCRRule

Xap

5

PostSubmission

NULL

Xap

6

Submission

NULL

Xap

I need to iterate row by row in the above table  to compare the data with the rest of the table . i.e.  First get the data for RwNum =1 and compare with all the other rows in the table (i.e. Rwnum  2,3,4,5,6 )  to merge that data in below output format  and repeat the same process for all the rest of rows.

Expected  Output :

#Final

 

 

 

RwNum

JobSource

RuleName

PackageType

1

Presubmission

GameRating

Xap

2

PostSubmission

GameRating

Xap

4

Presubmission

TCRRule

Xap

6

Submission

NULL

Xap

In the final output , RwNum 1 is the merged result of  RwNum 1 & 3  . Similarly  RwNum 2 in the output is the merged result of  2 & 5   and other records remain As Is.

So the query I wrote is below  :

While (@TopRwNum <=6 )

Begin

While (@InnerRwNum <= 6)

Begin

SELECT

----

---- Columns list

-----

FROM          #final Fr

              JOIN #final Ne

              ON 

              Fr.RwNum =@TopRwNum AND Ne.RwNum =@InnerRwNum

              AND @TopRwNum <> @InnerRwNum     

Where

----

--- Conditional logic to compare  and merge the data 

---

The above query executed in 10 secs

The above query is working when the count of rows is small , but if the #final has  ~1000 rows then the no of times the above code has to iterate is 1000 *1000  ~= 1000000 times ( as there are outer and inner loops need to run for 1000 times each ) . Then the above while loop logic is taking around 20 mins

Can we optimize the above code to make  iteration faster ?  . Any ideas would be greatly appreciated .

August 27th, 2014 10:05pm

You need to use set-based solution. WHILE looping is not scalable.

It appears that OVER PARTITION BY windowing functions may do the job for you:

http://www.sqlusa.com/bestpractices2005/overpartitionby/

Free Windows Admin Tool Kit Click here and download it now
August 27th, 2014 10:13pm

How do you determine which all rows to be merged together? is it based on just packagetype value?
August 27th, 2014 10:21pm

Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. Learn how to follow ISO-11179 data element naming conventions and formatting rules. Temporal data should use ISO-8601 formats. Code should be in Standard SQL as much as possible and not local dialect. 

This is minimal polite behavior on SQL forums. Why did you fail to read the front of the forum? Now we have to copy the data out of your ASCII pictures and put it into DDL

>> need suggestion in optimizing the below looping scenario.<<

SQL is a declarative language, so it does not have loops; that was COBOL and FORTRAN. SQL is set-based, so it does not have row numbers; that was magnetic tapes. We also hate temp tables. 

>> Sample data in the table, For easy understanding I kept only 4 columns actual source has 56 columns <<

That is not the immediate problem. We have no spec (what does merge mean in your world? In SQL it is an upsert operation (update/insert). Why do you think records are rows? More magnetic tape mindset.  But I will bet that the width is the result of more bad non-RDBMS coding. 

This posting makes no sense. Would you like to try again?  And follow Netiquette? 
Free Windows Admin Tool Kit Click here and download it now
August 27th, 2014 11:13pm

Please try this:

CREATE TABLE #Final ( RwNum INT PRIMARY KEY, JobSource NVARCHAR(100) , RuleName NVARCHAR(100) ,PackageType NVARCHAR(100) )
INSERT #Final
        ( RwNum ,
          JobSource ,
          RuleName ,
          PackageType
        )
VALUES  ( 1 , N'Presubmission' , N'GameRating' , N'Xap'  ), ( 2 , N'PostSubmission' , N'GameRating' , N'Xap'  ), ( 3 , N'Presubmission' , N'GameRating' , NULL  ), 
		( 4 , N'Presubmission' , N'TCRRule' , N'Xap'  ), ( 5 , N'PostSubmission' , NULL , N'Xap'  ), ( 6 , N'Submission' , NULL , N'Xap'  ) ;


SELECT * 
FROM #Final a

SELECT a.RwNum AS aId, b.RwNum AS bId
FROM #Final a
JOIN #Final b 
ON ( a.JobSource = b.JobSource AND a.RuleName = b.RuleName )
OR ( a.JobSource = b.JobSource AND a.PackageType = b.PackageType AND a.RuleName IS NULL )
WHERE a.RwNum <> b.RwNum



SELECT * 
FROM #Final AS f
WHERE NOT EXISTS 
	( SELECT *
	  FROM #Final AS a
	  JOIN #Final AS b 
		  ON ( a.JobSource = b.JobSource AND a.RuleName = b.RuleName )
		  OR ( a.JobSource = b.JobSource AND a.PackageType = b.PackageType AND a.RuleName IS NULL )
	  WHERE a.RwNum <> b.RwNum
		AND f.RwNum = IIF(a.RwNum > b.RwNum, a.RwNum , b.RwNum) )

August 28th, 2014 12:22am

You can simply delete not wanted RwNum(s).

DELETE 
FROM #Final 
WHERE RwNum IN (
				SELECT IIF(a.RwNum > b.RwNum, a.RwNum , b.RwNum)
				FROM #Final a
				JOIN #Final b 
				ON ( a.JobSource = b.JobSource AND a.RuleName = b.RuleName )
				OR ( a.JobSource = b.JobSource AND a.PackageType = b.PackageType AND a.RuleName IS NULL )
				WHERE a.RwNum <> b.RwNum
				)

Free Windows Admin Tool Kit Click here and download it now
August 28th, 2014 12:32am

Hi All ,

Thanks for your replies .Sorry if haven't followed the forum rules , please forgive me . I kept only images as they help to understand the scenario better . Now here i am posting the DDL scripts and also the expected output with some more columns.( I have used table variables here)

DECLARE  @Tab TABLE (
RwNum  INT,
JobParentID UNIQUEIDENTIFIER,JobSource      NVARCHAR(MAX), 
PackageType NVARCHAR(MAX), 
UpdateType  NVARCHAR(MAX), 
IsAutoPassed NVARCHAR(256),
IsCanceled NVARCHAR(256),
IsSkipped NVARCHAR(256),
Result NVARCHAR(256),     
Fired NVARCHAR(256),      
RuleName NVARCHAR(256)
) 
 
INSERT INTO @Tab 
SELECT 1,'7ca42851-c3d2-42da-b5b9-d40392ae24fb','PreSubmissionSource','Xap',         'FullUpdate','','','FALSE','Pass','FALSE','RuleValidationResult^XboxLive'
UNION 
SELECT 2,'7ca42851-c3d2-42da-b5b9-d40392ae24fb','PostSubmission','Xap',         'PartialUpdate','FALSE','','TRUE','Failed','FALSE','RuleValidationResult^GameCategoryNameChange'
UNION 
SELECT 3,'7ca42851-c3d2-42da-b5b9-d40392ae24fb','PreSubmissionSource','Xap',         '','TRUE','TRUE','','Pass','FALSE','RuleValidationResult^XboxLive'
UNION 
SELECT 4,'7ca42851-c3d2-42da-b5b9-d40392ae24fb','PreSubmissionSource','Xap',         '','TRUE','TRUE','','Pass','FALSE','RuleValidationResult^CumulativeDownload'
UNION 
SELECT 5,'7ca42851-c3d2-42da-b5b9-d40392ae24fb','PostSubmission','Xap',         'PartialUpdate','FALSE','','TRUE','Failed','FALSE','PreinstalledPackage'
UNION 
SELECT 6,'7ca42851-c3d2-42da-b5b9-d40392ae24fb','PreSubmissionSource','Xap',         'PartialUpdate','TRUE','TRUE','','Pass','FALSE','RuleValidationResult^CumulativeDownload'
UNION
SELECT 7,'7ca42851-c3d2-42da-b5b9-d40392ae24fb','PreSubmissionSource','Xap',         'FullUpdate','','','','Pass','','RuleValidationResult^XboxLive'
UNION 
SELECT 1,'2004235d-af05-4e29-ab8d-50b80a088dd4','PreSubmissionSource','Xap',         'FullUpdate','TRUE','TRUE','','Pass','FALSE','RuleValidationResult^CumulativeDownload'
UNION 
SELECT 2,'2004235d-af05-4e29-ab8d-50b80a088dd4','PreSubmissionSource','Xap',         'FullUpdate','TRUE','','FALSE','','FALSE','RuleValidationResult^CumulativeDownload'
 
SELECT * FROM @Tab  ORDER BY 2 DESC ,1  ASC

Original source table doesn't RwNum column . In order for me to iterate row by row on the above table i have created "RwNum" column . This column has unique ID for every row of a JobIDParent. RwNum column is generated as below 

ROW_NUMBER() OVER( ORDER BY JobIDParent) AS  RwNum,

The output should be below  :

DECLARE  @Output TABLE (
RwNum  INT,
JobParentID UNIQUEIDENTIFIER,JobSource      NVARCHAR(MAX), 
PackageType NVARCHAR(MAX), 
UpdateType  NVARCHAR(MAX), 
IsAutoPassed NVARCHAR(256),
IsCanceled NVARCHAR(256),
IsSkipped NVARCHAR(256),
Result NVARCHAR(256),     
Fired NVARCHAR(256),      
RuleName NVARCHAR(256)
) 
 
INSERT INTO @Output 
SELECT 1,'7ca42851-c3d2-42da-b5b9-d40392ae24fb',    'PreSubmissionSource',    'Xap',         'FullUpdate','TRUE','TRUE','FALSE','Pass','FALSE','RuleValidationResult^XboxLive'
UNION 
SELECT 2,'7ca42851-c3d2-42da-b5b9-d40392ae24fb',    'PostSubmission', 'Xap',         'PartialUpdate','FALSE','','TRUE','Failed','FALSE','RuleValidationResult^GameCategoryNameChange'
UNION 
SELECT 5,'7ca42851-c3d2-42da-b5b9-d40392ae24fb',    'PostSubmission', 'Xap',         'PartialUpdate','FALSE','','TRUE','Failed','FALSE','PreinstalledPackage'
UNION 
SELECT 4,'7ca42851-c3d2-42da-b5b9-d40392ae24fb',    'PreSubmissionSource',    'Xap',         'PartialUpdate','TRUE','TRUE','','Pass','FALSE','RuleValidationResult^CumulativeDownload'
UNION 
SELECT 1,'2004235d-af05-4e29-ab8d-50b80a088dd4',    'PreSubmissionSource',    'Xap',         'FullUpdate','TRUE','TRUE','FALSE','Pass','FALSE','RuleValidationResult^CumulativeDownload'
 
SELECT * FROM @Output  ORDER BY 2 DESC ,1  ASC

Merge rules to generate the above Output :

All the below rules should be satisfied to merge two rows 

1) We only need to merge the data that is related to the same JobIDParent

2) Data in column for any two merging rows should not conflict . I.e. Data in a column of a merging row should not be different to the data in the same column for the other merging row(only if both rows are having not NULL value in that column )

3) we can merge two rows ,only if the data in all the columns satisfy any of the below cases 

case i) If the data in the column of a row is having null/EmptySpace & the data in the same column for the other row is a valid value i.e. non empty or not null

case ii) Data in a column of a row is equal to the data in the same column of the other row .

Output analysis by applying above rules:

In the @Output  analysis for JobIDParent '7CA42851-C3D2-42DA-B5B9-D40392AE24FB'

i)RwNum =1 is formed as a result of merge using the above rules from RwNum 3 & RwNum 7 .First the RowNum 1 and 3 are merged and the output  of these two rows are merged with RwNum 7

ii) RwNum =2 is not merged with any of the rows in fact table , as the no row in the table satisfy all the merge rules with RwNum =2

iii) RwNum = 4 is formed as a result of merge using the above rules from RwNum 6 .

iv) RwNum =5 is not merged with any of the rows in fact table , as the no row in the table satisfy all the merge rules with RwNum =5

In the @Output  analysis for JobIDParent '7CA42851-C3D2-42DA-B5B9-D40392AE24FB'

i)RwNum =1 is formed as a result of merge using the above rules from RwNum 2.

In this way we want the rows to be merged on the table

Sorry if my earlier post is not understandable .

Thanks in advance.

August 28th, 2014 4:52am

WITH cte AS(
    SELECT  [RwNum]
        ,[JobSource]
        ,[RuleName]
        ,[PackageType]
        ,ROW_NUMBER() OVER(PARTITION BY [JobSource],[RuleName] ORDER BY [RwNum]) as rowID
    FROM [dbo].[Final]
)
SELECT *
FROM cte
WHERE rowID=1
ORDER BY [RwNum]
Free Windows Admin Tool Kit Click here and download it now
August 28th, 2014 7:21pm

Hi Saengthanou Chaengsavang ,

Thanks for you reply . Please refer my last reply where it has the complete scenario that has to be considered for the merge .The query you gave to me doesn't work for all my scenarios  . 

Please refer my last reply on this thread for complete picture on merging data.

August 28th, 2014 8:23pm

Using your example data (thanks!) I put this together.

It should show you the pairings:

SELECT  t.RwNum, t.JobParentID, t.JobSource, t.PackageType, t.UpdateType, t.IsAutoPassed, t.IsCanceled, t.IsSkipped, t.Result, t.Fired, t.RuleName, t2.*
  FROM @Tab t
    LEFT OUTER JOIN @tab t2 
	  ON t.jobParentID = t2.JobParentID
	  AND t.RwNum <> t2.RwNum
	  AND (t.JobSource = t2.JobSource		OR (NULLIF(t.JobSource		,'') IS NULL	AND NULLIF(t2.JobSource		,'') IS NOT NULL) OR (NULLIF(t2.JobSource		,'') IS NULL AND NULLIF(t.JobSource		,'') IS NOT NULL))
	  AND (t.PackageType = t2.PackageType	OR (NULLIF(t.PackageType	,'') IS NULL	AND NULLIF(t2.PackageType	,'') IS NOT NULL) OR (NULLIF(t2.PackageType		,'') IS NULL AND NULLIF(t.PackageType	,'') IS NOT NULL))
	  AND (t.UpdateType = t2.UpdateType		OR (NULLIF(t.UpdateType		,'') IS NULL	AND NULLIF(t2.UpdateType	,'') IS NOT NULL) OR (NULLIF(t2.UpdateType		,'') IS NULL AND NULLIF(t.UpdateType	,'') IS NOT NULL))
	  AND (t.IsAutoPassed = t2.IsAutoPassed	OR (NULLIF(t.IsAutoPassed	,'') IS NULL	AND NULLIF(t2.IsAutoPassed	,'') IS NOT NULL) OR (NULLIF(t2.IsAutoPassed	,'') IS NULL AND NULLIF(t.IsAutoPassed	,'') IS NOT NULL))
	  AND (t.IsCanceled = t2.IsCanceled		OR (NULLIF(t.IsCanceled		,'') IS NULL	AND NULLIF(t2.IsCanceled	,'') IS NOT NULL) OR (NULLIF(t2.IsCanceled		,'') IS NULL AND NULLIF(t.IsCanceled	,'') IS NOT NULL))
	  AND (t.IsSkipped = t2.IsSkipped		OR (NULLIF(t.IsSkipped		,'') IS NULL	AND NULLIF(t2.IsSkipped		,'') IS NOT NULL) OR (NULLIF(t2.IsSkipped		,'') IS NULL AND NULLIF(t.IsSkipped		,'') IS NOT NULL))
	  AND (t.result = t2.Result				OR (NULLIF(t.Result			,'') IS NULL	AND NULLIF(t2.Result		,'') IS NOT NULL) OR (NULLIF(t2.Result			,'') IS NULL AND NULLIF(t.Result		,'') IS NOT NULL))
	  AND (t.Fired = t2.Fired				OR (NULLIF(t.Fired			,'') IS NULL	AND NULLIF(t2.Fired			,'') IS NOT NULL) OR (NULLIF(t2.Fired			,'') IS NULL AND NULLIF(t.Fired			,'') IS NOT NULL))
	  AND (t.RuleName = t2.RuleName			OR (NULLIF(t.RuleName		,'') IS NULL	AND NULLIF(t2.RuleName		,'') IS NOT NULL) OR (NULLIF(t2.RuleName		,'') IS NULL AND NULLIF(t.RuleName		,'') IS NOT NULL))
  ORDER BY 2 DESC ,1  ASC

Free Windows Admin Tool Kit Click here and download it now
August 28th, 2014 8:47pm

Hi Patrick ,

Thanks for a lot for  your reply. I really appreciate your time .

Your query gets all the rows from the @Tab because of left outer join . As per the data in @output table variable in my earlier reply, in the output i need only merged rows of JobParentID '7ca42851-c3d2-42da-b5b9-d40392ae24fb'  which are  RwNum =1 ,RwNum =2,RwNum =5 and RwNum =4  as shown in my above example.

and  for '2004235d-af05-4e29-ab8d-50b80a088dd4' i only RwNum =1 which is also a merged row on RwNum =1 and RwNum= 2.

Thank you once again .

August 28th, 2014 9:44pm

Well, yes, because I wanted you to be able to evaluate the result set.

I kinda figured you would be able to modify it after establishing it's doing what's needed.

Free Windows Admin Tool Kit Click here and download it now
August 28th, 2014 10:15pm

First let's use NULL instead of ''.

  UPDATE [test].[dbo].[test] SET [UpdateType]=null WHERE [UpdateType]='';
  UPDATE [test].[dbo].[test] SET [IsAutoPassed]=null WHERE [IsAutoPassed]='';
  UPDATE [test].[dbo].[test] SET [IsCanceled]=null WHERE [IsCanceled]='';
  UPDATE [test].[dbo].[test] SET [IsSkipped]=null WHERE [IsSkipped]='';
  UPDATE [test].[dbo].[test] SET [Result]=null WHERE [Result]='';
  UPDATE [test].[dbo].[test] SET [Fired]=null WHERE [Fired]='';

1- We need a starting point, so we do our first select and assign 1 as starting.

2- We build our recursive query to replace NULL value.

3- We build our new RowID2 and take the row we need.

WITH cte AS(
	SELECT [RwNum]
		,[JobParentID]
		,[JobSource]
		,[PackageType]
		,[UpdateType]
		,[IsAutoPassed]
		,[IsCanceled]
		,[IsSkipped]
		,[Result]
		,[Fired]
		,[RuleName]
		,ROW_NUMBER() OVER(PARTITION BY [JobParentID],[JobSource],[RuleName] ORDER BY [RwNum]) as RowID
		,1 as starting
	FROM [test].[dbo].[test]
),cte2 AS(
	SELECT
		cte.[RwNum]
		,cte.[JobParentID]
		,cte.[JobSource]
		,cte.[PackageType]
		,cte.[UpdateType]
		,cte.[IsAutoPassed]
		,cte.[IsCanceled]
		,cte.[IsSkipped]
		,cte.[Result]
		,cte.[Fired]
		,cte.[RuleName]
		,cte.rowID
		,starting
	FROM cte
	UNION ALL
	SELECT
		cte2.[RwNum]
		,cte2.[JobParentID]
		,cte2.[JobSource]
		,cte2.[PackageType]
		,COALESCE(cte.[UpdateType],cte2.UpdateType) as UpdateType
		,COALESCE(cte.[IsAutoPassed],cte2.[IsAutoPassed]) as [IsAutoPassed]
		,COALESCE(cte.[IsCanceled],cte2.[IsCanceled]) as [IsCanceled]
		,COALESCE(cte.[IsSkipped],cte2.[IsSkipped]) as [IsSkipped]
		,COALESCE(cte.[Result],cte2.[Result]) as [Result]
		,COALESCE(cte.[Fired],cte2.[Fired]) as [Fired]
		,cte2.[RuleName]
		,cte.rowID
		,cte2.starting+1
	FROM cte2
	INNER JOIN cte ON cte.[JobParentID]=cte2.[JobParentID] AND cte.[JobSource]=cte2.[JobSource] AND cte.[RuleName]=cte2.[RuleName] AND cte2.RowID=cte2.starting
), cte3 AS(
SELECT
	* 
	,ROW_NUMBER() OVER(PARTITION BY [JobParentID],[JobSource],[RuleName] ORDER BY starting DESC) as RowID2
FROM cte2
WHERE rowID=1
)

SELECT 
	[RwNum]
    ,[JobParentID]
    ,[JobSource]
    ,[PackageType]
    ,[UpdateType]
    ,[IsAutoPassed]
    ,[IsCanceled]
    ,[IsSkipped]
    ,[Result]
    ,[Fired]
    ,[RuleName] FROM cte3 WHERE RowID2=1 ORDER BY RwNum

Result:

RwNum JobParentID JobSource PackageType UpdateType IsAutoPassed IsCanceled IsSkipped Result Fired RuleName
1 2004235D-AF05-4E29-AB8D-50B80A088DD4 PreSubmissionSource Xap FullUpdate TRUE TRUE FALSE Pass FALSE RuleValidationResult^CumulativeDownload
1 7CA42851-C3D2-42DA-B5B9-D40392AE24FB PreSubmissionSource Xap FullUpdate TRUE TRUE FALSE Pass FALSE RuleValidationResult^XboxLive
2 7CA42851-C3D2-42DA-B5B9-D40392AE24FB PostSubmission Xap PartialUpdate FALSE NULL TRUE Failed FALSE RuleValidationResult^GameCategoryNameChange
4 7CA42851-C3D2-42DA-B5B9-D40392AE24FB PreSubmissionSource Xap PartialUpdate TRUE TRUE NULL Pass FALSE RuleValidationResult^CumulativeDownload
5 7CA42851-C3D2-42DA-B5B9-D40392AE24FB PostSubmission Xap PartialUpdate FALSE NULL TRUE Failed FALSE PreinstalledPackage

August 28th, 2014 10:35pm

>> Sorry if haven't followed the forum rules, please forgive me. I kept only images as they help to understand the scenario better. Now here I am posting the DDL scripts and also the expected output with some more columns.<<

Okay, you get off with a warning. 

>> I have used table variables here <<

NO!! How does it feel to be told by someone who helped write the SQL Standards that everything you are doing with it is wrong? Yes, it is that bad. 

We need keys! We need constraints! What you did post is total garbage! You live in a magical world where everything is a Chinese UNICODE string with the Heart Sutra in it (247 Chinese characters). 

We do not use flags in RDBMS. That was assembly language. A column is a scalar value; do you know what a scale is? 

We do not use UNIQUEIDENTIFIER, or GUID. The G is for Global and tables have local domain specify identifiers. 

We do not have a physical row number in RDBMS; that was magnetic tape files! 

This is such a  mess, I wil not even try. No matter how far you have traveled down the wrong road, turn around!:-  Turkish Proverb. 

Free Windows Admin Tool Kit Click here and download it now
August 29th, 2014 12:40am

>> Sorry if haven't followed the forum rules, please forgive me. I kept only images as they help to understand the scenario better. Now here I am posting the DDL scripts and also the expected output with some more columns.<<

Okay, <This section was deleted!> 

>> I have used table variables here <<

NO!! <This section was deleted!>  

We need keys! We need constraints! <This section was deleted!> 

We do not have a physical row number in RDBMS;

<This section was deleted!> 
  • Edited by pituachModerator 13 hours 17 minutes ago CELLO's standard answer, with lot of rudeness using offensive words, which were deleted.
August 29th, 2014 12:40am

Thanks Patrick for your response.  The result set from your query would give me correct result if it would only give me merged result set. I tried many ways to get only the merged data out from the query , but all of them are failing in one or other way.Can you let me know a better way only to get the desired rows in the output.

Free Windows Admin Tool Kit Click here and download it now
August 29th, 2014 9:09pm

Question: Can there ever be more than two rows you have to merge in a single data set?

I have a simplified example. You will have to forgive me for not using your exact table structure for this example. I just wanted something simple to illustrate the concept.

you should be able to adapt this approach to your solu

August 30th, 2014 7:08am

>> Sorry if haven't followed the forum rules, please forgive me. I kept only images as they help to understand the scenario better. Now here I am posting the DDL scripts and also the expected output with some more columns.<<

Okay, <This section was deleted!> 

>> I have used table variables here <<

NO!! <This section was deleted!>  

We need keys! We need constraints! <This section was deleted!> 

We do not have a physical row number in RDBMS;

<This section was deleted!> 

@CELCO

What a shame! and What a pity!

Your manners prevent you from communicating and transmiting the great knowledge you have on R

Free Windows Admin Tool Kit Click here and download it now
August 31st, 2014 3:56am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics