behavior of ssis lookup component
thx folks, sorry about the buffer reference. I didnt want it to be part of the conversation. Anyway, beside a sql query with an inner join placed in an oledb source component or a proc in a sql command component etc, do any of ssis' components do the equivalent of an inner join efficiently?
May 26th, 2012 8:04am

The MERGE JOIN can do an Inner Join. Efficiently? That's something else :) This component needs it's data sorted. If your sources are databases, you can sort the data in the query using ORDER BY. In that case, the performance hit can be dealt with. However, if you import for example from flat files, you must sort in the SSIS package itself and in that case I wouldn't even bother. Performance will suffer as all data needs to be loaded into memory.MCTS, MCITP - Please mark posts as answered where appropriate.
Free Windows Admin Tool Kit Click here and download it now
May 26th, 2012 8:14am

Lookup returns all the rows which matches (or mismatches) based on the lookup key. the operation is done on all the rows of table and in output all the matching (or mismatching) rows are returned.
May 26th, 2012 8:23am

Lookup returns all the rows which matches (or mismatches) based on the lookup key. the operation is done on all the rows of table and in output all the matching (or mismatching) rows are returned. What? No! Todd and I have been explaining in this thread that this is not the case. Lookup returns only the first match, certainly not all the matches. The operation is only performed on all the rows of the table if you select the table from the dropdown menu (bad practice) or if you use a SQL statement without a filter.MCTS, MCITP - Please mark posts as answered where appropriate.
Free Windows Admin Tool Kit Click here and download it now
May 26th, 2012 8:30am

Although it's probably not stated clearly, only the first match is stored in the lookup table. Internally, the reference data is stored in a hashtable type construct. Two rows with the same key would have the same key in the hashtable, and would not be stored. There's absolutely no upside to storing duplicate keys in full cache mode, as the component is only ever going to return one match. Koen, my comment about "first" is in regards to how arbitrary that decision is by the engine. There is no guarantee anywhere that the algorithm will select the row you think it "should" from the set of matching rows, should there be more than one. Even observations or common sense that show it is the first row appearing in the rowset - and the rowset can be manually specified with an ORDER BY in it - can not be relied upon going forward. @Rohitz - prove it. Talk to me now on
May 26th, 2012 11:13am

Hi. We run STD 2008 and a peer asked this today. Does the lookup component stop after the first match for a given buffer row as he believes? And if so, is there a way to get the lookup component to act like an inner join so that multiple matches could be returned on any given row?
Free Windows Admin Tool Kit Click here and download it now
May 26th, 2012 4:28pm

The Lookup operates on the whole set of records, not just the current buffer. And multiple matches do get produced automatically, even duplicates that oftentimes need to be removed.Arthur My Blog
May 26th, 2012 4:35pm

Todd, thanks for the clarification.MCTS, MCITP - Please mark posts as answered where appropriate.
Free Windows Admin Tool Kit Click here and download it now
May 26th, 2012 4:53pm

Arthur, you are incorrect. The Lookup component only supplies ONE match per row in the buffer. (The mention of the buffer in this scenario is irrelevant.) If a row in the data flow happens to match more than one row in the reference data set, only ONE match is made. Which row in the reference set is chosen is not specified, and it's behaviour to select one is not guaranteed. "First" is not described. There is no way to configure the Lookup component to return multiple matches. The Lookup component is a synchronous component, which means it does NOT add or remove rows from the data flow, it only modifies the rows that pass through it. As such, it can not add rows if more than one reference row matches your data flow row. Talk to me now on
May 26th, 2012 5:56pm

Todd correct me if i am wrong please Lookup in FULL CACHE MODE: All the rows including duplicates will be cached in the preexecute phase. LOOKUP In partial CACHE MODE: ROW by row Execution, if a row not found in Partial Cache then Dtabase hot and the row added to cache, if row found database is not hit so in this case there are no Duplicates, THe comparison always happen with the Lookup Key defined http://blogs.msdn.com/b/mattm/archive/2008/10/18/lookup-cache-modes.aspx Abhinav
Free Windows Admin Tool Kit Click here and download it now
May 27th, 2012 12:48am

@AB82: that seems correct to me. @Todd: the MSDN page does mention first: If there are multiple matches in the reference table, the Lookup transformation returns only the first match returned by the lookup query http://msdn.microsoft.com/en-us/library/ms141821.aspxMCTS, MCITP - Please mark posts as answered where appropriate.
May 27th, 2012 5:08am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics