How to use the Lookup DataFlow Task - to Simulate Band Lookups (Ranged Lookups?)
Hi Todd I have tried to adapt your code to my scenario but I cannot get the result i'm looking for. please see if you can spot what i'm doing wrong. 1) my RangeItem class looks like this public struct RangeItem : IComparable<Nullable<decimal>>, IComparable<RangeItem> { public int BandKey; public string BandTypeName; public Nullable<decimal> MinBandValue; public Nullable<decimal> MaxBandValue; public Nullable<decimal> LookupValue; public RangeItem(int bandKey, string bandTypeName, Nullable<decimal> minBandValue, Nullable<decimal> maxBandValue) { this.BandKey = bandKey; this.BandTypeName = bandTypeName; this.MinBandValue = minBandValue; this.MaxBandValue = maxBandValue; this.LookupValue = new Nullable<decimal>(); } public RangeItem(string bandTypeName, Nullable<decimal> lookupValue) { this.BandKey = -1; this.BandTypeName = bandTypeName; this.MinBandValue = new Nullable<decimal>(); this.MaxBandValue = new Nullable<decimal>(); this.LookupValue = lookupValue; } #region IComparable Members public int CompareTo(RangeItem otherItem) { return CompareTo(otherItem.LookupValue); } public int CompareTo(Nullable<decimal> value) { if (!value.HasValue) { if (this.MinBandValue == null && this.MaxBandValue == null) return 0; else return 1; } else { if (this.MinBandValue == null) { if (this.MaxBandValue >= value) { return 0; } else return 1; } else if (this.MaxBandValue == null) { if (this.MinBandValue <= value) { return 0; } else return 1; } else { if (this.MinBandValue <= value && this.MaxBandValue >= value) { return 0; } else return 1; } } } #endregion } 2) my main code - just some test code to search for the band range of one value (does not work) returns index = -1 .... SqlConnection connection = null; Dictionary<string, List<RangeItem>> Cache = new Dictionary<string, List<RangeItem>>(); const string query = "SELECT BandKey, BandType, MinBandValue, MaxBandValue FROM dbo.DimBand WITH(NOLOCK) WHERE RowIsCurrent = 'Y' ORDER BY BandType, MinBandValue, MaxBandValue"; public void Main() { connection = (SqlConnection)Dts.Connections["cn"].AcquireConnection(null); FillCache(); List<RangeItem> ranges = null; Cache.TryGetValue("MonthlyIncomeBand", out ranges); int index = 0; if (ranges.Count > 0) { index = ranges.BinarySearch(new RangeItem("MonthlyIncomeBand", (decimal)15567.25)); if (index < 0) { index = ~index; } } // TODO: Add your code here Dts.TaskResult = (int)ScriptResults.Success; } private void FillCache() { SqlCommand command = new SqlCommand(query, connection); SqlDataReader reader = command.ExecuteReader(); try { string lastBandType = string.Empty; List<RangeItem> ranges = null; while (reader.Read()) { int bandKey = reader.GetInt32(0); string bandType = reader.GetString(1); Nullable<decimal> minBandValue; Nullable<decimal> maxBandValue; minBandValue = reader.IsDBNull(2) ? new Nullable<decimal>() : new Nullable<decimal>(reader.GetDecimal(2)); maxBandValue = reader.IsDBNull(3) ? new Nullable<decimal>() : new Nullable<decimal>(reader.GetDecimal(3)); if (lastBandType != bandType) { if (!string.IsNullOrEmpty(lastBandType)) { //cache the current list of band type ranges Cache.Add(lastBandType, ranges); } lastBandType = bandType; ranges = new List<RangeItem>(); } ranges.Add(new RangeItem(bandKey, bandType, minBandValue, maxBandValue)); } if (!string.IsNullOrEmpty(lastBandType) && ranges != null && ranges.Count > 0) { Cache.Add(lastBandType, ranges); } } finally { reader.Close(); } } I.W Coetzer
February 26th, 2011 5:34am

Hi, Currently I have an ETL process written in SSIS that makes use of a script task to lookup band ranges. This is very slow and I was wondering what strategy / technique there is that I can do the same using maybe a clever derived column? and a Lookup component. The example. Let's say I have records coming in via OLE Source that contains say Monthly Incomes of our customers, now these Incomes will be exact - here are examples: R15,025.25 R24,599.25 etc. We have a dimension in our warehouse called DimBand that contains type of bands and min / max values for these. So we may have a band type called "MonthlyIncomeBand" with the following possibiities: 0 to 20,000 20,000 to 50,000 etc. Where the min and max ranges would be: 0.00 - 20,000.49 20,000.50 - 50,000.49 etc. Now I can do this is a script task by retrieving all the possible band types into a static data set and then for each record selecting the rows where the value falls between a range and return the key of that range. but this is slow. (i also cater for 'null' monthly incomes with an IF condition in the script task, to return the key of the 'unknown' band range.) Any ideas of how i can ditch my script task? I would prefer doing something in SSIS and not move over to a stored procedure for this task. Thanks, IanI.W Coetzer
Free Windows Admin Tool Kit Click here and download it now
February 26th, 2011 6:25am

Got IT WORKING! MUCH MUCH Faster now! 1) the RangeItem structure: public struct RangeItem : IComparable<RangeItem> { public int BandKey; public string BandTypeName; public Nullable<decimal> MinBandValue; public Nullable<decimal> MaxBandValue; #region Constructors public RangeItem(int bandKey, string bandTypeName, Nullable<decimal> minBandValue, Nullable<decimal> maxBandValue) { this.BandKey = bandKey; this.BandTypeName = bandTypeName; this.MinBandValue = minBandValue; this.MaxBandValue = maxBandValue; } #endregion #region IComparable Members public int CompareTo(RangeItem other) { if (this.MinBandValue < other.MinBandValue && this.MaxBandValue < other.MaxBandValue) return -1; if (this.MinBandValue > other.MinBandValue && this.MaxBandValue > other.MaxBandValue) return 1; if (this.MinBandValue == other.MinBandValue && this.MaxBandValue == other.MaxBandValue) return 0; throw new ArgumentException("Incomparable values (overlapping)"); } public int CompareTo(Nullable<decimal> value) { if (value == null) { if (this.MinBandValue == null && this.MaxBandValue == null) return 0; else return -1; } else { if (Math.Round(value.Value,2) < this.MinBandValue) return 1; if (Math.Round(value.Value, 2) > this.MaxBandValue) return -1; return 0; } } #endregion } 2) the PreExecute code that fills up the cache + the acquire connections (note it uses an ado.net connection manager from main pckg) IDTSConnectionManager100 connectionManager; SqlConnection connection = null; Dictionary<string, List<RangeItem>> Cache = new Dictionary<string, List<RangeItem>>(); const string query = "SELECT BandKey, BandType, MinBandValue, MaxBandValue FROM dbo.DimBand WITH(NOLOCK) WHERE RowIsCurrent = 'Y' ORDER BY BandType, MinBandValue, MaxBandValue"; public override void PreExecute() { base.PreExecute(); SqlCommand command = new SqlCommand(query, connection); SqlDataReader reader = command.ExecuteReader(); try { string lastBandType = string.Empty; List<RangeItem> ranges = null; while (reader.Read()) { int bandKey = reader.GetInt32(0); string bandType = reader.GetString(1); Nullable<decimal> minBandValue; Nullable<decimal> maxBandValue; minBandValue = reader.IsDBNull(2) ? new Nullable<decimal>() : new Nullable<decimal>(reader.GetDecimal(2)); maxBandValue = reader.IsDBNull(3) ? new Nullable<decimal>() : new Nullable<decimal>(reader.GetDecimal(3)); if (lastBandType != bandType) { if (!string.IsNullOrEmpty(lastBandType)) { //cache the current list of band type ranges Cache.Add(lastBandType, ranges); } lastBandType = bandType; ranges = new List<RangeItem>(); } ranges.Add(new RangeItem(bandKey, bandType, minBandValue, maxBandValue)); } if (!string.IsNullOrEmpty(lastBandType) && ranges != null && ranges.Count > 0) { Cache.Add(lastBandType, ranges); } } finally { reader.Close(); } } public override void AcquireConnections(object Transaction) { connectionManager = this.Connections.LookupConnection; connection = (SqlConnection)connectionManager.AcquireConnection(null); } 3) The GetBandKey function that will be called by the primary Input0_ProcessInputRow function ... private int GetBandKey(string bandTypeName, Nullable<decimal> value) { //firstly, retrieve the ranges from which a band must be chosen List<RangeItem> ranges = null; Cache.TryGetValue(bandTypeName, out ranges); //now search through this subset of ranges for the desired band int index = -1; if (ranges.Count > 0) { int min = 0; int max = ranges.Count - 1; while (min <= max) { int mid = (min + max) / 2; int comparison = ranges[mid].CompareTo(value); if (comparison == 0) { index = mid; break; } if (comparison < 0) min = mid + 1; else if (comparison > 0) max = mid - 1; } } if (index >= 0) { return ranges[index].BandKey; } else return -1; } 4) The Input0_ProcessInputRow function public override void Input0_ProcessInputRow(Input0Buffer Row) { Output0Buffer.AddRow(); Output0Buffer.AgentCode = Row.AgentCode_IsNull ? "Unknown" : Row.AgentCode; Output0Buffer.AvailableIncome = Row.AvailableIncome_IsNull ? 0 : Row.AvailableIncome; Output0Buffer.BKBatchID = Row.BKBatchID_IsNull ? -1 : Row.BKBatchID; Output0Buffer.BranchEntityID = Row.BranchEntityID_IsNull ? "Unknown" : Row.BranchEntityID; Output0Buffer.ClientEntityID = Row.ClientEntityID_IsNull ? "Unknown" : Row.ClientEntityID; Output0Buffer.ClientIDNumber = Row.ClientIDNumber_IsNull ? "Unknown" : Row.ClientIDNumber; Output0Buffer.Commitments = Row.Commitments_IsNull ? 0 : Row.Commitments; Output0Buffer.ExpenseMatrixTotal = Row.ExpenseMatrixTotal_IsNull ? 0 : Row.ExpenseMatrixTotal; Output0Buffer.ExposureAmount = Row.ExposureAmount_IsNull ? 0 : Row.ExposureAmount; Output0Buffer.ExposureLimit = Row.ExposureLimit_IsNull ? 0 : Row.ExposureLimit; Output0Buffer.LivingExpenses = Row.LivingExpenses_IsNull ? 0 : Row.LivingExpenses; Output0Buffer.MonthlyIncome = Row.MonthlyIncome_IsNull ? 0 : Row.MonthlyIncome; Output0Buffer.NettAfterInstall = Row.NettAfterInstall_IsNull ? 0 : Row.NettAfterInstall; Output0Buffer.Tax = Row.Tax_IsNull ? 0 : Row.Tax; Output0Buffer.UIF = Row.UIF_IsNull ? 0 : Row.UIF; if (!Row.UpdateDate_IsNull) Output0Buffer.UpdateDate = Row.UpdateDate; Output0Buffer.MonthlyIncomeBandKey = GetBandKey("MonthlyIncomeBand", Row.MonthlyIncome_IsNull? new Nullable<decimal>() : Row.MonthlyIncome); Output0Buffer.AvailableIncomeBandKey = GetBandKey("AvailableIncomeBand", Row.AvailableIncome_IsNull ? new Nullable<decimal>() : Row.AvailableIncome); Output0Buffer.CommitmentsBandKey = GetBandKey("CommitmentBand", Row.Commitments_IsNull ? new Nullable<decimal>() : Row.Commitments); Output0Buffer.LivingExpensesBandKey = GetBandKey("LivingExpenseBand", Row.LivingExpenses_IsNull ? new Nullable<decimal>() : Row.LivingExpenses); Output0Buffer.TaxBandKey = GetBandKey("TaxBand", Row.Tax_IsNull ? new Nullable<decimal>() : Row.Tax); Output0Buffer.UIFBandKey = GetBandKey("UIFBand", Row.UIF_IsNull ? new Nullable<decimal>() : Row.UIF); Output0Buffer.NetAfterInstallBandKey = GetBandKey("NetAfterInstallBand", Row.NettAfterInstall_IsNull ? new Nullable<decimal>() : Row.NettAfterInstall); Output0Buffer.ExpenseMatrixTotalBandKey = GetBandKey("ExpenseMatrixTotalBand", Row.ExpenseMatrixTotal_IsNull ? new Nullable<decimal>() : Row.ExpenseMatrixTotal); Output0Buffer.ExposureAmountBandKey = GetBandKey("ExposureAmountBand", Row.ExposureAmount_IsNull ? new Nullable<decimal>() : Row.ExposureAmount); } I.W Coetzer
February 26th, 2011 6:29am

Excellent - but that's not my code - that's from the gents on the SSIS dev team, Matt Masson in particular (I think). Talk to me now on
Free Windows Admin Tool Kit Click here and download it now
February 26th, 2011 6:29am

mmm found this article: http://www.julian-kuiters.id.au/article.php/ssis-lookup-with-range going to see if it works for my scenario. I.W Coetzer
February 26th, 2011 6:38am

Yip, I looked at yours and their code and altered it slightly, i think the RangeItem is different but that is because i needed to compare a value within bounds. For the conforming part of my ssis package the processing went from 18 minutes to 3 minutes. i think it was exactly 81% quicker :) thx for the direction.I.W Coetzer
Free Windows Admin Tool Kit Click here and download it now
February 26th, 2011 6:45am

This post also helped: http://stackoverflow.com/questions/461439/how-would-i-search-a-range-of-ranged-values-using-cI.W Coetzer
February 26th, 2011 6:48am

You may also want to look at the SSIS team blog on ranged lookups. Talk to me now on
Free Windows Admin Tool Kit Click here and download it now
February 26th, 2011 7:13am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics