SSIS - Term Extraction - Per Row
I need to relate the words extracted by the "Term Extraction" task from the [description] column to the row each word came from. The task currently gives me a global result across the [description] values of all rows, so there is no way for me to relate any of the words to a specific row. I want to end up with a second table that contains the extracted words, each stored with a foreign key back to the source record from which it was derived. When I search on these words, I can then follow the key back to the source row and display that row in my app. Any ideas how I can achieve this result with the SSIS Term Extraction Task? Thanks
June 13th, 2011 1:32pm

Perhaps you need a Fuzzy Lookup with a Merge Join? See: http://consultingblogs.emc.com/jamiethomson/archive/2005/03/30/SSIS_3A00_-Adventures-with-Fuzzy-Matching.aspx
Arthur
June 13th, 2011 1:41pm

Further to Arthur's comments: Term Extraction is a process that operates across a whole set of rows (see http://www.bimonkey.com/tag/term-extraction/), as the frequencies of terms will generally be meaningless within a single row. So, as suggested, you need to run the Term Extraction across your dataset to build a reference table. Then you need to pass your data through a lookup of some description that uses the reference data collected.
James Beresford @ www.bimonkey.com, SSIS / MSBI Consultant in Sydney, Australia
SSIS ETL Execution Control and Management Framework @ SSIS ETL Framework on Codeplex
June 13th, 2011 7:21pm
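
The two-step approach suggested above (build a reference table of terms, then look each row up against it) can be sketched outside SSIS as follows. This is only an illustration in Python, not the SSIS implementation: the sample rows, the `reference_terms` list, and the `tokenize` and `map_terms_to_rows` helpers are all hypothetical, and the crude word tokenizer assumes single-word terms, so noun phrases would need extra handling.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a crude stand-in, not SSIS term extraction)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def map_terms_to_rows(rows, reference_terms):
    """Return sorted (row_key, term) pairs for every reference term found in a row.

    rows: iterable of (row_key, description) tuples (hypothetical source table)
    reference_terms: single-word terms from the global extraction (hypothetical)
    """
    terms = {t.lower() for t in reference_terms}  # set gives fast membership checks
    pairs = set()  # a set removes repeats of the same term within one row
    for key, description in rows:
        for token in tokenize(description):
            if token in terms:
                pairs.add((key, token))
    return sorted(pairs)


# Hypothetical sample data standing in for the source table and the reference table.
rows = [
    (1, "Red mountain bike with alloy frame"),
    (2, "Road bike helmet, red"),
    (3, "Steel frame pump"),
]
reference_terms = ["bike", "frame", "helmet"]

term_rows = map_terms_to_rows(rows, reference_terms)
for key, term in term_rows:
    print(key, term)
```

Each description is tokenized once and each token is checked against an in-memory set, so the result is a list of (key, term) pairs that could be loaded into the second table with the key as the foreign key. Within SSIS itself, the Term Lookup transformation is the built-in component for matching a reference table of terms against a text column.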

I previously came across your blog post on Term Extraction during my research. My interest in Term Extraction lies strictly in the nouns that can be extracted from the description field of each row in my dataset, NOT in the frequency calculation. From previous experience with pure SQL, doing lookups to map extracted words back to the originating description row takes a fair amount of time on my dataset of 500,000+ rows. Before I go down that path, I will see whether there is any speed to be gained by passing one row at a time to the Term Extraction Task. This would let me extract the nouns and maintain the link to the source row. I will pass my dataset to a ForEach container and try to find a way to pass the variables for each iteration's data item to the data flow task that contains the Term Extraction. Alas, not every situation can be modeled in a straightforward fashion using SSIS Lego blocks.
June 14th, 2011 11:43am

