Accessing thousands of databases.
I'm in an SQL Server 2008 environment (16 cores 64gb total memory) and thru the network I have access to 30,000+ databases (not Microsoft). There is a particular report that requires data from all of them. If it makes a difference, this query returns between 1 and 50 rows about 150 bytes wide from each. There are plans for doing other types of aggregation and/or reporting and expect to use whatever we come up with for those too if this one works out... Nobody here thinks SSIS has any hope of performing adequately. Is it at all practical to even consider using SSIS for this kind of data loading operation or should we go straight to a custom C# app using .NET 4.0 PFX features.
April 7th, 2011 12:27pm

I had the same scenario that I had to access 50 sources (From 1Mil to 150K records ) and all came to the same stagging table then split up into multiple DB's the sources had different connection type and data format, so i think that getting the right access and see if the source data can be pre-aggriate or you have to do it your self is very important. do use grouping in SSIS in large amount of data, try to have a simple DFT, Avoid using Script component in a DFT, Yes i did use SSIS , in some cases i used Linked server connection, insted of direct connect (some customers connection policy) but any way SSIS will work pefectly check http://henkvandervalk.com/speeding-up-ssis-bulk-inserts-into-sql-server#comments and also check http://henkvandervalk.com/reading-as-fast-as-possible-from-a-table-with-ssis-part-ii good luck Sincerely SH -- MVP, MCITP 2008, MCTS 2008 & 2005 -- Please kindly mark the post(s) that answered your question and/or vote for the post(s).
Free Windows Admin Tool Kit Click here and download it now
April 7th, 2011 1:22pm

If you can use third-party solutions, consider using CozyRoc SSIS+ suite of components. Specifically the toolkit includes Parallel Loop Task, which allows execution of multiple 'For Each Loop' iterations at the same time. On your machine with 16 cores , you might be able to execute 50-100 threads or more at the same time , pulling data from all these databases. You have to make tests and see when you will saturate your network stack ;) I would say this job is certainly possible to be implemented with SSIS.SSIS Tasks Components Scripts Services | http://www.cozyroc.com/
April 7th, 2011 1:37pm

The question may have been misunderstood. With an average of less than 4K bytes being inserted per query/connection I think the performance bottleneck will be in whether SSIS can create enough simultaneous connections to maximize the data retrieval rate. All 32,000+ databases will use what amounts to integrated security
Free Windows Admin Tool Kit Click here and download it now
April 7th, 2011 2:45pm

The question may have been misunderstood. With an average of less than 4K bytes being inserted per query/connection I think the performance bottleneck will be in whether SSIS can create enough simultaneous connections to maximize the data retrieval rate. All 32,000+ databases will use what amounts to integrated security You have to test. If you hit a limitation on the number of connections it will not be because of SSIS and most probably you will experience the same kind of limit with the other frameworks as well.SSIS Tasks Components Scripts Services | http://www.cozyroc.com/
April 7th, 2011 2:59pm

Realistically, you will not be able to build a package with all the 32k+ connections. You will end up using some sort of automation. The way I see it - you will use say a script to run DTExec in which you will substitute in cycle say 100 connection strings 320+ number of times until you processed all your databases. Thus, I think why don't you experiment with the out-of-process way of executing packages? This is when you run a child package from a master package. A child package can be set to execute in its own memory space, just use the ExecuteOutOfProcess property. This way each will execute as thread - in parallel. Otherwise yes, you are going to building a multithreaded console app.Arthur My Blog
Free Windows Admin Tool Kit Click here and download it now
April 7th, 2011 3:05pm

I don't think 3rd party products will be allowed. I might have tried the product anyway since it says "You are free to play with and test the library as long as you need in Business Intelligence Development Studio (BIDS)." but this isn't the type of thing I could test from BIDS since I run it on a laptop. Even if I got a temporary licence I'm pretty sure CorpHQ would not install whatever it is you're selling just so I can play.
April 7th, 2011 4:49pm

I don't think 3rd party products will be allowed. I might have tried the product anyway since it says "You are free to play with and test the library as long as you need in Business Intelligence Development Studio (BIDS)." but this isn't the type of thing I could test from BIDS since I run it on a laptop. Even if I got a temporary licence I'm pretty sure CorpHQ would not install whatever it is you're selling just so I can play. It is worth trying to ask. What is good about the Parallel Loop Task is that if you make your package work properly on 2 processor machine (your laptop I suppose), you can make it work on 256 core machine just by changing single parameter. The package design stays the same. Accomplishing parallelism in SSIS using the regular approach is tough because you have to essentially design your package around your physical machine resources (CPUs). If you want to execute 200 threads in the control flow , you have to insert 200 "Execute SQL Tasks". If you want 500 , you have to insert another 300 and so on.SSIS Tasks Components Scripts Services | http://www.cozyroc.com/
Free Windows Admin Tool Kit Click here and download it now
April 7th, 2011 4:58pm

I have not ignored the comments, I've just been out since last week. As a $2M line of business in a $25+B corp, there is little hope of getting anything custom on our machines even if I could prove it works. I can hear management asking: Explain again how much money we're saving by having these 2 servers set up differently from the other 5000. I do appreciate your suggestion (and ArthurZ's approach suggestions too!) but my real hope was that either someone with experience accessing 100+ DBs would speak from their experience or someone knew of a whitepaper, case-study etc. describing how a similar problem was successfully solved with SSIS. Nobody is going to give me a week to set up a test and compare various configurations when they are all pretty much convinced that either it can't be done with SSIS in any practical way or it will perform so badly it will not be worth the savings over building/maintaining a custom app. I have no idea of performance hopes but the comments so far make me think it's probably not practical to try. I won't start until next week so further comments are welcome but it's looking like I'll be doing some C#.
April 11th, 2011 10:33am

CozyRoc have as customers organizations in this range and even larger. I will leave up to you to guess, which are these organizations ;) What I'm trying to say is rejecting a valid third-party solution just because it doesn't fit your company's computer setup policy is not very smart decision, especially if you can show real benefit using it. Most problems can be solved with custom-programming solution. The question is at what cost and future maintenance hassle.SSIS Tasks Components Scripts Services | http://www.cozyroc.com/
Free Windows Admin Tool Kit Click here and download it now
April 12th, 2011 9:34am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics