New SSIS Project to Replace Hand Written SQL system

Hi All,

I am in the process of creating a rough project estimate, to design, develop, test and implement a new SSIS project. The purpose of this project is to completely replace the existing ETL system which is only made up of SQL replications, SQL Jobs and hand written SQL Views.

  • To provide an idea of the present environment, we have 3 source databases which reside in the same physical server. There are approximately 100 tables in each of these databases. 
  • Data from these databases is copied to a staging area using snapshot replications (yes, they run every day)
  • Data is held in this staging area. The target database has several views that read the data from the staging. The views use UNION ALL to combine data from all three staging DBs as their structures are similar.
  • Apart from these views, there are a handful of SQL jobs which which involve complex transformations, and are then loaded to the target tables. 

This is just a high level over view. This system isn't robust enough and has several other issues. Few of them being: It is not easily extensible, not easy to Maintain, performance is poor, etc.

Due to these reasons, we are considering replacing this with a proper SSIS system and we are doing an estimate for the same. We would like to keep the changes to the target system as minimum as possible.

I would like to know if these Replication Jobs need to be replaced using an SSIS component? or would they need to remain? In which case, we would just do the estimation to replace the SQL jobs and Views.

We do not have an expert in this field, therefore I would like to do a meaningful estimation with some help. Any suggestions or pointers would be very helpful. Of course this would only be a ball park figure which we will improvise on, as we go. There are other factors which I will also consider for the estimate such as resource availability, skill set of the resource, number of SQL jobs to replace & their complexity, number of views to replace & their complexity.

Currently, since we do not know the number of components we would have to build, we are going based on the number of components we have in the existing system and their complexity. Any suggestions on how to go about this would be appreciated.

Thanks

Dhivya


  • Edited by Dhivya Elan 20 hours 16 minutes ago Correction
August 28th, 2015 6:35am

Hi Dhivya,

It would be good if you could post what edition and version the database engine is. And what SSIS version you are planning using, nevertheless...

It looks to me having CDC implemented can be the proper fit and takes perhaps a day or two to POC and even perhaps as long to go live with.

Free Windows Admin Tool Kit Click here and download it now
August 28th, 2015 9:46am

Hi Arthur

Very informative link to CDC and definitely something for us to consider. Thank You!

The current system is built on SQL Server 2008 R2. We are considering upgrading to & SQL Server 2014 Integration Services.

If we try and replace the snapshot replications with a solution that does incremental loads to the staging, then would it be meaningful to implement a robust SSIS system to transfer data just from staging to target?

Thanks

Dhivya


August 28th, 2015 10:06am

>We do not have an expert in this field, therefore I would like to do a meaningful estimation with some help.

You will not be able to do a meaningful estimation without an expert in this field, who furthermore will need to do a detailed analysis of the existing data flows, and probably prototype the solutions.

Also it's not at all obvious that replacing replication + TSQL with SSIS is a viable or optimal strategy to make the system more "robust", "extensible", "performant", or "maintainable".  In particular using SSIS instead of TSQL for data flows between databases on the same server is likely to be slower, not faster. 

Only if the existing TSQL transformation code is poorly-written you may be able to do better in SSIS.

Free Windows Admin Tool Kit Click here and download it now
August 28th, 2015 10:24am

I would like to know if these Replication Jobs need to be replaced using an SSIS component? or would they need to remain? In which case, we would just do the estimation to replace the SQL jobs and Views.

Dhivya,

If your ETL process needs the complete database at its disposal then probably keeping the replication would be a good idea. If the ETL just needs new data then you can explore setting up incremental loads instead of replication.

Don't fall into the trap of creating an SSIS that only has Execute SQL Tasks. Because then the SSIS would just be a mother stored procedure which calls other child procedures or batch scripts! It will be better to create an actual "mother stored procedure" :-)

It seems that in your case, SSIS would mostly be used to replace the Views and the transformations in the Agent Jobs. As there is no available expert (whatever that means :-) ), one step could be to familiarize yourself with the Control Flow Tasks and Data Flow Transformations available in SSIS. Don't bother about how they work right now, just find about what can they do. Then you can probably make an educated guess if SSIS is the right tool to replace your transformations.

You can read the first few paragraphs about each Task and Transformation (expand the left side menu) on the following MSDN page -

https://msdn.microsoft.com/en-us/library/ms137681(v=sql.110).aspx

As for figuring out the complexity of your current setup and effort estimation for SSIS, you can refer to few of my blog posts -

https://aalamrangi.wordpress.com/tag/estimation/

All the best!

August 28th, 2015 3:16pm

...

If we try and replace the snapshot replications with a solution that does incremental loads to the staging, then would it be meaningful to implement a robust SSIS system to transfer data just from staging to target?


It depends on how often it needs to run plus how much data moves.

With CDC you can do incremental loads, too. I am not a big fan of Replication because it is hard to manage and monitor, CDC has its own weak points, but is better suited for SSIS.

Adding a link on SSIS + CDC to do incremental loads https://technet.microsoft.com/en-us/library/Bb895315%28v=SQL.105%29.aspx?f=255&MSPPError=-2147

Free Windows Admin Tool Kit Click here and download it now
August 28th, 2015 3:51pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics