Publish SQL Server data to clients from a SaaS website with a multi-tenant database?
We maintain a Software as a Service (SaaS) web application that sits on top of a multi-tenant SQL Server database. There are about 200 tables in the system, the biggest with just over 100 columns; at last look the database was about 10 gigabytes in size. We have about 25 client companies using the application, all entering their data and running reports. The single-instance architecture is working very effectively for us - we're able to design and develop new features that are released to all clients every month. Each client's experience can be configured through feature toggles, data dictionary customization, CSS skinning, etc. Our typical client is a corporate with several branches, one head office and sometimes their own in-house IT software development team.

The problem we're facing now is that a few of the clients are undertaking their own internal projects to develop reporting, data warehousing and dashboards based on the data presently stored in our multi-tenant database. We see it as likely that the number and sophistication of these projects will increase over time, and we want to cater for it effectively.

At present, we have a "lite" solution whereby we expose a secured XML web service that clients can call to get a full download of their records from a table. They specify the table, and we map that to a purpose-built stored proc that returns a fixed number of columns. Currently clients are pulling about 20 tables overnight into a local SQL database that they manage. Some clients have tens of thousands of records in a few of these tables.

This "lite" approach has several drawbacks:

1) Each client needs to develop and maintain their own data-pull mechanism, and deal with all the logging, error handling, etc.
2) Our database schema is constantly expanding and changing. The stored procs they are calling have a fixed number of columns, but occasionally when we expand an existing column (e.g. turn a varchar(50) into a varchar(100)) their pull will fail because it suddenly exceeds the column size in their local database.
3) We are starting to amass hundreds of different stored procs built for each client and their specific download expectations, which is a management hassle.
4) We are struggling to keep up with client requests for more data. We provide a "shell" schema (i.e. a copy of our database with no data in it) and ask them to select the tables they need to pull. They invariably say "all of them", which compounds the changing-schema problem and is a heavy drain on our resources.

Sorry for the long-winded question, but what I'm looking for is an approach to this problem that other teams have had success with. We want to securely expose all their data to them in a way they can most easily use, but without getting caught in a constant process of negotiating data exchanges and cleaning up after schema changes. What's worked for you? Thanks, Michael
November 15th, 2012 1:56am
In this case I would create an OLAP solution and give the outside developers access to the OLAP database. It's a good way to serve reporting requirements across different dimensions and measures. Since you would be responsible for keeping the OLAP database up to date, you can test whether structure changes in the main database cause errors in the OLAP database before clients ever see them. You can also create calculated columns in the OLAP solution to stay flexible with reporting requirements. The target schema for this solution can be found at the beginning of the post: http://borisfrolovsqlservertips.wordpress.com/2012/10/21/change-data-capture-practice-experience-part-1/
November 15th, 2012 7:35am
Thanks Piotr. I was involved in a project that made extensive use of SQL merge replication several years ago (SQL 2000). It was extremely cumbersome at the time; we practically had a full-time job just dealing with replication conflicts. I'm sure it's moved on since then, and we'd be using it in a different manner this time, but after my earlier experiences I'm reluctant to use replication again. How well does it cope in an internet-facing web environment, e.g. supporting HTTP and our own custom authentication mechanism? Will it require rowguids on every table? We're using integer primary keys. Cheers, Michael
November 21st, 2012 4:12am
Hi Boris, Thanks for the OLAP suggestion - I'll certainly put that on the table for discussion. You make a good point that we can deal with schema changes ourselves as the data goes from the SQL relational DB into the cube. Change data capture may also be of use to us, because I'm not sure it's safe (capacity-wise) for us to move forward with a solution that involves sending ALL of our clients' data to them every night, rather than just what has changed.

How were you envisaging exposing the cube, given we have upwards of 20 clients who may want to access their own data, but we absolutely must prevent leakage of data between clients? We do currently use a cube for some reporting, and expose it to (one or two) power users via the HTTP data pump. Unfortunately we've found that role-based dimension data security requires Windows auth to work, and we're web-based with our own custom authentication, so we can't really expose this any wider. Were you thinking to build a separate cube per client and then spit each one out for them every night for their own reporting usage?

We're already quite heavily invested in SQL Server Reporting Services Report Models - the support for custom authentication is excellent - but to our frustration we've heard that these are being deprecated. I suppose a possible roadmap for us may include the "BISM" for SQL 2012, but our current desire to stay away from SharePoint seems to rule this out for us at the moment. Cheers, Michael
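The change-only transfer mentioned above (rather than a full nightly dump) boils down to a watermark pull: the client remembers the highest change marker it has seen and asks only for rows beyond it, which is essentially what a rowversion column or SQL Server's change data capture feed gives you. Here is a small sketch of the client side, again using sqlite3 as a stand-in and with a hypothetical `Client` table carrying an invented integer `RowVersion` column:

```python
import sqlite3

def pull_changes(src, dst, table, last_version):
    """Fetch only rows changed since the stored watermark and upsert them
    locally, mirroring what a CDC / rowversion-based feed would supply."""
    rows = src.execute(
        f"SELECT Id, Name, RowVersion FROM {table} WHERE RowVersion > ?",
        (last_version,),
    ).fetchall()
    for row in rows:
        dst.execute(
            f"INSERT INTO {table} (Id, Name, RowVersion) VALUES (?, ?, ?) "
            "ON CONFLICT(Id) DO UPDATE SET Name = excluded.Name, "
            "RowVersion = excluded.RowVersion",
            row,
        )
    dst.commit()
    # New watermark: highest version seen, or the old one if nothing changed.
    return max((r[2] for r in rows), default=last_version)

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE Client (Id INTEGER PRIMARY KEY, Name TEXT, RowVersion INTEGER)")
src.executemany("INSERT INTO Client VALUES (?, ?, ?)", [(1, 'A', 1), (2, 'B', 2)])
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE Client (Id INTEGER PRIMARY KEY, Name TEXT, RowVersion INTEGER)")

watermark = pull_changes(src, dst, "Client", 0)          # first pull: everything
src.execute("UPDATE Client SET Name = 'B2', RowVersion = 3 WHERE Id = 2")
watermark = pull_changes(src, dst, "Client", watermark)  # second pull: only row 2
print(watermark)  # 3
```

The nightly transfer then scales with the volume of change rather than the size of the database, which addresses the capacity concern directly. (Deletes would also need a tombstone or a CDC-style operation flag, which this sketch omits.)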
November 21st, 2012 4:35am
In your case you would have a single publisher and multiple READ-ONLY subscribers; there would be no peer-to-peer replication. The nightmare scenarios usually arise in peer-to-peer or merge replication, where subscribers may update or insert data (so you need globally unique keys, etc.) and the application wasn't designed to support that. You would have a separate database (or even instance) for each customer, and you can build security around it.
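The per-customer isolation described above amounts to a row filter on each published table: the feed for a given tenant only ever contains that tenant's rows, so nothing can leak between subscribers. As a toy illustration of the filtering idea (not SQL Server replication itself), here is a per-tenant filtered view in sqlite3; the `Orders` table, `TenantId` column and tenant values are all invented:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Orders (OrderId INTEGER, TenantId INTEGER, Amount REAL)")
db.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
               [(1, 100, 9.5), (2, 100, 12.0), (3, 200, 7.25)])

# One filtered view per tenant: the subscriber for tenant 100 only ever sees
# its own rows, much like a row filter on a replication article. Note the
# view also drops TenantId, so the column itself is never exposed.
db.execute("CREATE VIEW Orders_Tenant100 AS "
           "SELECT OrderId, Amount FROM Orders WHERE TenantId = 100")

print(db.execute("SELECT COUNT(*) FROM Orders_Tenant100").fetchone()[0])  # 2
```

In real transactional replication the same effect comes from defining the article with a filter clause per subscription, so each customer's subscriber database is populated only with its own slice of the multi-tenant tables.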
November 21st, 2012 5:30am
It depends... Both scenarios are doable; I would consider playing with both options in the lab.
November 22nd, 2012 8:53am
I know you've answered your question here, but my company is tackling the exact same problem and I thought I could contribute. Two additional avenues I'm considering are:

1) Have the customer pay a monthly fee that would cover a SQL Azure account; we would sync data to Azure and give them access. This gets the customers off our hardware for queries against their data. The downside is that every customer has to be comfortable with SQL Azure, and there's the added expense and complexity of managing a separate Azure account for each customer wishing to connect.

2) I came across a company called Connection Cloud which I'm investigating. They provide a SaaS data liberation service, whereby you write a connector and your customers subscribe to Connection Cloud in order to query your data. In a way, it's like a proxy to your database. The customers can use ODBC or JDBC to connect and query, from Excel, Tableau, or any other tool. It looks promising, but it's early days in my investigation. It costs the customer something like $99/month to use Connection Cloud.
February 24th, 2013 6:32am