SQL Azure (also known as Windows Azure SQL Database) has a series of measures to protect user data. Essentially, we have built a set of mechanisms that give the same benefits as ECC without requiring ECC. (More broadly, we check all sorts of things, such
as network packet checksums, to detect and correct for bugs in hardware or drivers.) For data writes, we commit to a quorum of machines to guarantee that changes are hardened on disk, and we validate that data is correct when it is written so that we can
detect and correct bit errors on disk. We also have measures to detect and correct bit errors in memory (page checksums, etc.). In addition, we have a system that helps us detect and track hardware failures, and we
burn in hardware with tests before user workloads are allowed on the machines (both initially and before a machine is returned from a repair action). Finally, we run statistical analysis on the kinds of failures that might predict that a
bit error _might_ happen in the future, so that we can proactively move the machine offline.
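
To make two of the ideas above more concrete (page checksums and committing to a quorum), here is a minimal illustrative sketch in Python. It is not the SQL Azure implementation: the function names are hypothetical, CRC32 stands in for whatever checksum the service really uses, and "quorum" is assumed here to mean a strict majority of replicas.

```python
import zlib
from typing import Sequence


def seal_page(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte CRC32 checksum (a stand-in algorithm)."""
    checksum = zlib.crc32(payload)
    return checksum.to_bytes(4, "big") + payload


def verify_page(page: bytes) -> bool:
    """Recompute the checksum and compare it with the stored one."""
    stored = int.from_bytes(page[:4], "big")
    return stored == zlib.crc32(page[4:])


def flip_bit(page: bytes, bit_index: int) -> bytes:
    """Simulate a single-bit error in memory or on disk."""
    corrupted = bytearray(page)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def quorum_commit(acks: Sequence[bool]) -> bool:
    """Treat a write as committed only if a strict majority of replicas
    acknowledged it (assumption: quorum means majority)."""
    return sum(acks) > len(acks) // 2


page = seal_page(b"row data" * 16)
print(verify_page(page))                    # intact page passes verification
print(verify_page(flip_bit(page, 100)))     # a flipped bit is detected
print(quorum_commit([True, True, False]))   # 2 of 3 replicas acknowledged
```

A checksum like this only detects corruption. In this sketch, correction would come from re-reading the page from another replica that still verifies, which is one reason the quorum of copies matters.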
Ultimately, we are confident that the model is correct. In fact, we have higher confidence in this system than in a lot of the hardware we see people run on-premises, because we spend so much time formally validating hardware. As a result, you can be confident
in the correctness of your data in SQL Azure.
I hope this clarifies things and addresses your question/concern.
Thanks
Conor
Proposed as answer by Guy Haycock [MSFT] (Microsoft employee, Owner), Friday, June 27, 2014 4:47 PM