Exchange 2010 and Witness server
Guys, I'm writing my Exchange 2010 design over the next week or so. I have everything reasonably clear in my mind apart from the File Share Witness aspect. Does anyone who has implemented 2010 with a 2-Datacentre model have experience of locating the Witness in a 3rd site? Effectively, I will have an even number of servers in Site A (8 servers) and Site B (both Datacentres are active), and I want my File Share Witness to be in Site C. Site C is literally no more than a machine room that I can host 1 server in. I have as yet found no TechNet resources that explicitly elaborate on this configuration, though I know it must work and be supported. If Site A fails, then quorum is maintained as long as Site B and Site C are still in contact, and vice versa.
April 6th, 2011 10:21am

Are you sure that the connections between Sites A and C and between B and C aren't going through the same cloud? Because if they are, and that cloud goes down, Exchange goes down. If you configure DAC so that servers don't fail over between sites automatically, you can configure an alternate witness server which gets activated when you activate the standby datacentre. That's what I'd recommend you pursue, and drop the idea of a Site C. While Exchange 2010 and the DAC are vastly superior to CCR and SCR, I still have reservations about site resilient configurations that fail over automatically, because the failover mechanism is still based on Windows clustering, and that is really designed for servers on the same LAN segment.

Ed Crowley MVP "There are seldom good technological solutions to behavioral problems."
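Ed's suggestion can be sketched in the Exchange Management Shell roughly as follows. This is a hedged sketch only: the DAG name, witness server name, and directory are placeholder assumptions, not from the thread, and the alternate witness parameters assume Exchange 2010 SP1.

```powershell
# Sketch, with placeholder names: enable DAC mode so the DAG will not
# fail over between sites automatically on a split network...
Set-DatabaseAvailabilityGroup -Identity DAG1 -DatacenterActivationMode DagOnly

# ...and pre-stage an alternate witness in the standby datacentre,
# which is activated later as part of a manual datacentre switchover.
Set-DatabaseAvailabilityGroup -Identity DAG1 `
    -AlternateWitnessServer FSW02.contoso.local `
    -AlternateWitnessDirectory C:\DAGWitness\DAG1
```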
April 6th, 2011 10:56am

Hi Ed,

Despite the fact that we have a second Datacentre 11km away, we have a 10GB connection between them which offers 850m/sec, and there is no contention; it is purely for the Exchange environment. So, effectively, you can think of us as 1 LAN segment. We are also just 1 subnet and 1 domain - very simple. Automatic server failover is what I have right now across the Datacentres: 2-node SCCs, 1 node in Site A and 1 in Site B. Automatic failover takes less than 2 minutes. My current Site C hosts a File Share Witness for my SAN storage solution; the same quorum principle applies. 2 nodes in Site A have 2 votes, 2 nodes in Site B have 2 votes (RAID 10 replication on this storage solution). If Site A fails, Site B and Site C keep their quorum and service remains.

Tom
April 6th, 2011 11:21am

If that's the case, why don't you just configure it as one site?

Ed Crowley MVP "There are seldom good technological solutions to behavioral problems."
April 6th, 2011 11:45am

Hi,

The long and short of it is: yes, you can use a third site for your FSW. You may need to check your networking topology, however. As Ed states, if the link between the datacentres holding the DAG fails, and that also affects the location with the FSW, then you can't achieve quorum (with an even number of nodes, for example, which you have - 8) and the DAG can't automatically come online. Personally I would:

1. Site the FSW with the 8 DAG servers that will be in the primary site.
2. Configure an alternate FSW when creating the DAG (an SP1 feature) and site that on a server in the secondary DC.
3. In the event of WAN link failure, when you have to fail over, just use the Restore-DatabaseAvailabilityGroup cmdlet for DR purposes. This should activate the alternate FSW. Check out Restore-DatabaseAvailabilityGroup: http://technet.microsoft.com/en-us/library/dd351169.aspx

Oliver

Oliver Moazzezi | Exchange MVP, MCSA:M, MCITP:Exchange 2010, BA (Hons) Anim | http://www.exchange2010.com | http://www.cobweb.com | http://twitter.com/OliverMoazzezi
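Step 3 above can be sketched roughly as below. The DAG name and Active Directory site names are placeholder assumptions; the exact sequence depends on your topology, so treat this as an outline of the DR activation, not a runbook.

```powershell
# Sketch with placeholder names: after losing the primary datacentre
# (or the WAN link), mark the failed site's DAG members as stopped...
Stop-DatabaseAvailabilityGroup -Identity DAG1 `
    -ActiveDirectorySite PrimarySite -ConfigurationOnly

# ...then restore the DAG in the surviving site. In DAC mode this
# evicts the stopped members from the cluster and, if quorum needs it,
# switches the DAG to the pre-configured alternate witness server.
Restore-DatabaseAvailabilityGroup -Identity DAG1 `
    -ActiveDirectorySite SecondarySite
```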
April 6th, 2011 11:53am

Sorry Ed, not sure I follow. If I have 8 servers in Site A and 8 servers in Site B, and I house the FSW in, for instance, Site A... then if Site A goes down, Site B won't have majority and therefore no quorum, and so will not stay up? Is that not correct? I appreciate that Site B can have the alternate FSW, but that still involves manual intervention. The whole design of 2007 here was to make failover automatic as much as possible. If I use the alternate FSW option then it's manual intervention.

Site C is actually completely separate, not part of the same subnet or physical network topology. It does make it slightly more complicated with Site C in that I guess I'd need it as part of the domain, so might need some VPNing.

Hi Oliver. The concept of primary and secondary sites confuses me a bit, in that both of my Datacentres are actively hosting mailboxes and provide automatic failover for those mailboxes they are not currently hosting. The idea behind hosting a FSW in one or other of the active Datacentres is completely opposed to the quorum model I am used to, i.e. the FSW is a referee and so sits on "neutral" ground.
April 6th, 2011 12:09pm

The FSW provides that odd number to achieve quorum. If you want to achieve automatic failover then put an odd number in each site, and then the FSW is a moot point, minus the DAC consideration below.

In regards to two sites: you should be running the DAG in DAC mode, which will prevent automatic failover in the event of the loss of the WAN link. This is to stop an active/active database copy scenario in the event of split brain syndrome - that is, both datacentres can't see each other any more, so both sides of the DAG mount all databases.

Oliver

Oliver Moazzezi | Exchange MVP, MCSA:M, MCITP:Exchange 2010, BA (Hons) Anim | http://www.exchange2010.com | http://www.cobweb.com | http://twitter.com/OliverMoazzezi
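To see which witness a DAG is actually relying on at any moment, something like the following sketch can help (the DAG name is a placeholder; the WitnessShareInUse property assumes SP1 and is only populated when the -Status switch is used):

```powershell
# Sketch, placeholder DAG name: show the configured primary and
# alternate witnesses, which share is currently in use, and whether
# DAC mode is enabled.
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status |
    Format-List Name, WitnessServer, AlternateWitnessServer,
        WitnessShareInUse, DatacenterActivationMode
```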
April 7th, 2011 5:22am

Thanks for the insight folks. I still wonder if people appreciate that we are not after a "DR" failover configuration; we really ARE able to run our environment with the same performance for users irrespective of whether you are in Site A or Site B - the 10GB dark fibre allows this. So, despite being 2 physically different locations, we are 1 logical site.

We currently have a solution where storage quorum is provided by a Failover Manager hosted at Site C. This just logically seems like exactly the same config we would want with Exchange 2010 (and theoretically it seems it's definitely possible, though the feeling I am getting is that it isn't advised?!). I struggle to understand why it would be advisable, with all of the above in mind, to keep the FSW in either of the 2 Datacentres instead of on a third, physically separate site. If I host the FSW in Site A, then if Site A goes down I have to manually intervene to introduce the alternate FSW in Site B. This manual intervention is not something we feel we should have to go back to. If the FSW is in Site C, and the connection link to it is sound, then if the WAN link between Sites A and B goes down, the Site C FSW is able to tell both Sites A and B to stay up (no manual intervention needed). OK, if Site C then goes down we have a problem, but that is exactly the "risk" we have with our current quorum model, and the risk is acceptably low, which the business historically agreed to.

If Exchange 2010's quorum mechanism isn't able to provide the same kind of robustness as the LeftHand Networks site resilient quorum model, then yes, we have some things to consider. I'd really appreciate the insight you folks have as to why hosting the FSW in a primary Datacentre is superior to hosting it on a 3rd site...
April 7th, 2011 7:47am

Totally understand you with the fibre connection and running two physical locations as one, and get you on hosting the FSW in a third site.
As I said earlier, if the connection to Site C is different rather than shared for the connections to Site A and B (meaning if you lose access between Sites A and B they can still see site C then it is derisked, host the FSW in Site C (in the event of massive disaster you can always still specify an alternate FSW manually). If Site A can still see Site B via routing through Site C then you have even less of a problem. Ultimately an understanding of how your users route would also be beneficial, as if they get cut off with link failures then the who cares if the DAG is still up as you've lost service anyhow. So, to sum up, if all your sites can sustain link failures and route elsewhere (AD Site links for example), then you don't really have an issue. I hope that clears it up somewhat. OliverOliver Moazzezi | Exchange MVP, MCSA:M, MCITP:Exchange 2010, BA (Hons) Anim | http://www.exchange2010.com | http://www.cobweb.com | http://twitter.com/OliverMoazzezi
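Placing the primary FSW on the neutral Site C ground, as discussed, might look like the sketch below. The server and directory names are placeholder assumptions; note that the witness host must be a domain member, and if it is not an Exchange server the Exchange Trusted Subsystem group needs to be added to its local Administrators group first.

```powershell
# Sketch, placeholder names: point the DAG's primary witness at a
# file server in the third "referee" site.
Set-DatabaseAvailabilityGroup -Identity DAG1 `
    -WitnessServer FSW-SITEC.contoso.local `
    -WitnessDirectory C:\DAGWitness\DAG1
```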
April 7th, 2011 8:53am

Cheers Oliver, that sounds more like the understanding I had. I was second-guessing purely because the "vibe" I got was to do it another way, but then I didn't present all the facts initially ;-)

Effectively, Site C is a central point that both Sites A and B can currently see. The links from Site A to C and from Site B to C aren't nearly as impressive, so no, standard traffic routing doesn't follow that path. They're purely there to provide the "refereeing" duties to our LHN Failover Manager, which for Exchange 2010 would be the FSW.

As far as user routing is concerned, we have switches that sit in front of our ISA array; these switches can route to multiple gateways (in our case, 2). So, user traffic coming in gets routed in a load-balanced manner to the ISA servers at either physical location. Ultimately, if the WAN goes down, traffic is still directed to both Datacentres (obviously WAN unavailability creates other issues, but the point is we still have traffic directed to both Datacentres because both are still able to maintain quorum due to the third site). If physical location A or physical location B experiences an outage, then the ISAs at the affected location become unavailable and the switches in front of them stop trying to deliver to the affected "gateway" location.
April 7th, 2011 9:42am

Perhaps I also forgot to mention 1 major aspect of my service which might shed light on how I describe things: there is ZERO direct MAPI connection. Everything comes in from the outside world, through ISA, and into the email service. We effectively behave like an ISP, in a fashion: Outlook Anywhere, external IMAP, external POP, ActiveSync etc. ;-)
April 7th, 2011 9:49am

So all your users are on the other side of a WAN; that's how we do things here too (Exchange hoster). I guess that, having so many copies at each location, you don't have to worry so much about understanding what happens if you run into an Active/Active scenario (no split network, but databases active on either side). If you need to understand that also then we can talk about it here. Let me know your AD topology though (i.e. AD Sites). Anyhow, I hope you are now good with the whole FSW.

Oliver

Oliver Moazzezi | Exchange MVP, MCSA:M, MCITP:Exchange 2010, BA (Hons) Anim | http://www.exchange2010.com | http://www.cobweb.com | http://twitter.com/OliverMoazzezi
April 7th, 2011 10:27am

Hey Oliver, yeah, it can be painful from time to time with all connections originating from the ether out there :-). I am hoping that the FSW is the "intelligence" that prevents a split-brain type scenario in a 3-physical-site model, i.e. where databases are active on both sides of the dark fibre: the fibre goes down, but the FSW is still visible to both Datacentres via Site C, so the DAG members remain in the same state, i.e. no failovers happen. It is possible I am confusing "maintaining quorum" and "server failover", however. If the WAN disappears, yet all servers remain up because quorum is still maintained, then I am reckoning that no failovers get initiated by the PAM/Active Manager. I'm still thinking this through in my mind though ;-)

I have 1 Active Directory site, that's the lot :-) - a stretched subnet across both physical locations and a stretched domain, so no need to create more than 1 AD site. Nice and simple, and all possible because of the 10GB connection and the intelligent switches.
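The quorum and Active Manager state being reasoned about here can be inspected directly; a rough sketch, with the DAG name as a placeholder and assuming the standard Exchange 2010 on Windows Server 2008 tooling:

```powershell
# Sketch, placeholder DAG name: the DAG's underlying failover cluster
# keeps quorum while a majority of votes (nodes plus the FSW) stay in
# contact; the built-in cluster.exe tool shows the quorum resource,
# which for node-and-file-share majority is the witness share.
cluster.exe DAG1 /quorum

# The Primary Active Manager (the node holding the cluster group,
# which decides on failovers) is exposed via the -Status switch.
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status |
    Format-List Name, PrimaryActiveManager
```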
April 7th, 2011 12:10pm

Hi,

To clarify: if you are active/passive and the active side can still see the FSW, then the loss of the WAN link between Sites A and B will not initiate a failover.

Seeing as you have stretched an AD Site, don't forget in the event of the loss of a datacentre to update your DNS records for all your WAN users so they get repointed to the other location. Make sure you put low TTLs on all the DNS records (autodiscover, POP, IMAP, OWA, Outlook Anywhere etc.). I hope that helps.

Oliver

Oliver Moazzezi | Exchange MVP, MCSA:M, MCITP:Exchange 2010, BA (Hons) Anim | http://www.exchange2010.com | http://www.cobweb.com | http://twitter.com/OliverMoazzezi
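Registering those records with a low TTL could be sketched with the built-in dnscmd tool as below. The DNS server name, zone, host names, TTL value, and IP address are all placeholder assumptions for illustration.

```powershell
# Sketch, placeholder names/addresses: add A records with a short TTL
# (here 300 seconds) so that WAN clients repoint to the surviving
# datacentre quickly after the records are updated during a failover.
dnscmd dns01 /RecordAdd contoso.com autodiscover 300 A 192.0.2.10
dnscmd dns01 /RecordAdd contoso.com mail 300 A 192.0.2.10
```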
April 8th, 2011 6:36am

Thanks Oliver. Just FYI, perhaps I have been poor in explaining how we work. For example: assume I have 4 ESX Servers in London and 4 ESX Servers in Surrey. On those ESX Servers I have 7 CAS and 3 Hub Transport in each physical location. I also have 6 Mailbox Servers in London, and another 6 in Surrey. These 12 servers are configured in 6 x 2-node SCC clusters, so London Mailbox 1A is paired with Surrey Mailbox 1B. If London's mailbox server has a fault, instant failover happens to Surrey in a matter of minutes. The stretched subnet, domain, DNS etc. do not need any intervening.

The switches that sit in front of our ISA array (3 ISA in London, 3 ISA in Surrey) are able to send traffic destined for our service to multiple IP addresses. So, every packet is sent twice: 1 to our 3-node ISA NLB virtual IP in London and 1 to our 3-node ISA NLB virtual IP in Surrey. The ISA array (which spans all 6 ISA nodes) then decides what it is going to do with the packet, drops 5/6 of them, and transmits only 1 (through whichever ISA server it chose). Effectively, everything is designed for relatively seamless failover, with no external DNS hostname changes etc.
April 8th, 2011 9:51am

Oh, and of the 6 SCC clusters, 3 are active at any time in London and the other 3 are active in Surrey, so our Datacentres are truly running Active/Active. A site failover therefore only affects half the population, in so much as they will have a few minutes' outage during cluster failover.
April 8th, 2011 9:53am

I guess you mean CCR rather than SCC, so each site has local copies of the data. Can I ask what your benefits are for having half active in each site? I can partially understand this with Exchange 2007, as only half your users get failed over in the event of the loss of a DC, so it's less of a user surge issue when you are in a cold cache state - but Exchange 2010 fixes this (logs are committed by store.exe).

I think the only reason you can stretch your site (stretched layer 2) and have one AD Site is your 10GB DF connection. I think that's the wildcard here. Stretching a 2-node cluster across the WAN I wouldn't do, however, DF connection or not: if you ever did lose a DC you are immediately down to 1 node, and there are quite a lot of reasons that can stop the single node bringing all databases online (the middle of a reseed, for example). Anyhow, I read your previous posts as having HA in each site, so you have HA and site resiliency, rather than blurring them all into one as you appear to do today with 2007.

Oliver Moazzezi | Exchange MVP, MCSA:M, MCITP:Exchange 2010, BA (Hons) Anim | http://www.exchange2010.com | http://www.cobweb.com | http://twitter.com/OliverMoazzezi
April 8th, 2011 10:12am

Sorry Oliver, most of what I have been telling you is historical, i.e. we ARE on 2007 right now and will be moving to 2010 soon ;-) The purpose of the background has been to demonstrate the resilience we currently have, and how we want to at least maintain that sort of resilience with 2010.
April 8th, 2011 10:50am

This topic is archived. No further replies will be accepted.
