We have a 2-node Hyper-V setup running on Server 2012. The servers are identical IBM x-series machines, and the SAN is a V3700 with two node canisters. Each Hyper-V host has two FC HBAs, direct-attached with no switches in between: FC HBA 1 on host 1 goes to the first FC port on the node 1 HIC, and FC HBA 2 on host 1 goes to the first FC port on the node 2 HIC. The second server is wired the same way, using the second ports on the SAN's FC HICs. All the link lights look good on the FC connections.

This setup had been almost maintenance-free and in production for 18-24 months, until a few weeks ago. For some reason, the hosts began losing their connection to the SAN, which caused VMs to freeze and general pandemonium in the office. The problem was seemingly resolved by replacing parts in the SAN (the storage cabling is still connected as described above), and everything has been running smoothly for almost a week. However, the process of mitigating the sudden failure of our virtual infrastructure (it happened about 6 times in three weeks) has left me with questions about best practices for this setup. I'm interested in any advice, especially when it comes to Hyper-V settings and best practices for dealing with this kind of situation.

We are set up in what I consider a basic failover cluster: there is a cluster disk and a quorum disk, the hosts handle the storage, and the VMs run from VHD files. Settings are managed from Failover Cluster Manager for the most part, and VM functionality is managed from Hyper-V Manager. We do not use any software outside the tools that ship with Windows Server 2012, except that we use Backup Exec 15 for backups.
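For reference, since each host has two direct FC paths to the V3700, I assume the in-box Windows MPIO feature is what should be handling path failover. This is the sanity check I've been running from an elevated PowerShell prompt on each host (these are the stock Windows Server 2012 MPIO cmdlets, nothing vendor-specific):

```shell
# Confirm the Multipath I/O feature is actually installed on this host
Get-WindowsFeature Multipath-IO

# List the hardware IDs that the Microsoft DSM has claimed;
# the V3700 vendor/product string should appear here
Get-MSDSMSupportedHW

# Show the current default load-balance policy, path verification,
# and PDO-removal timeout settings
Get-MPIOSetting
```

If the array isn't listed by `Get-MSDSMSupportedHW`, only one path is really in use and losing it would look exactly like the SAN "disappearing" from that host.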
We started having our issues when we began using Backup Exec's add-ins to back up the guest machines. The problem is that we don't know whether the outages were caused by the backups or by the hardware, which mysteriously failed at the same time we tried to implement the BE solution. I did notice that NUMA is enabled in our Hyper-V system, but the guests are generally low-throughput systems: we have a print server, a file server, a couple of web application servers (running things like WSUS, a CAS server, and SharePoint for a 5-person workgroup), and a couple of terminal servers (roughly 10 guests split by memory use across the two hosts). I don't want to make changes unless there is a benefit, either in reliability or efficiency; I just want to make sure that the relevant settings are correct for this kind of setup. Any suggestions?
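On the NUMA point, in case it matters for any answers: here is how I've been checking what is actually in effect, using the built-in Hyper-V PowerShell module (these cmdlets and property names are the standard ones on Server 2012, as far as I know):

```shell
# Check whether NUMA spanning is enabled at the host level
Get-VMHost | Select-Object Name, NumaSpanningEnabled

# Show each VM's virtual processor count and its NUMA node limit,
# to see whether any guest would actually span a physical node
Get-VM | Get-VMProcessor |
    Select-Object VMName, Count, MaximumCountPerNumaNode
```

My understanding is that with guests this small (none comes close to a full physical NUMA node), spanning on or off shouldn't change much, but I'd welcome correction.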
Thanks!