We have a 2-node Hyper-V setup running on Server 2012. The servers are identical IBM x-series machines, and the SAN is a V3700 with two node canisters. Each Hyper-V host has two FC HBAs, direct-attached with no switches in between: FC HBA 1 on host 1 goes to the first FC port on the node 1 HIC, and FC HBA 2 on host 1 goes to the first FC port on the node 2 HIC. The second server is wired the same way, using the second ports on the SAN's FC HICs. All the link lights look good on the FC connections.

This setup had been almost maintenance-free and in production for 18-24 months, until a few weeks ago. For some reason, the hosts began losing their connection to the SAN, which caused VMs to freeze and general pandemonium in the office. The problem was seemingly resolved by replacing parts in the SAN (the storage cabling is still connected as described above), and everything has been running smoothly for almost a week. However, the process of mitigating the sudden failure of our virtual infrastructure (it happened about 6 times in three weeks) has left me with questions about best practices for this setup. I'm interested in any advice, especially when it comes to Hyper-V settings and best practices for dealing with this kind of situation.

We are set up in what I consider a basic failover cluster: there is a cluster disk and a quorum disk, the hosts handle the storage, and the VMs run from VHD files. Settings are managed from Failover Cluster Manager for the most part, and VM functionality is managed from Hyper-V Manager. We do not use any software outside the tools that ship with Windows Server 2012, except that we use Backup Exec 15 for backups.
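For reference, since each host has two direct FC paths to the V3700, I assume the in-box Windows MPIO feature is what should be handling path failover. This is the sanity check I've been running from an elevated PowerShell prompt on each host (these are the stock Windows Server 2012 MPIO cmdlets, nothing vendor-specific):

```shell
# Confirm the Multipath I/O feature is actually installed on this host
Get-WindowsFeature Multipath-IO

# List the hardware IDs that the Microsoft DSM has claimed;
# the V3700 vendor/product string should appear here
Get-MSDSMSupportedHW

# Show the current default load-balance policy, path verification,
# and PDO-removal timeout settings
Get-MPIOSetting
```

If the array isn't listed by `Get-MSDSMSupportedHW`, only one path is really in use and losing it would look exactly like the SAN "disappearing" from that host.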
We started having our issues when we began using Backup Exec's add-ins to back up the guest machines. The problem is that we don't know whether the outages were caused by the backups or by the hardware, which mysteriously failed at the same time we tried to implement the BE solution. I did notice that NUMA is enabled in our Hyper-V system, but the guests are generally low-throughput systems: we have a print server, a file server, a couple of web application servers (running things like WSUS, a CAS server, and SharePoint for a 5-person workgroup), and a couple of terminal servers (roughly 10 guests split by memory use across the two hosts). I don't want to make changes unless there is a benefit, either in reliability or efficiency; I just want to make sure that the relevant settings are correct for this kind of setup. Any suggestions?
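On the NUMA point, in case it matters for any answers: here is how I've been checking what is actually in effect, using the built-in Hyper-V PowerShell module (these cmdlets and property names are the standard ones on Server 2012, as far as I know):

```shell
# Check whether NUMA spanning is enabled at the host level
Get-VMHost | Select-Object Name, NumaSpanningEnabled

# Show each VM's virtual processor count and its NUMA node limit,
# to see whether any guest would actually span a physical node
Get-VM | Get-VMProcessor |
    Select-Object VMName, Count, MaximumCountPerNumaNode
```

My understanding is that with guests this small (none comes close to a full physical NUMA node), spanning on or off shouldn't change much, but I'd welcome correction.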
Thanks!