Each of our servers uses a NIC team of at least three adapters (plus an independent management adapter) to balance the load and ensure availability.
Regular client access works fine without interruption, but when we transfer data between the servers (backups, for example) we notice a frequent "disconnect" of 1-5 seconds.
Initially, all servers were running in LACP mode with dynamic load balancing. After testing many configuration settings without success, we ended up with static teams using address-hash load balancing. (The idea was that the disconnect might be caused by the dynamic distribution.)
However, the problem persists: every 50 seconds or so, the NIC used to transfer data drops to a speed of "0 Bytes/second" for 1-5 seconds. Other clients accessing the server in question through the teamed connection (Remote Desktop clients, etc.) then also suffer a short disconnect.
Neither server logs an error, nor does the NIC Teaming interface report any problems. The "Bytes/Second" value just drops to 0. (In this example, NIC 3 is used due to the address hash.)
[Screenshot: connection drop]
The switch used is an HP ProCurve 1810G - 24 GE, with static teaming properly set up on the respective ports.
The switch logs whenever a link goes up or down, and whenever the issue appears, such events are logged:
1 | Info | Sep 11 13:23:07 | NIM | Interface 6 is Link Down |
2 | Info | Sep 11 13:23:10 | NIM | Interface 6 is Link Up |
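To measure how long and how often the link actually drops, the switch log can be parsed and the Down/Up events paired up. The following is only a minimal sketch: the function names `parse_events` and `outages` are made up for illustration, the field layout is assumed to match the two lines above, and the year 2015 is an assumption because the log lines do not contain one.

```python
from datetime import datetime

# The two log lines quoted above, copied verbatim.
LOG = """\
1 | Info | Sep 11 13:23:07 | NIM | Interface 6 is Link Down |
2 | Info | Sep 11 13:23:10 | NIM | Interface 6 is Link Up |
"""

def parse_events(text, year=2015):
    """Return (timestamp, interface, state) tuples for link events.

    The year is an assumption: the switch log only prints month and day.
    """
    events = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5 or "Link" not in fields[4]:
            continue  # skip anything that is not a link up/down message
        ts = datetime.strptime(f"{year} {fields[2]}", "%Y %b %d %H:%M:%S")
        words = fields[4].split()  # e.g. "Interface 6 is Link Down"
        events.append((ts, int(words[1]), words[-1]))
    return events

def outages(events):
    """Pair each Down event with the next Up event on the same interface."""
    down_since = {}
    result = []
    for ts, iface, state in sorted(events):
        if state == "Down":
            down_since[iface] = ts
        elif state == "Up" and iface in down_since:
            start = down_since.pop(iface)
            result.append((iface, start, (ts - start).total_seconds()))
    return result

for iface, start, seconds in outages(parse_events(LOG)):
    print(f"Interface {iface} down at {start:%H:%M:%S} for {seconds:.0f} s")
```

Run against a longer export of the switch log, the same pairing would show whether the drops always hit the same port and whether the interval between them really is around 50 seconds.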
If the teaming is removed and the servers use a single connection, no problems appear and the connection remains stable.
All NICs have been configured with a static duplex mode to avoid effects caused by auto-negotiation.
Any idea where to start with debugging this?
Maybe I should add that the teaming is configured on the Hyper-V hosts, while the servers transferring data are VMs hosted on separate Hyper-V hosts. (The teamed connection is assigned exclusively to the Hyper-V switch used by all VMs on that host.)
Edited by dognose Friday, September 11, 2015 1:43 PM