Windows Server 2012 LBFO performance
Hi Guys,

I've been testing on numerous systems, with a variety of NICs (1GB and 10GB) and different switches (HP and Cisco), and I can't seem to get the expected results. I have also noticed the uneven distribution across the team members described in this post, but the explanation by Silviu sounds logical. I still have a lot of questions concerning the bandwidth of an LBFO team. First I'll describe my setup.

Two brand new HP DL360p G8 servers installed with Windows Server 2012 Datacenter. On each server the OS was installed on a RAID1 of 600GB 6G 15K SAS disks. Onboard a dual-port 10GB NIC with RSS support and a quad-port 1GB NIC, also with RSS support. All the NICs are connected to an HP ProCurve 5406zl. The NTttcp tool is installed on both servers, the firewall is disabled and no other software is installed.

Before looking at the performance of the Hyper-V switch on top of an LBFO team, I wanted to test the performance step by step. After each step I tested the performance with:

- A large (5GB) file copy from Server1 to Server2
- NTttcps -m 1,0,192.168.1.1 -a 2 on Server1 and NTttcpr -m 1,0,192.168.1.2 -a 16 -fr on Server2 (for the 10GB NIC tests)
- NTttcps -m 1,0,192.168.1.1 -a 2 on Server1 and NTttcpr -m 1,0,192.168.1.2 -a 6 -fr on Server2 (for the 1GB NIC tests)

The steps and the (unexpected) results:

1. Configure one 10GB NIC on Server1 and one 10GB NIC on Server2 (without LBFO and without switch configuration). The file copy speed is, as expected, around 850MB/s. So far so good. Then I run the NTttcp test and the result is about 500MB/s. Performing the NTttcp test ten minutes later, the speed even dropped to 240MB/s. The file copy still performed around 850MB/s. So my first thought was: maybe I'm not using the NTttcp tool correctly.
2. Configure an LBFO team of two 10GB NICs (LACP / Address Hash) on Server1, configure an LBFO team of two 10GB NICs (LACP / Address Hash) on Server2 and configure the switch with 2x 2 ports in an LACP trunk (a PowerShell sketch of this team configuration is included at the end of this post). The file copy speed is now at 500MB/s. This is confusing; I would at least have expected a minimum of 850MB/s. The test with NTttcp even showed a speed of 160MB/s.
3. Deleted the previous team. Disabled the 10GB NICs and created an LBFO team of four 1GB NICs (LACP / Address Hash) on Server1, created an LBFO team of four 1GB NICs (LACP / Address Hash) on Server2 and configured the switch with 2x 4 ports in an LACP trunk. When I initiate the file copy from Server1, copying from Server2 to Server1, the speed is 113 MB/s (consistent with only one adapter). When I initiate the file copy from Server2, copying from Server2 to Server1, the speed is around 85 MB/s. The test with NTttcp showed a speed of 800Mbit/s. The speeds are not what you would expect from a team of four 1GB NICs, but at least the file copy speed matches the NTttcp speed.

I have run the same tests on other servers, NICs and switches (at a customer location) where the unexpected results were almost consistent with the results described here. Can anybody please give me some pointers on what I'm doing wrong? I'd like to have this figured out and get the expected results before I start building a Hyper-V switch on top of the LBFO team and run it in a production cluster.

Thanks, Marc van Eijk
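For reference, the LACP / Address Hash teams in steps 2 and 3 were created roughly along these lines. This is only a sketch: the team and adapter names are placeholders rather than the exact names on the DL360p, and TransportPorts is, as far as I know, what the UI's Address Hash option maps to in PowerShell.

    # Step 2: two-member 10GB team (adapter names are placeholders)
    New-NetLbfoTeam -Name "Team10G" -TeamMembers "10G-1","10G-2" `
        -TeamingMode Lacp -LoadBalancingAlgorithm TransportPorts

    # Step 3: four-member 1GB team (adapter names are placeholders)
    New-NetLbfoTeam -Name "Team1G" -TeamMembers "1G-1","1G-2","1G-3","1G-4" `
        -TeamingMode Lacp -LoadBalancingAlgorithm TransportPorts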
October 26th, 2012 1:56pm

Let's first start with the ntttcp perf. A couple of comments:

- Your ntttcp tests are only using a single TCP stream. You will never get more than one NIC's worth of bandwidth through a team with a single TCP flow.
- In general, it is relatively hard to fill a 10G link with only a single TCP stream, as small latencies translate to big drops in throughput and a single CPU core may not be able to keep up with all that network traffic.
- I would suggest using ntttcp with many simultaneous TCP streams in order to more accurately assess the throughput of a link. Additionally, it would be wise to map these multiple TCP connections to different CPU cores to ensure you do not get CPU-bound.
- Your ntttcp tests are driven by the amount of data you are sending, which by default is about a gigabyte. This is absurdly low for a 10G link (the test will take less than 1 second) and not very accurate for a 1G link either, because the way ntttcp calculates throughput is by taking the total amount of data sent and dividing it by the time taken to send the data. Thus, it really does not surprise me that the throughput for sending 1 gigabyte over 1 TCP stream varies dramatically on a 10G link, since an extra 1 second of latency translates to the difference between 1GiB/0.9s = ~9.5Gbps and 1GiB/1.9s = ~4.5Gbps. I would suggest having the tests be time-driven (-t option) and measuring the total throughput over a longer period of time, say 45 seconds or 1 minute (a sketch of such an invocation follows this reply).
- The version of ntttcp you are using is 4.5 years old, doesn't include things like a warmup or cooldown period in its measurements, and was developed in a world where 10G was rarely deployed. There might be other tools better suited for measuring networking performance.

If you are still unable to get the perf you are expecting, the following things might help you identify the bottleneck:

- Note the number of TCP connections being established. Make sure it is a sufficiently high number.
- Note the utilization of each underlying teamed NIC on both the sender and receiver. Large imbalances on the send side could indicate that you are not using enough TCP connections or that the TCP connections vary greatly in their throughput needs. Large imbalances on the receiver could indicate that your LACP switch is not configured correctly.
- Note the CPU utilization on both the sender and receiver. Having cores completely pegged could indicate improperly configured RSS settings on your NICs.
- Try removing the switch and connecting the 2 LBFO teams back-to-back. If this fixes things, your switch may be introducing additional latency.
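As a sketch of what I mean, a time-driven, multi-stream run with the old ntttcp could look roughly like the following, keeping the same per-side addresses and -a values as in your commands above. Treat this as illustrative only: the exact -m mapping syntax, and whether multiple mappings are accepted on one command line, varies between ntttcp builds.

    # Receiver (Server2): 4 streams pinned to cores 0-3, time-driven for 60 seconds
    NTttcpr -m 1,0,192.168.1.2 1,1,192.168.1.2 1,2,192.168.1.2 1,3,192.168.1.2 -a 16 -fr -t 60

    # Sender (Server1): matching 4 streams, also time-driven for 60 seconds
    NTttcps -m 1,0,192.168.1.1 1,1,192.168.1.1 1,2,192.168.1.1 1,3,192.168.1.1 -a 2 -t 60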
October 26th, 2012 7:13pm

Hi Silviu,

Thanks for your quick response. I did try multiple streams as well (4 and even 10 streams, but with the same NTttcp tool). I have noticed that a lot of our customers currently deploying Windows Server 2012 clusters have questions about the throughput of LBFO, so I'm in the process of writing an extensive blog about it on www.hyper-v.nu, and your input on the matter is more than welcome. From your previous answer I have some new questions:

- Does Microsoft have another tool that is better suited for measuring 10G networking performance than NTttcp, or do you have another suggestion?
- Do you know if there is good guidance for this tool so I can run representative tests?

Thanks, Marc van Eijk
October 27th, 2012 8:35am

I'd be glad to review your blog post and give feedback. It's important for correct information to be out on the web. Also, if you haven't seen it already, the user's guide is an excellent source of information on NIC Teaming. You can find it here: http://www.microsoft.com/en-us/download/details.aspx?id=30160 PowerShell documentation is also available at: http://technet.microsoft.com/en-us/library/jj130849

1. We have a more recent version of ntttcp that we use internally. Unfortunately it's not available to the public at the moment. I've also heard of customers using Chariot or iperf, although I've never actually used either.
2. With the old ntttcp, I would say run it with many TCP connections (> 10) that are mapped to separate cores, and have the test be time-driven (-t option). And again, some variance should be expected.
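I've never used iperf myself, but from what I understand a comparable multi-stream, time-driven run would be something along these lines (the address, port and stream count are just placeholders to adapt to your setup):

    # Receiver
    iperf.exe -s -p 5001

    # Sender: 16 parallel TCP streams for 60 seconds, report in MBytes/sec
    iperf.exe -c 192.168.1.2 -p 5001 -P 16 -t 60 -i 5 -f M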
October 30th, 2012 9:19pm

Hi Silviu,

Thanks for your input and for being willing to give feedback; I would be more than happy to accept your offer. I have based a part of my blog on the documentation you mentioned. In the meantime I have done a lot of testing. I have installed iperf (combined with jperf, which gives some really nice graphs), but I keep getting the same result, even with multiple streams and longer testing periods (30 seconds to 1 minute).

I now have two servers running, giving the following result with a 2x 10Gb LBFO team (tested with two different NICs, Broadcom and Emulex), with the teams configured as LACP / Address Hash and the switch ports (tested with two different switches, Cisco and HP) set to LACP. When I run iperf with the following settings: iperf.exe -c 172.29.5.22 -P 4 -i 1 -p 5001 -f M -t 30, I get a maximum bandwidth of 1082MB/s, so it does exceed one adapter. But it isn't what you would expect from the total of the two tNICs.

What I would finally like to figure out is this. Let's say I have a Windows Server 2012 cluster with two Hyper-V nodes. The recommended configuration as described in http://www.microsoft.com/en-us/download/details.aspx?id=30160 for the LBFO team (that the Hyper-V switch is based on) is Switch Independent / Hyper-V Port. I totally agree, but I have some questions on live migration in that configuration. In a two-node Hyper-V configuration where the LBFO team (consisting of, for example, 8x 1GB interfaces) is configured as Switch Independent / Hyper-V Port, each adapter on the Hyper-V switch is bound to a single physical NIC (a sketch of this configuration follows at the end of this post). Fine for the VMs, but the management OS virtual adapters are also configured on this Hyper-V switch, which probably means that these adapters are also bound to a single NIC. In the scenario where I want to move, let's say, 10 VMs with live migration from one node to another:

- Do the management OS virtual adapters also get bound to a physical NIC?
- Is each live migration a single stream?
- And the two questions combined: if a node initiates 4 concurrent live migrations, does all live migration traffic get sent over a single physical NIC in the LBFO team?

I wanted to get the answers from my test results, but I can't seem to get past the base test of the maximum bandwidth capacity of the underlying LBFO team. Hope you can give some insight.

Thanks, Marc
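The sketch mentioned above of the Switch Independent / Hyper-V Port configuration I am asking about. The team, switch and management OS adapter names are placeholders, not my exact configuration:

    # Team of the 1GB interfaces, Switch Independent / Hyper-V Port
    New-NetLbfoTeam -Name "VMTeam" `
        -TeamMembers "1G-1","1G-2","1G-3","1G-4","1G-5","1G-6","1G-7","1G-8" `
        -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort

    # Hyper-V switch on top of the team, with converged management OS adapters
    New-VMSwitch -Name "VMSwitch" -NetAdapterName "VMTeam" -AllowManagementOS $false -MinimumBandwidthMode Weight
    Add-VMNetworkAdapter -ManagementOS -Name "Management" -SwitchName "VMSwitch"
    Add-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -SwitchName "VMSwitch"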
November 8th, 2012 9:16am

You need to investigate and discover what is bottlenecking your throughput, whether it be CPU, switch, receiving machine, or something else. I would also try removing the team and seeing if you can get 20G using iperf without a team.

To answer your questions:

1. Yes. In Hyper-V Port load balancing, LBFO treats all ports on the virtual switch identically. In fact, it is not even aware that some are virtual adapters exposed to the management OS and others are exposed to a VM.
2. No. In Server 2012, live migration will use 1 TCP stream for control messages (low throughput), 1 for the transfer of VM memory and state (high throughput utilization), and, if the live migration includes migrating the VHD, SMB will be used for that. SMB itself will use 1 or multiple TCP streams depending on your SMB Multichannel settings.
3. It depends. The interface used for live migration will be whichever interfaces are available. If the only way to access the external network is through a virtual adapter exposed to the management OS on top of the virtual switch, then that is the adapter that will be used, and in this case you will not get load balancing of live migration traffic (in Hyper-V Port load balancing mode). If you have other NICs on your system that are not underneath a virtual switch, then they may be used as well.
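One way to see this for yourself is to watch the per-NIC counters on the sending node while you kick off several concurrent live migrations; if everything is mapped to the same virtual switch port, you will see only one team member doing the work. A minimal sketch, using the standard Network Interface counter set and the inbox Server 2012 cmdlets:

    # Sample outbound throughput of every adapter every 2 seconds during the migrations
    Get-Counter -Counter "\Network Interface(*)\Bytes Sent/sec" -SampleInterval 2 -MaxSamples 30

    # Or compare cumulative per-adapter totals before and after the migrations
    Get-NetAdapterStatistics | Sort-Object Name | Format-Table Name, SentBytes, ReceivedBytes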
November 20th, 2012 5:38pm

Marc & Silviu, I'm relieved to see that I'm not the only one having a similar issue. My 3x 1Gig NIC team(s) transfer data from hostA to hostB at 3Gbit speeds, as expected. That is, until I attach the team to a v-Switch. After that, I'm capped at 1Gbit per hostA-to-hostB SMB transfer (no matter what teaming configuration I try). You mentioned that we should identify the bottleneck, and I can't find any other cause except for the addition of the Hyper-V virtual switch. Any help is much appreciated. -Digi
November 21st, 2012 3:16pm

@Digibloom: When I said identify the bottleneck, I meant identify what is limiting your network throughput (CPU, switch, receiver, NIC, etc.). You can do this by performing some basic troubleshooting. Perform the operation in question (in your case an SMB file transfer) and note things (on both sender and receiver) such as:

- CPU utilization. Do you have any cores pegged? If so, check your RSS/VMQ settings.
- Inbound and outbound throughput of the underlying physical NICs in your team. Do you have any NICs pegged? You might have misconfigured your LACP switch or LBFO team.
- How many TCP streams your traffic is using. Too few and you will not get the full team's worth of throughput.

In your case you can *skip* these, since what you are seeing is expected behavior. If you have LBFO configured in Hyper-V Port load balancing mode, then all traffic originating from a particular Hyper-V port will be sent/received on the same physical NIC. That's how this load balancing algorithm works: ports on the virtual switch (including any that are exposed to the management OS) will be mapped to exactly 1 physical NIC. Thus, in this case, any network operation (including a file transfer) originating from the management OS or a VM will be capped at the bandwidth of the physical NIC that virtual switch port is mapped to (unless you have configured multiple switch ports to be exposed to the management OS or to a VM, though this is not default behavior).

If you have LBFO configured in TransportPorts load balancing mode, then traffic will be distributed based on a 4-tuple hash of the connection parameters. In this case traffic from a particular port on the virtual switch can be distributed among multiple team members, but SMB traffic won't be. This is because SMB Multichannel is not supported through a port on the virtual switch (neither is SMB Direct), so SMB will not use multiple TCP streams for transferring data and you will be capped at one NIC's worth of bandwidth, since only 1 TCP stream is being used to transfer the data.

So to summarize: SMB Multichannel is *not* supported through a virtual switch, but *is* supported through a NIC team, and that is why adding the virtual switch causes a file copy operation to go from 3Gbit to 1Gbit.

For more information on NIC teaming, see the user's guide: http://www.microsoft.com/en-us/download/details.aspx?id=30160
For more information on SMB Multichannel, see this write-up: http://blogs.technet.com/b/josebda/archive/2012/06/28/the-basics-of-smb-multichannel-a-feature-of-windows-server-2012-and-smb-3-0.aspx
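If you want to confirm this on your own systems, a quick sketch with the inbox cmdlets (the team name below is a placeholder): check which load balancing algorithm the team is actually using, and check whether SMB Multichannel is in play during the copy.

    # Teaming mode and load balancing algorithm currently in use
    Get-NetLbfoTeam | Format-List Name, TeamingMode, LoadBalancingAlgorithm

    # Run on the SMB client during the file copy; through a virtual switch port,
    # expect no multichannel connections to show up
    Get-SmbMultichannelConnection

    # Switching the team to TransportPorts (Address Hash) helps multi-stream traffic,
    # but not a single-stream SMB copy through the virtual switch
    Set-NetLbfoTeam -Name "Team1" -LoadBalancingAlgorithm TransportPorts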
November 26th, 2012 12:11am

