Poor 10 GbE throughput/performance in Server 2008 R2
I have an issue with a proof of concept that I'm running. I have a Server 2008 R2 backup media server running ARCserve r15, and I'm pulling data off our SAN (8 Gbps FC) and sending it to an EMC Data Domain unit over an Intel X520 10 GbE adapter (directly connected, no switch) via CIFS shares. I have 4 jobs running right now, sending about 1.5-2 TB of data. The backup media server is a DL380 with 48 GB of RAM and two X5680 CPUs. The 10 GbE and 8 Gb FC cards are in PCI Express x8 slots, so I don't think this is a hardware issue.

At first I ran into a rather obvious throughput ceiling at about 30% utilization - it literally jumped up to 30% and flatlined there until the transfer was done. After some research on Intel's site, I turned off "Interrupt Moderation Rate" in the drivers for the card and was able to push it higher. CIFS file transfers now bounce around between 30% and 50% utilization.

The interesting thing is that when I run iperf with 4 TCP streams, I can run the link up to 99% utilization with the TCP window set to 64 KB or higher. Since I can't set TcpWindowSize in Server 2008 R2 anymore (the tech docs say it ignores that registry key) and increasing the number of streams doesn't have any impact, I'm sort of at a loss. Only utilizing 1/3 of the throughput of the adapter kinda sucks. If I hit a ceiling at 80% I'd say that's pretty good and I'm likely hitting the limits of the 8 Gb Fibre Channel HBA. However, in tests that generate traffic on the fly rather than copying from a source, I still can't push past 50% utilization except when running iperf with a TCP window size of 64 KB or larger.

Any ideas on how I can solve this? I'm not even sure whether the issue is with the backup server, the backup software, or the Data Domain unit.
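For reference, the iperf test looks roughly like this (iperf 2 syntax; the address is a placeholder, not the actual Data Domain IP):

    rem on the receiving end (the far side of the 10 GbE link, or another test host):
    iperf -s -w 64K

    rem on the backup media server: 4 parallel TCP streams, 64 KB window, 60 second run
    iperf -c 192.168.100.2 -P 4 -w 64K -t 60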
September 18th, 2011 7:26pm

As a follow-up to my situation above... is anyone backing up to disk from a single media server at a rate higher than 3 Gbps?
September 19th, 2011 3:39pm

Please try setting the TCP receive window auto-tuning level to restricted and see if you notice any difference in performance:

    netsh interface tcp set global autotuninglevel=restricted

Sumesh P - Microsoft Online Community Support
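(To check what is currently in effect before and after the change, the command below lists the active auto-tuning level and congestion provider, among other TCP global settings:)

    netsh interface tcp show global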
September 20th, 2011 3:12pm

I already have that set to disabled... should I still try setting it to restricted?
September 20th, 2011 3:44pm

By disabling the autotuninglevel, your window size will never increase beyond the default 64 KB. On a 10 Gb connection this can limit your transfer speed; you want that receive window to grow so there is less protocol overhead.

If you are not running 2008 R2 SP1, please install the autotuning hotfix: http://support.microsoft.com/kb/983528

Then set autotuning to normal or restricted: http://technet.microsoft.com/en-us/library/cc731258(WS.10).aspx

I would also set your congestion provider to CTCP (Compound TCP):

    netsh interface tcp set global congestionprovider=ctcp

And finally, you need jumbo packets to get maximum throughput. Make sure you are using the maximum possible frame size allowed by the devices in the chain. Increasing the frame size will improve both your IOps and your throughput, and at the same time your CPU utilization will go down.

The most important thing to remember when doing a file copy test is the read speed of the source and the write speed of the destination. The real-world maximum throughput of a 10 Gbps file transfer is roughly 625 MBps. This is primarily because of bandwidth-delay product: http://en.wikipedia.org/wiki/Bandwidth-delay_product

"A high bandwidth-delay product is an important problem case in the design of protocols such as TCP in respect of performance tuning, because the protocol can only achieve optimum throughput if a sender sends a sufficiently large quantity of data before being required to stop and wait until a confirming message is received from the receiver, acknowledging successful receipt of that data. If the quantity of data sent is insufficient compared with the bandwidth-delay product, then the link is not being kept busy and the protocol is operating below peak efficiency for the link. Protocols that hope to succeed in this respect need carefully designed self-monitoring, self-tuning algorithms.[2] The TCP window scale option may be used to solve this problem caused by insufficient window size (which is limited to 65535 bytes) usable by the sender before requiring an acknowledgment from the recipient."

If your bandwidth tester is only one-way, like a UDP send which requires no acknowledgement of the packets, your synthetic throughput will be greater than TCP, which requires acknowledgement or the packets are resent. And again, if there is any disk IO limitation you will not reach your maximum potential versus a synthetic test.

Sumesh P - Microsoft Online Community Support
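As a rough sketch, the settings above can be applied from an elevated command prompt as follows (restricted caps how far the window can grow, normal allows full scaling; the jumbo frame size is configured separately in the NIC driver's advanced properties and must be supported on both the X520 and the Data Domain side):

    rem allow the receive window to scale (requires KB983528 on pre-SP1 2008 R2)
    netsh interface tcp set global autotuninglevel=normal

    rem use Compound TCP as the congestion provider
    netsh interface tcp set global congestionprovider=ctcp

For a sense of scale on the bandwidth-delay product: at a 1 ms round-trip time, a 10 Gbps link needs roughly 10,000,000,000 bits/s x 0.001 s = 10,000,000 bits, or about 1.25 MB, in flight to stay full - around 20 times the default 64 KB window. Even at 0.1 ms RTT, which is plausible for a directly connected link, the BDP is about 125 KB, still roughly twice the default window.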
September 21st, 2011 4:21pm

This topic is archived. No further replies will be accepted.
