Windows 2008 R2 - large file copy uses all available memory and then transfer rate decreases dramatically (20x)
I have a problem that was discussed in the following thread but never resolved. I'm unable to reply to that thread, so I've created a new one in the hope that someone might be able to help. http://social.technet.microsoft.com/forums/en-us/windowsserver2008r2general/thread/74C2C9CA-F8C1-4C37-BC8C-CD074CE0C6CD?prof=required

I have two Windows 2008 R2 servers, and I'm trying to copy large (minimum 50GB) files back and forth between them. If I copy a 50GB file from server 0 to server 1, the transfer rate stays at just below 1 gigabit/sec on a gigabit switch. However, if I copy a 50GB file from server 1 to server 0, the copy begins at just below 1 gigabit/sec, but once the amount of data transferred equals the amount of available RAM on server 0, the transfer rate steadily decreases; it continues to drop rapidly and can level off at just 50 megabit/sec. It doesn't matter whether the file is pushed or pulled.

Server 0 is a Dell PE2950 with 24GB of RAM and 2 dual-core Xeon 5110 CPUs @ 1.6GHz. Server 1 is a Dell PE2950 with 32GB of RAM and 1 quad-core Xeon E5420 CPU @ 2.5GHz.

I have seen this happen before on Windows 2008 x64 without R2, and I've used DynCache http://www.microsoft.com/downloads/en/details.aspx?FamilyID=e24ade0a-5efe-43c8-b9c3-5d0ecb2f39af&displaylang=en to resolve it. However, DynCache is not supported on Windows 2008 R2, and it's not supposed to be needed on R2 because the problem was supposedly fixed. Interestingly, I only have the issue on one of the two R2 servers.

In Task Manager on the problem server, as soon as I start the file transfer I can watch the available memory begin to drop. At the moment I have 24GB of RAM in the server, and about 16GB of that is available. Once 16GB of the 50GB file has been transferred, available memory reaches 0 in Task Manager, and then the transfer rate tanks.

The OS was installed just a week or two ago. It has Hyper-V and SNMP installed, as well as the latest Windows updates. I then installed the File Services role as well, but the problem still exists. Nothing else has been installed.

Clearly there is still an issue here in Windows 2008 R2, but it doesn't seem to affect all servers in all situations. There are also clearly other people having the same problem, but to my knowledge Microsoft has yet to acknowledge or address the issue in Windows 2008 R2. Can anyone help? Thanks.
October 22nd, 2010 12:58am

Hi DougZuck, Thanks for posting here. I suggest checking whether this issue persists after running the commands below to disable the Receive Side Scaling and TCP auto-tuning features in Windows Server 2008 R2:

netsh interface tcp set global rss=disabled
netsh interface tcp set global autotuninglevel=disabled

Then reboot the server. For background information, please refer to the article below: Information about the TCP Chimney Offload, Receive Side Scaling, and Network Direct Memory Access features in Windows Server 2008 http://support.microsoft.com/kb/951037 Thanks. Tiger Li TechNet Subscriber Support in forum. If you have any feedback on our support, please contact tngfb@microsoft.com. Please remember to click Mark as Answer on the post that helps you, and to click Unmark as Answer if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread.
October 22nd, 2010 6:00am

Thank you for your response, Tiger Li. I followed your instructions and did the following, but the problem still exists:

netsh interface tcp set global rss=disabled
netsh interface tcp set global autotuninglevel=disabled

Then rebooted the server. Any other suggestions? Thanks.
October 22nd, 2010 4:29pm

Hi DougZuck, Thanks for the update. Are there any errors in the event log? Please check which application's memory usage is increasing by running "perfmon.exe /res" to open Resource Monitor when the large file transfer begins. Does the issue persist if you remove the installed roles (Hyper-V, File Services, SNMP) from the server? Please also update the NIC and RAID controller firmware to the latest versions. Thanks. Tiger Li TechNet Subscriber Support in forum. If you have any feedback on our support, please contact tngfb@microsoft.com. Please remember to click Mark as Answer on the post that helps you, and to click Unmark as Answer if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread.
October 25th, 2010 8:41am

No event log errors. Resource Monitor shows SYSTEM increasing in memory usage when the copy starts. RamMap confirms that the file being copied is what's being cached and what's using up all the RAM. The problem still exists when the File Services role and SNMP are removed. I'm not able to remove Hyper-V from this server because it needs to continue hosting virtual machines. Firmware for the NIC and RAID controller is already at the latest version.

The issue is clearly an OS issue. It existed in 2008 x64 without R2, so I'm not surprised that it also exists in 2008 R2. The problem is that at least in 2008 x64 without R2 you could use DynCache to work around it, but DynCache unfortunately cannot be used on 2008 R2. Additionally, in 2008 R2 the problem seems to exist only on some servers. It's unclear why this is the case. Thanks.
October 25th, 2010 7:13pm

How are you performing the file copy? I think you're seeing the effects of a buffered file copy. Try using XCOPY with the new /J switch to do an unbuffered file copy. I've been able to move roughly 4GB per hour between servers using that new switch, with no ill effects to the servers themselves.
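For anyone wanting to try this, the command in question looks something like the following; the server, share, and file names here are just placeholders for your own paths:

xcopy \\server1\share\bigfile.vhd D:\staging\ /J /Y

The /J switch (new in Windows 7 / Server 2008 R2) tells xcopy to copy without buffering through the system file cache, which is why it avoids the runaway memory consumption described above.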
October 27th, 2010 9:16pm

Tracy - thanks for the response. Yes, I'm able to use xcopy /j, but as you pointed out, it is EXTREMELY slow. 4GB per hour is simply not going to cut it. I have files that are hundreds of GB. Thanks, Doug
October 27th, 2010 9:25pm

Is it possible that whatever you are using to copy the files (maybe Windows Explorer itself, which is a poor method if you're copying that much data) simply isn't releasing the used memory properly, and that when it slows down it's actually using the pagefile for memory? I personally would use Robocopy for a job like this, and it's worked very well for me. I haven't measured the exact speed of the transfers, but typically it's limited by the slowest component in the chain: HDD, SAN, NIC, HBA, CPU, link, switch, etc. Just a thought.
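For what it's worth, a typical Robocopy invocation for a job like this might look something like the following (the paths and file name are placeholders):

robocopy \\server1\share D:\staging bigfile.vhd /Z /R:3 /W:5 /NP

/Z enables restartable mode so an interrupted transfer can resume. Note, though, that the Robocopy shipped with 2008 R2 still does buffered I/O, so by itself it won't avoid the cache growth being discussed here; as far as I know, an unbuffered /J switch only appeared in later Windows releases.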
October 27th, 2010 9:54pm

Robocopy and Windows Explorer both exhibit the same behavior. The file gets cached, available memory drops to 0 or near 0, then the file transfer rate drops dramatically. You can avoid the caching by using xcopy /J, which is an unbuffered copy, but it's too slow for transferring very large files. -Doug
October 27th, 2010 10:30pm

But since you only see the behavior on one server and not both, I would assume there's some kind of problem on that one server, which is apparently going to be hard to identify.
October 28th, 2010 4:11pm

Yes, that's true. However, please also note that this is a brand new installation of Windows 2008 R2. The server was joined to the domain, then Hyper-V was installed and Windows Updates were applied. Nothing else was done (except that I later installed the File Services role just to see if it might fix the caching issue, but it didn't). Additionally, I have seen numerous other postings on the web from people seeing the same behavior in 2008 R2. This seems pretty clearly to be an OS issue, especially since it also exists in 2008 x64 without R2. But again, I'd like to highlight that in 2008 without R2 you could use DynCache to modify the caching behavior; DynCache, however, is not supported on 2008 R2.
October 28th, 2010 5:04pm

This sounds like the disk not being able to keep up with the network throughput. A Perfmon trace showing the caching would help in troubleshooting this.

If the server is not primarily a file server, you can configure the following so that there is less emphasis on caching:

HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management
LargeSystemCache=0 (DWORD, decimal)

If the server/applications are slowing down because the system is running out of memory, you can configure the OS to keep a little more padding. By default a 64-bit OS will try to keep available memory at 64MB or higher. The 64MB threshold is good because it means the OS can use memory that isn't otherwise needed to cache files, and memory is much faster than disk access. However, if your system has large spikes in memory usage, setting a higher low-memory threshold might prevent sluggishness during sharp spikes in memory demand. Below is a sample configuring it to 200MB:

HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management
LowMemoryThreshold=200 (DWORD, decimal)

Windows 2008 R2 has improved memory management algorithms compared to Windows 2008, and should not need DynCache beyond the following hotfix, which further refines the new memory management algorithms: 979149 - A computer that is running Windows 7 or Windows Server 2008 R2 becomes unresponsive when you run a large application - http://support.microsoft.com/default.aspx?scid=kb;EN-US;979149

Anything beyond this would require a paid incident to fully troubleshoot what is happening to the memory on your system. Hope this helps. David J. This posting is provided "AS IS" with no warranties, and confers no rights. Please remember to click Mark as Answer on the post that helps you, and to click Unmark as Answer if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread.
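P.S. If you'd rather script the two values above than edit the registry by hand, these are just the standard reg add equivalents of the same settings (run from an elevated prompt, then reboot):

reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v LargeSystemCache /t REG_DWORD /d 0 /f
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v LowMemoryThreshold /t REG_DWORD /d 200 /f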
November 11th, 2010 12:11am

Thanks for the reply! Those are very good suggestions and I will try them all and report back soon. -Doug
November 11th, 2010 1:31am

Hi Doug, did you resolve your problem? I'm facing the same problem now. I tried the suggestions above, but none worked. LC
November 17th, 2010 1:05pm

I spent some time today doing some more testing, but unfortunately neither the registry key nor the hotfix has fixed the problem. -Doug
November 17th, 2010 10:11pm

I'm having the same issue when copying from drive to drive on a Windows Server 2008 R2 test server with the Hyper-V role. Used physical memory goes up and down during the copy process, and overall system responsiveness is very bad.
November 24th, 2010 10:07pm

How come MS does not solve this? There are a lot of posts on many sites about this issue.
November 25th, 2010 11:10pm

Hi, we need to analyze the SMB packets. Please collect simultaneous Netmon/Ethereal traces between server 0 and server 1, and another simultaneous trace between server 1 and server 0. Make sure that you a) start the trace, b) reproduce the problem, and c) stop the trace. Upload it to an FTP location from where I can analyze the data.
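For those who don't have Network Monitor handy: 2008 R2 can also capture packets natively through netsh, something along these lines (the trace file path is just an example):

netsh trace start capture=yes tracefile=C:\temp\slowcopy.etl
(reproduce the slow copy)
netsh trace stop

The resulting .etl file can then be opened in Network Monitor 3.4 for analysis.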
November 26th, 2010 7:32am

Hi, I have a very similar problem. In my case it is a brand new Dell R415 with Server 2008 R2 and Hyper-V installed. Interestingly, the problem only occurs when copying files to and from iSCSI drives. For example, one of the iSCSI devices is a Netgear ReadyNAS. If I map an SMB share, a 10GB file will copy at 11.5MB/sec (100Mbps network) as expected, with no rise in memory usage. Copy the same file to the same device, but this time to a mounted iSCSI path, and memory usage rises until it is maxed out. This is on the LAN network using the onboard Broadcom NICs. We have a Broadberry SAN on a separate iSCSI gigabit network with a managed switch, connected to a dual Intel PT NIC on the same R415 server. Copying files to this device exhibits exactly the same behaviour. We also have an older Dell 1900 running 2008 R2 on the same network; it too has Hyper-V and is connected to both networks, and it doesn't have the same issue with these devices. As we are a Gold partner, I opened a case with MS on Monday using one of our incidents. The performance team and networking team are both taking turns at solving this issue, but nothing as yet. The latest is that they have done exactly what Sainath suggested and are currently examining the Netmon traces. In the meantime, if anyone has any suggestions, they would be very well received!
December 2nd, 2010 12:34am

Simon - thanks for sharing. Please do let us know what the MS team comes up with. -Doug
December 2nd, 2010 2:27am

Anything heard back from MS yet? Simon?
December 15th, 2010 6:48pm

Hello Simon, I've got exactly the same problem here: Dell R415 server, Server 2008 R2, and Hyper-V. What I can say at the moment: same problem as you describe. Some additional information: we also have an Intel Gigabit card (same problem on both), and I also tried direct cabling without a switch (same problem).
December 21st, 2010 6:22pm

We also see this behavior on our iSCSI SANs (all three IBM DS3500 SANs) with IBM x3650 M2 hosts, some in cluster configurations, some not. Performance degrades so badly that I/O issues crop up with the running Hyper-V images and start crashing those (putting them in critical or saved state). This occurs even on hosts with 96GB of RAM. IOPS on the iSCSI SAN do start to peak and hit critical thresholds, so I'm assuming the server starts buffering and caching the copy to memory when it can't write to the disk. This causes hard faults in memory and kills the file copy and even the running images, as I mentioned before. I think it has more to do with SAN throughput, though, because if we have two shared CSVs, the images on the RAID 10 SAS array chug along happily while the images running on SATA start dying. A fix or explanation would be hugely helpful, as I've been struggling with this for over a year now.
December 29th, 2010 2:44am

Has anyone tried Eseutil as outlined in this blog: http://blogs.technet.com/b/askperf/archive/2007/05/08/slow-large-file-copy-issues.aspx ?
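For anyone who hasn't read that blog post: it describes using Eseutil's file-copy mode, which does an unbuffered copy much like xcopy /J. If I remember the post correctly, the usage is along these lines (the paths are placeholders, and eseutil.exe ships with Exchange rather than the base OS, so you'd have to copy it over):

eseutil /y \\server1\share\bigfile.bak /d D:\staging\bigfile.bak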
December 29th, 2010 2:47am

Hi guys, sorry for the delay - work commitments etc. I worked with Microsoft for the best part of two weeks, but unfortunately they did not come up with an answer. We had the performance team, iSCSI team and network teams all working on this, but no useful answer. To be honest, I now don't think there is a single answer.

Using Jperf (iperf) on default settings, I now have a wire speed of 600Mbps - not perfect, but usable. I achieved this using a mixture of Windows TCP tweaks from Starwind's site http://www.starwindsoftware.com/forums/starwind-f5/recommended-tcp-settings-t2293.html, updating all the network card drivers, and changing and matching EVERY setting on the SAN and Hyper-V server until I got a result. According to anyone who knows anything about iSCSI, I should be getting 950Mbps wire speed, but with the effort I have put into this, a consistent 600Mbps is acceptable. We have an older Dell PowerConnect GB switch. Interestingly, jumbo frames made things worse, though I believe switches differ as to whether they use a 9014 or 9000 MTU size.

Sorry I don't have a fix, but try the Starwind info above. There is another thread about offloading and Hyper-V that makes a difference too. Changing and matching settings helps, as does updating the drivers; the last driver update gave me an extra 30Mbps. As for measuring, I like Jperf, but Iometer appears to be a professional tool for measuring wire speed and disk reads/writes, though I've not tried it yet.
January 25th, 2011 2:50pm

This is utterly ridiculous. We just noticed this problem as well, and it's quite concerning that a company that has been developing server operating systems for well over 15 years is not able to provide a stable OS that allows large files to be copied without bringing down the server. To be honest, we have noticed a concerning trend: starting with Server 2008, we have seen an increasing number of bugs in the OS (performance, stability) that are not being fixed by Microsoft. There is clearly no interest on Microsoft's side in addressing these issues, which are obvious and easily reproducible. In our case it doesn't involve a SAN at all: it's Windows 2008 Storage Server Standard, and just copying a 50GB file brings the server down in minutes. Should we downgrade to Windows NT or Windows 2000? :-)
February 18th, 2011 9:19pm

Hi guys, just thought I'd throw this into the mix. We are facing a similar problem at a client's. We have just been virtualising a client's server onto the same hardware. We used the ShadowProtect HIR methodology and got the machine up and running inside a temporary server, then rebuilt the hardware, adding new RAID 5 arrays to host the virtual machines. The hardware is an Intel S5000-based server with 20GB of RAM. The host OS is installed on the Intel embedded RAID array, and there are two RAID 5 arrays on independent LSI 9240-8i controllers, each of which will host a virtual machine. We removed the SATA hard drive containing the VHDs for the new virtual machines from the temporary server, connected it to a SATA port on the motherboard, and are copying the VHDs to the two RAID 5 arrays. It's currently copying at a mind-blowing 6MB/sec. I don't think this is solely a network issue; I think you guys see it manifesting itself as such, and there is a deeper problem. We only did this yesterday, so I haven't investigated too much, but I get more or less the same issue: available memory plummets and then copy time goes up almost exponentially. I will report back if and when I get any useful information, but for now I am going to simply wait for the copy to finish, as I need to get the machine back up and running. Cheers, Chris
March 6th, 2011 11:05am

I am seeing the same issue with a twist; see the strange test results below.

Problem: when copying files larger than 2GB from one drive to another on a server running Windows 2008 R2, the transfer rate is 20MB/s or less.

Environment: a Windows 2008 R2 server running on VMware ESX 4.1 connected to two iSCSI volumes [Drive S: (source), Drive D: (destination)], with a 10Gb network connection to the SAN employing MPIO; and a Windows 2003 R2 server running on VMware ESX 4.1 connected to the same two iSCSI volumes, also over a 10Gb connection with MPIO.

Scenario 1: The Windows 2008 R2 server on ESX 4.1 has Drive S: and Drive D: connected via iSCSI over the 10Gb network. Copying a large (3GB) file from Drive S: to Drive D: shows a transfer rate of ~19MB/s and takes well over an hour. Copying a small (512MB) file from Drive S: to Drive D: completes right away. We then disconnect the drives from the W2K8 R2 server.

Scenario 2: We then connect the same drives to the Windows 2003 R2 server on ESX 4.1 via iSCSI over the 10Gb network. Copying the same large (3GB) file from Drive S: to Drive D: shows a transfer rate of 147MB/s and completes in minutes.

So it appears the issue is the Windows 2008 R2 OS, or some technology it is leveraging differently than Windows 2003 R2. A very interesting oddity I tripped across: when running Scenario 1 (W2K8 R2), I ran an IOMeter write test pointing at Drive S: (the drive we are transferring the file from). The instant I start writing to that volume, the copy rate of the large file going from Drive S: to Drive D: jumps from ~19MB/s to ~300MB/s, and the file copies in less than a minute. This is reproducible time and again. For some reason, writing to the source volume with IOMeter causes the transfer rate to jump up and sustain the expected rate. Any ideas or pointers are greatly appreciated. -Gary
March 9th, 2011 1:18am

We have been experiencing this issue for over a year on two different hardware platforms, one running Win 2K3 x64 Enterprise and the other running Win 2K3 IA64, both attached to a SAN. I'm pretty sure this issue will arise no matter the OS version. The source drive seems to be the issue in some cases, only being able to read at about 500Kbps to 8Mbps. The activity on the SAN is almost nonexistent, as is the destination activity on our virtual library. Our process is to run a nightly full SQL backup to a consolidated file server; then our backup software backs up that consolidated volume to a virtual library. Both the SQL backup and the backup to the virtual library have this same issue. The originating server is obviously a SQL server, and we have many SQL servers that send backups here. The consolidated backup file server is also a SQL server. The virtual library is just a backup solution.
March 9th, 2011 8:08pm

mgr34 - thanks for the input, but I really don't think the issue you're experiencing is the same as the issue being discussed in this thread. Thanks.
March 9th, 2011 8:24pm

mgr34 - thanks for the input, but I really don't think the issue you're experiencing is the same as the issue being discussed in this thread. Thanks.

I believe it is, as the performance relative to available memory is identical to what you described in the second-to-last paragraph of the original post. It is more visible on our SQL server because of the way memory is allocated for SQL. We would usually have about 6-8GB of memory free for the OS and other services. Last night we configured SQL to free up 50GB of memory prior to the backup running, and we saw the performance hold out until that 50GB was used up by the file copy. Do you agree that sounds like what you're dealing with?
March 9th, 2011 11:40pm

Interesting. That does, indeed, sound like the issue we're experiencing. What's strange to me is that I've never seen this problem on Windows 2003. We have hundreds of SQL databases on Windows 2003, with many being 500GB to 1.5TB in size, and I've never had this issue either doing SQL backups or large SQL mdf file copies on Windows 2003. In Windows 2008 non-R2, this problem is readily apparent but work-around-able with DynCache. In Win 2008 R2, this problem seems to happen with some servers but not others, and unfortunately there is no workaround. -Doug
March 9th, 2011 11:59pm

Today I was working on a Windows 2008 R2 server with 2 MPIO iSCSI connections to a volume. As mentioned in my test above, we started a file transfer, but in this case we were copying from an iSCSI volume to a local physical volume. The transfer rate was ~18MB/s. We started IOMeter doing a write to the source drive as I had done in my test above, and once again the rate jumped, to 155MB/s. In this case it is a physical server, not a VM. So I am really curious what IOMeter does that "opens up" the communication. Any ideas what IOMeter is doing?
March 19th, 2011 1:15am

I've been banging my head against a wall for a week of 20-hour days. I'm building a prototype/test server (Windows Server 2008 R2 64-bit Ent) in my home lab with the hope of running Hyper-V for a multi-server dev/test environment. My server is a PC-class machine: i7 960 (8 cores) @ 3.2GHz, 12GB Kingston DDR3-1333 RAM (going to 24), on an Asus P6X58D-E mobo, a built-in GB NIC (Marvell Yukon 88E8056) plus an added Intel 1000/Pro GT GB NIC, and 2 WD 1TB Caviar Black HDs mirrored on the built-in Marvell 6Gb/s RAID controller. Before I go too far, I should remind you that the plumber's faucets always leak, the mechanic always drives a beater, and electricians sit in the dark. You'll see what I mean shortly.

I built the "server" last week and eventually had Hyper-V running with 4 test server guests. The whole time I was building the server I struggled with performance problems. I wasn't sure if it was network or disk, but I found the recommendations for disabling all of the offloading and greening in the advanced NIC settings (which helped a little) and kept going. Because this is a workstation-class server, for the most part I'm at the mercy of the out-of-box Microsoft drivers, as most hardware vendors aren't providing 2008 R2 64-bit drivers for workstation hardware, but I was "lucky" and had managed to find most of the drivers I needed from the actual vendor sites (not from Asus).

All was going well - I was many hours into building the guest servers - and then I decided to shut the server down and move it to my lab (it was in my living room up to that point). When I fired it back up it was dead. I was getting a BSOD and reboot so fast I had to film it and then step through the video one frame at a time to see that it was Stop Error 7B (hardware, likely a disk problem). Chkdsk /F or /R ended up being completely useless because of a memory leak in 2008 R2 Chkdsk when encountering very large files (VLFs) - apparently a known issue, but not being worked on because it doesn't affect too many people. Yeah, right! Only those with large VHDs and databases need worry! The Chkdsks mostly failed, and what I ended up with was a trashed system/boot partition, a completely wiped-out Apps partition, but a Data partition with my VMs and VHDs that appeared, for the most part, intact. I tend to blame a crappy Marvell storage driver for this corruption, but the jury is still out.

I know this is getting long-winded, but bear with me: I'm documenting this for myself and anyone else who's Googling this problem, because there is a lot of conflicting crap out there regarding this issue. I lost half a day trying to resolve it before giving up and starting over. This time I was going to be diligent and back up as I went along (see my comment about the plumbers/mechanics/electricians). I rebuilt the server, formatting (not quick - never quick) the partitions, and when I had a bare-bones, patched, service-packed, and re-patched server, I backed it up to an empty (thanks to Chkdsk) partition on the server. The next morning I went to copy the backups to my XP workstation, and that's when I was struck by the poor performance issues again. Keep in mind, Hyper-V wasn't even installed yet; the server had no roles or features installed.
I ran into all the problems mentioned above: slow network; all memory going to cache until it's at 0MB free and never being relinquished, even after the transfer is aborted, making the server very sluggish to completely unresponsive; network activity dropping to 0 for periods of time in the middle of a transfer; transfers of large files taking forever or being unable to complete; etc. Looking in Perfmon, I was getting around 6MB per interval (MBpI) on a GB network, and it didn't matter whether it was push or pull. From the XP workstation to the server I was getting about triple that performance, push or pull - not great, but better. Win7 Pro on my laptop was getting comparable results over 130Mbps wireless. But as I watched the copy progress in Perfmon, I'd see the odd peak of 20 MBpI, an average of 6 MBpI, and then periods of complete flat-line 0. There was no other traffic on the GB network. If I browsed the mapped drive, suddenly I'd get activity on the transfer again, or if there was already activity, I'd get a 20 MBpI peak. So in response to Gwaters' question, I don't think it matters what IOMeter is doing so much as the fact that it is doing "something" on the server which keeps the NIC and stack awake.

I downloaded DiskBench and was getting decent performance (1.2GB/s) from disk to disk on the local server - nothing near the 5 or 6Gb/s the controller/drives are rated for, but decent. Everyone on the net was saying to run network captures to provide more info, so I ran Wireshark, and besides a few lost segments and a whole bunch of SMB2 traffic that I wouldn't have expected since XP was the destination, I didn't see anything bad. I researched the problem some more and ran into all kinds of conflicting information regarding NIC and TCP settings and registry tweaks for the LanmanServer, LanmanWorkstation and TCPIP services. I have honestly tried almost every combination over the last several days, and it's completely hit and miss.

This morning I had a pleasant surprise: Microsoft had released an update for "Slow performance in applications that use the DirectWrite API on a computer that is running Windows 7 or Windows Server 2008 R2" (http://support.microsoft.com/kb/2505438). Surely this couldn't be related; surely a font issue wouldn't cause all kinds of performance problems. Guess what? It resolved some of the issues. Performance increased about 1 MBpI on the backup copy to the XP box, there are no longer any dead periods of 0 network activity on a large transfer, and memory on the server isn't dwindling down to 0MB free and never being relinquished. Hooray! Oh snap - overall performance still sucks. So I kept going.

I downloaded LanSpeedTest, which flushes the caches and removes the hard drives from the equation to test the network transfer alone. I ran a bunch of tests before and after applying all of the http://www.starwindsoftware.com/forums/starwind-f5/recommended-tcp-settings-t2293.html tweaks. Before: copying a 20MB file to XP from 2008 = 143Mbps writing, 223Mbps reading; copying a 3GB file to XP from 2008 = 109Mbps writing, 219Mbps reading. After Starwind's tweaks: 20MB file = 90-171Mbps writing, 189-215Mbps reading (run multiple times because of a poor first run, attributable to the server possibly still settling down after a reboot); 3GB file = 112Mbps writing, 221Mbps reading. So I'm not seeing anything like the gains that Simon got.
Re-reading his post, I see there's a reference to another link at Starwind specific to Hyper-V that I should investigate further. But so far it doesn't seem to matter what I do to the stack; everything still works about the same. I may never see that kind of performance, since I'm running a cheapola Netgear GB switch. The items that have had a positive effect are: disabling all offloading in the advanced NIC settings (especially to get Hyper-V to even work); the MS performance patch http://support.microsoft.com/kb/2505438; and disabling the virus scanner on the client (worth about 1MB/s on VLF copies). One thing I forgot to mention is that I get exactly the same performance from the built-in Marvell Yukon NIC with the latest drivers and from the Intel NIC with the MS drivers; I was hoping the Intel NIC would save the day, but it didn't. If anyone has any other recommendations, I'll be willing to try them. Now wish me luck - I'm going to resume my work with Hyper-V, and hopefully I won't lose the whole server again. Cheers, Lazarus
March 24th, 2011 12:06am

Unfortunately I'm now experiencing the same issue on a different Server 2008 R2 machine, but this one doesn't have Hyper-V or any additional server roles. Also, regarding Lazarus's post about KB2505438: it doesn't seem to have any impact on the issue on either of the machines where I see the problem. Hopefully Microsoft will address this issue at some point. Thanks.
March 31st, 2011 8:48pm

Add one more to the list of people experiencing this same problem. My environment doesn't include a SAN or iSCSI volumes, just standard SAS drive volumes. Am I correct in assuming this is a Server 2008 R2 issue and not a Hyper-V issue? Each night we copy: * FROM: about 100GB of SQL backups on a physical machine (SQL database server) * TO: the host machine of a Hyper-V guest (we do this because our backup drive is a USB external HDD, which none of the virtuals can access). Has anyone had success with software such as ViceVersa PRO, or are we likely to hit the same situation with that (where it consumes all available memory on the target machine)? -Rich
April 26th, 2011 5:57pm

Rich - are you doing local SQL backups and then copying the .bak files in a second step? If so, one possible workaround might be to execute the SQL backups directly to the destination server in one step rather than doing them locally and having a second step to copy them over. I don't know for sure if this would work, but I think it might. -Doug
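P.S. To illustrate what I mean: the backup can be pointed straight at a UNC path on the destination server. This is just a sketch with placeholder names (the database, share, and instance are made up, and the SQL Server service account would need write access to the share):

sqlcmd -S . -E -Q "BACKUP DATABASE MyDb TO DISK = '\\server0\sqlbackups\MyDb.bak' WITH INIT"

That way the data is written over the network once, without a local .bak file that has to be copied (and cached) in a second step.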
April 26th, 2011 6:09pm

I am also seeing this problem copying large files (exported VMs) from a SAN to the local SAS RAID on a Windows 2008 R2 server. I will follow the thread, thanks for your work. CarolChi
April 28th, 2011 12:21pm

I've had my Hyper-V server (Windows Server 2008 R2) attached to my SAN for about one year without any performance issues; normally I get about 100MB/s read and write performance. The server is a Dell R710 with 4 NICs dedicated to iSCSI using round robin. Two weeks ago I upgraded the server hardware and also did a complete OS reinstall. Since the upgrade, read performance has dropped to about 30MB/sec. Except for the difference in hardware (additional CPU and RAM), the configuration is exactly the same. KB2505438 did not resolve my problem.
May 11th, 2011 1:08pm

We have a few VM hosts (HP DL585 G2 and a DL385 G5), both having the same issue. After making some changes on the VM host (and some on the VM guests), I have been able to get a 75GB file to copy from a physical server to a VM guest, and have had better luck copying large files since:

1) Applied all the latest Windows updates (including SP1 for 2008 R2 and KB2505438) - everything but IE9.
2) My config has a NIC for each VM, currently 3 VMs per host, and the host has a total of 6 interfaces.
2a) Disabled all unused interfaces.
2b) Disabled power management so Windows can't turn off each interface.
2c) In the NIC properties, checked only "Microsoft Virtual Network Switch Protocol" and "HP Network Configuration Utility". On the VM host I do not have IPv4 or IPv6 checked; everything is unchecked except what's listed in 2c.
3) On the VM guest, the network interface has everything checked.
3a) I have seen articles that talk about unchecking TCP checksum offload; however, I have not made any changes there.
3b) Installed all the Windows updates on the VM guest.
4) Shut down the VM guest.
5) Rebooted the VM host.
6) Powered up the VM guest and tried the copy again.

During this copy my average disk write was 36MB/s. 5/13/2011 - I am two for two now on large file copies after making these changes. On Monday I will try again, and then try it on my other VM server that was having issues.
May 12th, 2011 8:34pm

Thanks for testing and posting, but I'm not sure the problem you're describing is really the same as the problem being discussed in most of the rest of this thread; for now I've "unproposed" your answer. Your issue, if I'm understanding correctly, is with copying large files from inside VM guests to other physical servers. The problem in this thread is more about file copy performance in general, from physical server to physical server. Additionally, it's not a question of being unable to perform the copy. The question on the table is: why is performance so bad when doing a large file copy, and why does it start good and then gradually get worse as the machine's free memory is used up? You also mention that you are able to get 36MB/sec. When copying large files from one physical server to another on a gigabit network, we should be able to get upwards of 125MB/sec; 36MB/sec simply isn't acceptable for our purposes. -Doug
May 13th, 2011 10:21pm

To be clearer about what my issue was, which I believe is what others are experiencing: when copying large files, the copies fail and all memory marked as FREE becomes used. The error I get when the large file copy fails is: "An unexpected error is keeping you from copying this file. If you continue to receive this error, you can use the error code to search for help with this problem. Error 0x8007046A: Not enough server storage is available to process this command." As for my file copies: I RDP into the VM and UNC or map a drive to a physical server and copy the file over. Additionally, when watching the Task Manager Performance tab, I see all memory listed under FREE get used up. This sounds like the issue others are experiencing. Today's test failed - calling MS to open a ticket.
May 16th, 2011 5:14pm

My case was solved using Tiger Li's suggestions:

netsh interface tcp set global rss=disabled
netsh interface tcp set global autotuninglevel=disabled

Then reboot the server. http://support.microsoft.com/kb/951037 The server uses Broadcom NICs for iSCSI.
May 27th, 2011 11:53am

Hi, have you managed to solve the issue you were having? I am getting exactly the same issue on all of our 2008 R2 / Win 7 VMs, but I do not get any problems with 2003 servers. I thought it had to do with the virtual NIC that was assigned: all of the 2003 boxes have "Flexible" as the network adapter type in vSphere, while the 2008 R2 and Win 7 VMs have E1000 as their network adapter. Simon
July 5th, 2011 4:19pm

Same problem here on W7 and W2k8R2... Is there a fix out? Thanks Matias
July 13th, 2011 4:55am

I have the same thing happening, where the copy takes all RAM and then stalls out. Furthermore, if it is running on a server that is also running a process that takes a lot of RAM, such as SQL Server, it will choke SQL out and cause it to stop responding as well. Here is the hitch: I can watch the RAM usage grow to 100% in Task Manager if I copy the file from Windows 2003 to a clustered drive on Windows 2008 R2. However, if I copy the file to a non-clustered drive (i.e. C:\), the cache usage goes up, but the RAM usage DOES NOT GROW and it behaves properly. If I copy the same file from a Windows Server 2008 R2 machine instead of 2003, the RAM usage will grow no matter which drive I copy to. So if I copy a file to my C:\ drive, why does the RAM meter deplete when copying from Windows 2008 R2, yet NOT deplete when copying from Windows 2003, even though the cache shows the same behavior in both cases? Furthermore, once the RAM reaches a certain point, the network traffic will STOP and the RAM will slowly creep back down; then it will start again, repeating over and over. If I use XCOPY /J, the cache never rises and neither does the RAM meter, but this is not an acceptable workaround for two reasons: one, it won't buffer anything; two, it doesn't solve the problem, as one of my new techs might copy a file to a production system and take it down by accident in the middle of the day. By the way, I modified the DynCache source so it would run on Windows 2008 R2, but it has ZERO EFFECT. The cache continues to grow way past what SetSystemFileCacheSize() is set to during the copy, and releases back to the correct size after the file copy has finished. This is becoming a serious problem for our production systems. Time to contact Microsoft.
July 14th, 2011 5:38pm

This issue is going to cause me some headaches in the next month or two. We currently have a 2008 (non-R2) file cluster which requires the DynCache service to be running due to a few users who occasionally process large datafiles over the network. Without DynCache we start seeing VSS failures due to insufficient memory which causes all sorts of issues including DPM sync failures, etc. Not to mention the server performance grinds to a halt. (two-node cluster w/ 16GB RAM in each node) We are planning to migrate to Storage Server 2008 R2 soon which I'm assuming would be affected by the same issue.
July 15th, 2011 7:05pm

After sending the following to Tiger Li my account seems to be OK, so I'll send you an idea for the above-mentioned problem too. In my case I had to copy a directory tree containing a large number of files (more than a million small files) from a PC on the network to Server 2008 R2. The transfer becomes slower and slower, and after a while all RAM is used and the system crashes. The same happens with bigger files (about 500 files of 1-2GB each). In short: I changed the hard drive in the server from a new 'Advanced Format' type to an older drive with standard sector format, and all problems are gone. Maybe this helps in finding a solution. Regards
July 22nd, 2011 8:17pm

Hey, I don't know if I understand your scenario, but if I got you right, all you need to do is configure the new disk in the guest VM as an IDE drive and not as SCSI, and it will work perfectly. I don't know why, but I'm having the same problem when the disk is configured as SCSI in Hyper-V VMs. Let me know if it worked for you. Cheers
August 2nd, 2011 11:07am

I have the same problem on one single server. Just to be sure, the symptoms are unlimited cache growth, even when reading large files. After the file is read, cache memory is freed, but if its size is more than the available memory... everything hangs, and all I can do is kill the process (it takes a few minutes to even show the Task Manager screen; everything is madly swapped to the pagefile). Since the server's role is pretty unimportant, I can afford to test some unofficial programs and fixes. This is what worked for me: http://www.uwe-sieber.de/ntcacheset_e.html I didn't use the old DynCache, but I suppose this program does the same. It takes two command-line parameters, the min and max values of the permitted cache size. At least after running it with the parameters 1 2048, I no longer see things like 'winlogon could not show you the ctrl+alt+del window; you can try pressing reset to reboot the computer', and there is always 'free' memory showing on Task Manager's Performance tab. System: Win2008R2 SP1
August 9th, 2011 12:39am

I just switched from SCSI to IDE in the VM settings and it made no difference. Increasing memory for the VM from 2048MB to 6144MB allows bigger files to be copied, but after the 6GB is filled, the file copy slows to a crawl. So copying any file that is larger than the VM's memory allocation is impossible. Can anyone from MSFT comment and provide a fix? This is clearly a repeatable problem and seems to be affecting many people.
August 13th, 2011 1:06pm

After all this time, I finally "solved" the problem I was having. It seems like there are possibly multiple issues being discussed in this thread, because it's not clear that everyone's situation is the same as mine, so keep that in mind when you read what I changed to make the problem go away in our environment.

In summary, it all came down to the write-caching policy on the RAID controller. We are dealing with Dell servers and Dell controllers, and I have reproduced the issue on both embedded/internal RAID controllers and their external RAID controllers for direct-attached storage arrays. When the RAID 5 virtual disk on the controller is set to write-through, the copy performance issue exists. When the virtual disk is switched to write-back, the problem disappears. By default we always use write-back caching, but when the RAID battery fails (or while the battery is charging), the controller automatically switches the virtual disk back to write-through until the battery is replaced or charged. You can select "force write-back" when the battery is dead, but the consequence is possible data loss if the server were to crash or lose power during a write operation.

Interestingly, the performance difference between write-through and write-back caching often seems negligible. However, for certain operations, including large file copies across a network, there are clearly issues. Also interestingly, I have not been able to reproduce the problem when copying files on the same server from one array to another; it only seems to exist when copying across a network where the destination drive has a write-through caching policy. I hope this info helps some other people with similar problems. I can't believe that after all this time and so much testing, it all came down to a single setting. It's disappointing to discover that the write-caching policy can have such a large impact on some operations and a nearly nonexistent impact on others.
August 18th, 2011 1:29pm


This topic is archived. No further replies will be accepted.
