File metadata lookups / file handle creations fail over SMB during periods of heavy IO on Windows 2008 Standard using SMB 2
I am having a very hard time utilizing SMB 2.0 in an IO-heavy distributed computing environment. We have a grid with 250 CPU cores, composed as follows:

Shared Resource Server (file server and database server):
- Windows 2008 SP2 Standard x64
- SQL Server 2005 x64
- SMB 2.0
- 24GB RAM
- NO ANTIVIRUS
- File server volume on an 8-spindle SAS RAID array hosted on a P800 RAID controller with 512MB cache
- SQL Server volume and OS volume on a separate controller
- Dual quad-core Xeon X5500

Special registry tweaks:
- HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\TreatHostAsStableStorage = 1
- HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\AsynchronousCredits = 1536
- HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\MaxFreeConnections = 4096
- HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\MinFreeConnections = 100
- HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\MaxRawWorkItems = 512
- HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\MaxWorkItems = 8192
- HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\MaxMpxCt = 2048
- HKLM\System\CurrentControlSet\Control\Session Manager\Executive\AdditionalCriticalWorkThreads = 16

Compute Nodes:
- Windows 2008
- NO ANTIVIRUS
- 24GB RAM

Special registry tweaks:
- HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\DirectoryCacheLifetime = 0
- HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\FileInfoCacheLifetime = 0
- HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\FileNotFoundCacheLifetime = 0
- HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\MaxCmds = 4096
- HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\SessTimeout = 300
- HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\TcpTimedWaitDelay = 30

Issues encountered under load:

1. Trying to get the size of a file:

FileSize = GetFileSize(hFileHandle, 0)

Errors received only under load:
- System Error 0x40 (64) ERROR_NETNAME_DELETED

2. Trying to create a file handle:

hFileHandle = CreateFile(mFileName, GENERIC_READ, FILE_SHARE_READ, ByVal 0&, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, ByVal 0&)

Errors received only under load:
- System Error 0x2 (2) ERROR_FILE_NOT_FOUND
- System Error 0x3 (3) ERROR_PATH_NOT_FOUND
- System Error 0xA1 (161) ERROR_BAD_PATHNAME

3. Reading data from a file:

ReadFileResult = ReadFile(iFileHandle, bBytes(1), FileSize, Ret, ByVal 0&)

Errors received only under load:
- System Error 0x40 (64) ERROR_NETNAME_DELETED

These issues happen when there appear to be a lot of IO requests involving hundreds of small files (typically 2KB-300KB each). They do not occur when only a few large files are involved. I have witnessed traffic around 3Gbps without issue, and at other times encountered the above errors with only 300-400 Mbps of traffic, which reaffirms that the number of individual IO requests, or better stated, file metadata contention, is what contributes to the issue.

One optimization we made was to use the absolute hostname (\\servername\path\to\share\) rather than the DFS namespace path (\\some.domain\dfs\path\to\share). We found this reduced the number of failures, likely due to the overhead of DFS referral lookups.

Articles referenced while researching this issue:

More than 64 long-term SMB requests
http://support.microsoft.com/kb/938475
Seems unlikely, since I never see any indication of "DriveLetter:\ is not accessible. Insufficient system resources exist to complete the API."
UNC Content Is Under High Load
http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/77a9fbf8-e889-414c-a9f9-f3fa72e3b593.mspx?mfr=true
This held the most promise of any article and was used in part to determine the registry settings I set above.

Windows Explorer and SMB
http://blogs.technet.com/b/askperf/archive/2007/09/21/windows-explorer-and-smb-traffic.aspx
Most of these suggestions optimize SMB 1 for Windows Explorer-related traffic. We use WinAPI calls, not Explorer shell calls.

Issues working with files over the network on a Windows 2000 or Windows 2003 computer
http://support.microsoft.com/kb/923360
Some of these errors were encountered (e.g. ERROR_NETNAME_DELETED, and ERROR_BAD_NETPATH sounds similar). We have also had "Disk or network error." when working with MS Access databases under these conditions. The registry changes above were implemented to help address the problems, but the KB's antivirus suggestions are not applicable since we don't run one.

Issues with Remove-Item $folder -recurse -force
http://pauerschell.blogspot.com/2010/05/problem-with-remove-item-folder-recurse.html
We have been able to replicate the stated issues, as we regularly produce folders with 100,000 items 4 levels deep. This is not performed over SMB but occurs locally as part of a cleanup process. It's somewhat tangential to the issue at hand, but interesting nevertheless.

SMB 2.0 protocol related perf counters return extremely high values on Windows 2008
http://support.microsoft.com/kb/969670
Had to apply this hotfix to get realistic numbers from perfmon on the file server.

I have tried monitoring many SMB-related perfmon counters on the file server:
- CurrentClients
- Queue Length
- Read Operations
- WorkItemShortage
- Disk queue lengths

After hotpatching the system, I wasn't able to find any correlation between the counters and when the issues occurred.
I haven't tried monitoring the compute nodes because that is a LOT of counters to watch across ~24 machines x N clusters, especially when you are not sure when the issue will occur. Likewise, trying to do packet captures over a period of hours is impractical at 3+Gbps of traffic without causing interference or setting up complicated port mirroring.

At the end of the day, after working on these issues for months, I am frustrated. I am not sure where this issue is occurring or even what metric I can monitor to look for a "threshold". Is it in the SMB client or in the server? I don't know what counters I can monitor, or how we can effectively address this issue without a blanket statement of "reduce IO" or a complicated scheme of sharding the data across multiple file systems, which isn't possible with our current architecture. I have noticed that Microsoft has very little technical documentation regarding filesystems in high performance (distributed) computing clusters.

We are in the process of introducing optimizations, caching, etc., but those are multi-year endeavors. In the meantime, we are looking to extend our runway on existing technologies. Any guidance on this issue would be much appreciated; we understand that the way we use SMB is nothing like how a typical office uses it. Is there an SMB/UNC-compatible protocol that is much more tolerant of the load conditions we present? Have other users run into these issues? How have you worked around them?
December 2nd, 2010 12:19pm

Well, formatting that by hand seems to have been a heck of a lot more successful than using the rich text editor.
December 2nd, 2010 12:56pm

Here is something else that is interesting. Currently the Server -> Files Open counter is hovering around 4500 files, yet only roughly 73 are listed when you perform a net files command. Server -> Files Opened Total is hovering around 4 billion. Using Sysinternals handle.exe, I can actually show what those 4500 files are, but I can't through any of the native Windows utilities. During periods of high load, we also get processes on our calculation nodes whose IO requests never return. This causes the process to hang and not respond to a kill request (e.g. TerminateProcess) because an IO request got dropped somewhere. In fact, nothing can kill the process (not even pskill). The only way to terminate it is to reboot the calculation node.
December 9th, 2010 12:22pm

Not a single thought? Follow up question? Is this the right forum? Is Microsoft even the right platform that we should be building high performance computing applications on?
December 14th, 2010 6:45pm

Hi,

Any reason for disabling the following caches?

HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\DirectoryCacheLifetime = 0
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\FileInfoCacheLifetime = 0
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\FileNotFoundCacheLifetime = 0

http://technet.microsoft.com/en-us/library/ff686200(WS.10).aspx

"Changing these cache timeout values can have significant performance implications to many network file scenarios. As each of these caches is designed to reduce the number of SMB server requests, they are important not only in client response time evaluation, but also in overall SMB server scalability and performance."

Looks like you've done the opposite of what is recommended to get more performance.

Check info related to RSS as well:
http://support.microsoft.com/kb/951037
http://www.microsoft.com/whdc/device/network/NDIS_RSS.mspx
http://technet.microsoft.com/en-us/network/dd277646.aspx

Windows 2008 R2 presents different counters now to give hints about CPU usage:
http://technet.microsoft.com/en-us/library/gg162703(WS.10).aspx
December 18th, 2010 11:36am

Hi Sergio,

Actually, there is a reason. The caches are ideal in remote-office situations, or other situations where files have a single writer or are written to infrequently. In this distributed computing scenario you have multiple readers and writers. Here is the situation that caused us to investigate and eventually conclude that caching was actually hurting us:

1. A job comes into the job management box.
2. The files necessary for job processing are created on a network share on the high performance file server.
3. A SQL Server database is updated, signaling to the compute nodes that the job is ready to be processed.
4. The compute nodes poll the SQL database, see that jobs are ready for processing, and reach out to the file share where they expect to find the newly created files, but the lookup comes back FileNotFound because the cache's view of the remote share is out of date.

Disabling the caches resolves this problem. I am not the only one who has come to this conclusion:
http://social.msdn.microsoft.com/Forums/en/os_fileservices/thread/832d395b-6e6f-4658-8dbb-120138a4cd7c

Microsoft is also quoted in regards to utilizing metadata caches: "This has value in a scenario such as a client browsing a network file directory while connected via a low bandwidth or high latency connection." This is neither high latency (all nodes are on a single high performance managed switch) nor low bandwidth (all connections are 1Gbps, except the shared resource server, which is 10Gbps).

"Applications which require a high level of file information consistency across clients which may utilize creation or changing of a file as a notification mechanism to other nodes may encounter delays or consistency issues with these default values."
http://technet.microsoft.com/en-us/library/ff686200(WS.10).aspx

I am going to investigate the Receive Side Scaling material you posted. After doing a netstat -t, all traffic on the shared resource server is reported as InHost.
I have developed a workload that can replicate this failure mode pretty easily, so it should be clear whether offloading provides any assistance.
December 21st, 2010 1:09pm

Sergio,

I took a look at the interface configuration for our NC522SFP adapter. It appears that we already have RSS enabled:

e:\>netsh interface tcp show global
Querying active state...

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : automatic
NetDMA State                        : enabled
Direct Cache Access (DCA)           : disabled
Receive Window Auto-Tuning Level    : normal
Add-On Congestion Control Provider  : ctcp
ECN Capability                      : disabled
RFC 1323 Timestamps                 : disabled

I am currently reading through the document linked below, but at a cursory glance it appears we are already running an optimized configuration.
http://download.microsoft.com/download/8/E/D/8EDE21BC-0E3B-4E14-AAEA-9E2B03917A09/HSN_Deployment_Guide.doc
December 21st, 2010 2:57pm

Here is more information regarding our network interface:

e:\>netsh interface tcp show chimneystats 15

TCP Chimney Statistics for - Local Area Connection 5
---------------------------------------------------------
Interface Index                           : 15
Supports IPv4 connection offload          : No
Supports IPv6 connection offload          : No
Attempted to offload connections          : -n/a-
Some TCP settings denied by the NIC       : -n/a-
NIC denied connection offload request     : -n/a-
Offload capacity advertized by the NIC    : 0 connections
Offload capacity observed by the system   : 0 connections
Number of currently offloaded connections : 0
Successive offload attempt failures       : 0
Reason why last attempt to offload failed : -n/a-

In other news: no fixed-width font when doing code snippets on social.technet?
December 21st, 2010 4:54pm

May not be related, but we've seen all kinds of weirdness with Broadcom NICs using TCP Chimney and RSS offload. Disabling them will put more load on the CPU, but we've seen a wide variety of network issues with them enabled.

Jeff Graves, ORCS Web, Inc.
March 10th, 2011 10:36am

This topic is archived. No further replies will be accepted.
