How to troubleshoot a SharePoint 2007 farm performance issue?
Issue: Pages take more than 250 ms to respond. The farm also sometimes throws a DB connectivity error and raises low memory alerts every now and then.

Environment Details: Our SharePoint 2007 farm is basically a publishing framework with 4 WFEs and an application server (only Excel Services runs there). One WFE is also used as the Index and Query server. We have been facing this performance issue for a month. The farm has only one web application, with 18 content DBs and 10 site collections including the root.

Server Details (all servers have the same configuration):
CPU: 4 cores, 2.27 GHz
RAM: 8 GB

Search:
Full crawl (weekly): 90 hrs
Incremental crawl (daily): 6 hrs
It has been stopped temporarily.

Performance Analysis (per server):
Number of hits per day:
  Status 200: 1 lakh (100,000)
  Status 401: 1 lakh (100,000)
  Others: 0.5 lakh (50,000)
Memory usage: 2 GB - 2.5 GB
Peak memory: 5 GB - 6.5 GB
Page faults: 35K - 37K
VM size: 2 GB - 2.2 GB
I/O read bytes: 2 GB
I/O write bytes: 8 GB

SQL Analysis: I found a huge number of tasks waiting on "ASYNC_NETWORK_IO", which is probably also contributing to the performance problem.

Event Viewer Analysis:
Event ID : 1013 Description : IIS waits for the pool to do a graceful restart, that is, complete all the pending requests on the pool, shut it down and start it again. But if the pending queue is long or the process has hung, the graceful restart will fail and IIS will just do a forced recycle of the pool.
Event ID : 1039 Description : A specific w3wp.exe process caused an error. It could be because of a connectivity issue, or the process itself crashed.
Event ID : 1039, Event code 3005 Description : Could have been an issue with the SharePoint DB.
Event ID : 2003 Description : It has taken too long to refresh the W3SVC counters, so the stale counters are being used instead.
Event ID : 4830 Description : Low virtual memory.
Event ID : 4830 Description : High memory usage in the W3wp.exe process on a computer that is running Windows Server 2003.
Event ID : 9511 Description : An unexpected SQL Server database error occurred while Windows SharePoint Services attempted to communicate with the database.
Event ID : 10031 Description : Database out of space, DB admin failure.
Event ID : 10034 Description : Database server is not accessible.
Event ID : 10036 Description : Problem's origin in NIC malfunctioning.

I could see the errors below in the ULS log many times:
1. Publishing: Content deployment job failed. Error: 'Microsoft.SharePoint.SPException: Cannot complete this action. Please try again. ---> System.Runtime.InteropServices.COMException (0x80004005): Cannot complete this action.
2. While initializing navigation, found Page placeholder but object was not found at: /GLOBAL/COMPANY/sompage.aspx
3. List item query elapsed time: 5108 milliseconds, Additional data (if available): Query HRESULT: 0 List internal name, flags, and URL: {9EA036BC-BD0D-4D5D-8EDF-4B2439933179}, flags=0x000000022cdc148c,
4. Error: Failure in loading assembly: MyNamespace.SharePoint.myClass, Version=1.0.0.0,
5. Publishing: Content deployment job failed. Error: 'System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
6. Job 'Distribution List Import Job' failed. It will be re-tried again in 60 second(s). Reason: Failed to obtain crawl status. Techinal Details: Microsoft.Office.Server.UserProfiles.UserProfileException: Failed to obtain crawl status. ---> System.Net.WebException: The underlying connection was closed:
7. (#3: Cannot open file "Resources.en-US.resx" for reading.)
8. Exception caught in Search Admin web-service proxy (client). System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send.
9. Trying to store a checked out item (/SITES/Teamsite/PAGES/Default.ASPX) in the object cache. This may be because the checked out user is accessing the page, or it could be that the SharePoint system account has the item checked out.
10. #20015: Cannot open "RatingResources.de-DE.resx": no such file or folder.

There is a workflow that triggers whenever a new post is added to the publishing framework. The farm goes down once the workflow starts.

Any suggestions would be appreciated!
Crazy Nick | MCTS | India
April 8th, 2011 7:31am

Judging from the initial errors, there are a slew of I/O errors. I would suggest going step by step. Start with the network and check whether you are seeing any connectivity issues. You could use Fiddler or Netmon to capture connectivity statistics. If your back-end databases are on a SAN, I would suggest checking I/O throughput to the disks.
>Event ID : 10036 Description : Problem's origin in NIC malfunctioning.
Check the NICs on the SQL server to make sure they are functioning and that there is no hardware failure.
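As a rough first check of the network path before reaching for a packet capture, one could time repeated TCP connections from a WFE to the SQL server. The sketch below is only an illustration, not a replacement for Fiddler or Netmon: the function names are made up for this example, and the host name in the usage comment is a placeholder. It assumes SQL Server listens on its default port 1433, which may not be true in your farm.

```python
import socket
import time


def time_tcp_connect(host, port, attempts=5, timeout=3.0):
    """Try to open a TCP connection several times.

    Returns a list with one entry per attempt: the connect time in
    milliseconds, or None if the attempt failed or timed out.
    """
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # create_connection resolves the name and performs the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            # Covers refused connections, timeouts and name resolution failures.
            results.append(None)
    return results


def summarize(results):
    """Condense the per-attempt results into a few numbers."""
    ok = [r for r in results if r is not None]
    return {
        "attempts": len(results),
        "failures": len(results) - len(ok),
        "max_ms": max(ok) if ok else None,
        "avg_ms": sum(ok) / len(ok) if ok else None,
    }


# Example usage (placeholder host name, assumed default SQL Server port):
#   print(summarize(time_tcp_connect("sqlserver01", 1433, attempts=20)))
```

Failed attempts, or connect times that vary a lot between runs, would point toward the network or the NIC rather than SharePoint itself. This only tests connection setup, not sustained throughput.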
April 8th, 2011 9:34am

I've taken a memory dump using DebugDiag. There seems to be no memory leak, and I could not see any major issues apart from a few "This thread is not fully resolved and may or may not be a problem. Further analysis of these threads may be required." messages for the custom DLLs. I've also gone through the ULS logs and there is nothing wrong with the SPSite/SPWeb objects. I am still checking the IIS logs to find out whether the problem comes from a particular page or from somewhere else...

My main concern is that CPU usage is always above 80%, and I am not sure how to look inside the w3wp process to find which thread is consuming so much CPU. I could also see the message below:

Thread 25 - System ID 5240
Entry point 0x0547c1c0
Create time 11.04.2011 01:05:30
Time spent in user mode 0 Days 00:22:19.796
Time spent in kernel mode 0 Days 00:02:44.468
This thread is not fully resolved and may or may not be a problem. Further analysis of these threads may be required.

Function
  ntdll!ZwWaitForMultipleObjects+a
  kernel32!ReleaseSemaphore+6b
  mscorwks!GetCLRFunction+81bd
  mscorwks!GetCLRFunction+d8a9
  mscorwks!CreateApplicationContext+222e9
  mscorwks!CreateAssemblyNameObject+39edc
  mscorwks!CompareAssemblyIdentity+9caf2
  mscorwks!CorLaunchApplication+1c0e7
  mscorwks!CertCreateAuthenticodeLicense+2289a8
  mscorwks!TranslateSecurityAttributes+3d66
  Microsoft_SharePoint_Publishing!Microsoft.SharePoint.Publishing.BlobCache.RewriteUrl(System.Object, System.EventArgs)+a6e
  System_Web_ni!System.Web.HttpApplication+SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()+50
  System_Web_ni!System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef)+ab
  System_Web_ni!System.Web.HttpApplication+ApplicationStepManager.ResumeSteps(System.Exception)+1a5
  System_Web_ni!System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object)+d3
  System_Web_ni!System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)+1c4
  System_Web_ni!System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest)+7c

Crazy Nick | MCTS | India
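One way to narrow down which thread is burning CPU is to add up the user mode and kernel mode times that the report prints for each thread and rank the threads by the total. The sketch below does that on the report text. It assumes the report uses exactly the "Thread N - System ID M" and "Time spent in ... mode D Days HH:MM:SS.mmm" wording shown above; the function names are made up for this example and are not part of DebugDiag.

```python
import re

# Matches e.g. "Thread 25 - System ID 5240"
_THREAD_RE = re.compile(r"Thread (\d+) - System ID (\d+)")
# Matches e.g. "Time spent in user mode 0 Days 00:22:19.796"
_TIME_RE = re.compile(
    r"Time spent in (?:user|kernel) mode\s+(\d+) Days (\d+):(\d+):(\d+(?:\.\d+)?)"
)


def _to_seconds(days, hours, minutes, seconds):
    """Convert the captured time parts into seconds."""
    return int(days) * 86400 + int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def cpu_seconds_by_thread(report_text):
    """Return {thread_number: user + kernel CPU seconds} for each thread header."""
    totals = {}
    headers = list(_THREAD_RE.finditer(report_text))
    for i, header in enumerate(headers):
        # A thread's section runs until the next thread header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(report_text)
        section = report_text[header.end():end]
        total = sum(_to_seconds(*m.groups()) for m in _TIME_RE.finditer(section))
        totals[int(header.group(1))] = total
    return totals


def busiest_threads(report_text, top=5):
    """Return the top threads as (thread_number, cpu_seconds), busiest first."""
    totals = cpu_seconds_by_thread(report_text)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

For Thread 25 above this adds 22 min 19.796 s of user time to 2 min 44.468 s of kernel time, roughly 25 minutes of CPU in total. Comparing that figure across threads, and against the process uptime, gives a rough idea of where to look first.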
April 11th, 2011 6:37am

Update: I have done the CPU analysis and found the following:
The following threads in 9256_110411_162412.dmp are calling an ISAPI Extension OWSSVR ( 45 47 59 )
3,70% of threads blocked
Crazy Nick | MCTS | India
April 11th, 2011 6:37am

Hi YajivAN,
Please take a look at these 2 threads:
http://social.technet.microsoft.com/Forums/en-US/sharepointadmin/thread/607a3950-0b8a-42ed-98cf-cf47077862ce/
http://social.technet.microsoft.com/forums/en-US/sharepointadmin/thread/866f931e-e964-4f95-a748-c11ccc9f9b77/
Best regards,
Emir
April 13th, 2011 10:36am

Thanks Emir. I could not understand how the URLs mentioned above help with this issue. Below are my latest findings:
Number of hits during the peak hour: 210625
Successful hits during the peak hour: 120000
Unauthenticated hits during the peak hour: 100000
Data sent by the server to clients during the peak hour: 2.44858 GB
Data received by the server from clients during the peak hour: 0.82138 GB
Transactions per day: 23.2919 GB (both sent and received)
Hundreds of pages take more than 10 sec to load during the peak hour. I do not know how to determine whether the number of hits or the disk/network I/O is causing the issue...
Crazy Nick | MCTS | India
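To see whether the slow responses cluster on a few pages rather than following overall load, the IIS logs can be summarized by status code and by pages that exceed a time threshold. The sketch below assumes the logs are in W3C extended format with a "#Fields:" header, and that sc-status, cs-uri-stem and time-taken (in milliseconds) are enabled in the logging properties. The function name, the 10 second threshold and the file path in the usage comment are only illustrative.

```python
from collections import Counter


def summarize_iis_log(lines, slow_ms=10000):
    """Count hits per HTTP status and slow hits per URL in a W3C extended log.

    Returns (status_counts, slow_pages), both collections.Counter objects.
    """
    fields = []
    status_counts = Counter()
    slow_pages = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The header names the columns for the entries that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not fields:
            continue  # other directives, or entries before any header
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed or truncated entries
        row = dict(zip(fields, values))
        status_counts[row.get("sc-status", "?")] += 1
        time_taken = row.get("time-taken", "")
        if time_taken.isdigit() and int(time_taken) >= slow_ms:
            slow_pages[row.get("cs-uri-stem", "?")] += 1
    return status_counts, slow_pages


# Example usage (hypothetical log file path):
#   with open(r"C:\logs\ex110413.log", errors="replace") as f:
#       status, slow = summarize_iis_log(f)
#   print(status.most_common())
#   print(slow.most_common(20))
```

If the same handful of URLs dominates the slow list even outside the peak hour, the pages themselves are the more likely suspects. If everything slows down only at peak, load or I/O looks more likely. A large share of 401 responses is also normal with Windows integrated authentication, since each handshake starts with a challenge, so those hits are not necessarily failures.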
April 13th, 2011 11:06am

This topic is archived. No further replies will be accepted.
