How to troubleshoot a SharePoint 2007 farm performance issue?
Issue:
Pages take more than 250 ms to respond, the farm sometimes throws a DB connectivity error, and low-memory alerts are raised every now and then.
Environment Details:
Our SharePoint 2007 farm is basically a publishing framework with 4 WFEs and an application server (only Excel Services runs there). One WFE is also used as the Index & Query server. We have been facing a performance issue for a month: pages take more than 250 ms to respond, and sometimes a DB connectivity error is thrown.
It has only one web application with 18 content DBs and 10 site collections, including the root.
Server Details (all the servers have the same configuration)
4 cores - 2.27 GHz
RAM 8 GB
Search:
Full crawl takes - 90hrs (weekly)
Incremental crawl takes - 6hrs (daily)
Crawling has been stopped temporarily.
Performance Analysis (per server)
Number of hits per day: 1 lakh (100,000) with status 200, 1 lakh (100,000) with status 401, 0.5 lakh (50,000) others
Memory usage: 2 GB - 2.5 GB
Peak memory: 5 GB - 6.5 GB
Page faults: 35K - 37K
VM size: 2 GB - 2.2 GB
I/O read bytes: 2 GB
I/O write bytes: 8 GB
SQL Analysis:
I found a huge number of tasks waiting on "ASYNC_NETWORK_IO", which could also be contributing to the performance problem.
EventViewer Analysis:
Event ID : 1013
Description : IIS waits for the pool to do a graceful restart, that is, to complete all the pending requests on the pool, shut it down and start again. But if the pending queue is long or the process has hung, the graceful restart will fail and IIS will just do a forced recycle on the pool.
Event ID : 1039
Description : A specific w3wp.exe process caused an error. This could be due to a connectivity issue, or the process itself crashed.
Event ID : 1039, Event code 3005
Description : Could have been an issue with SharePoint DB
Event ID : 2003
Description : It has taken too long to refresh the W3SVC counters, the stale counters are being used instead.
Event ID : 4830
Description : Low virtual memory.
Event ID : 4830
Description : High memory usage in the W3wp.exe file on a computer that is running Windows Server 2003
Event ID : 9511
Description : An unexpected SQL Server database error occurred while the Windows SharePoint attempted to communicate with the database
Event ID : 10031
Description : Database out of Space, DB Admin Failure
Event ID : 10034
Description : Database server is not accessible
Event ID : 10036
Description : Problem's origin in NIC malfunctioning.
The following errors appear many times in the ULS log:
1. Publishing: Content deployment job failed. Error: 'Microsoft.SharePoint.SPException: Cannot complete this action. Please try again. ---> System.Runtime.InteropServices.COMException (0x80004005): Cannot complete this action.
2. While initializing navigation, found Page placeholder but object was not found at: /GLOBAL/COMPANY/sompage.aspx
3. List item query elapsed time: 5108 milliseconds, Additional data (if available): Query HRESULT: 0 List internal name, flags, and URL: {9EA036BC-BD0D-4D5D-8EDF-4B2439933179}, flags=0x000000022cdc148c,
4. Error: Failure in loading assembly: MyNamespace.SharePoint.myClass, Version=1.0.0.0,
5. Publishing: Content deployment job failed. Error: 'System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
6. Job 'Distribution List Import Job' failed. It will be re-tried again in 60 second(s). Reason: Failed to obtain crawl status. Techinal Details: Microsoft.Office.Server.UserProfiles.UserProfileException: Failed to obtain crawl status. ---> System.Net.WebException: The underlying connection was closed:
7. (#3: Cannot open file "Resources.en-US.resx" for reading.)
8. Exception caught in Search Admin web-service proxy (client). System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send.
9. Trying to store a checked out item (/SITES/Teamsite/PAGES/Default.ASPX) in the object cache. This may be because the checked out user is accessing the page, or it could be that the SharePoint system account has the item checked out.
10. # 20 015: "Can RatingResources.de-DE.resx" not open: No such file or folder with that name does not exist.
There is a workflow that triggers whenever a new post is added to the publishing framework. The farm goes down once the workflow starts.
Any suggestions would be appreciated.
Crazy Nick | MCTS | India
April 8th, 2011 7:31am
From the initial errors, there are a slew of I/O errors. I would suggest going step by step.
Start with the network to see whether there are any connectivity issues. You could use Fiddler or Netmon to capture connectivity statistics.
If your back-end databases are on a SAN, I would suggest checking I/O throughput to the disks.
>Event ID : 10036
Description : Problem's origin in NIC malfunctioning.
Check the NICs on the SQL Server to make sure they are functioning and that there is no hardware failure.
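As a rough first check before capturing traces, the connectivity step above can be sketched in a few lines of Python run from a WFE. This is only a sketch under assumptions: the function names are made up for illustration, the host name you pass in is whatever your SQL Server is called, and 1433 is the port of a default SQL Server instance (a named instance may listen elsewhere). It times repeated TCP connects and counts failures; it does not replace Netmon.

```python
import socket
import time


def tcp_connect_times(host, port, attempts=5, timeout=3.0):
    """Time repeated TCP connects; return milliseconds, or None per failed attempt."""
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # Open and immediately close a connection to measure connect latency only.
            with socket.create_connection((host, port), timeout=timeout):
                results.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            # Covers refused connections, timeouts and name resolution failures.
            results.append(None)
    return results


def summarize(results):
    """Condense the raw timings into attempt count, failure count and worst latency."""
    ok = [r for r in results if r is not None]
    return {
        "attempts": len(results),
        "failures": len(results) - len(ok),
        "max_ms": max(ok) if ok else None,
    }
```

For example, summarize(tcp_connect_times("your-sql-host", 1433)) with your own host name. Intermittent failures or large spikes in max_ms on a LAN would point back at the NIC or switch path rather than at SharePoint itself.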
April 8th, 2011 9:34am
I've taken a memory dump using DebugDiag; however, there seems to be no memory leak, and I could not see any major issues apart from a few "This thread is not fully resolved and may or may not be a problem. Further analysis of these threads may be required." messages for the custom DLLs.
I've also gone through the ULS logs and there is nothing wrong with the SPSite/SPWeb objects. I am still checking the IIS logs to find whether it is due to a particular page or something else...
My main concern is that CPU usage is always above 80%, and I am not sure how to look inside the w3wp process to find which sub-process (thread) consumes so much CPU.
I could also see the below message:
Thread 25 - System ID 5240
Entry point 0x0547c1c0
Create time 11.04.2011 01:05:30
Time spent in user mode 0 Days 00:22:19.796
Time spent in kernel mode 0 Days 00:02:44.468
This thread is not fully resolved and may or may not be a problem. Further analysis of these threads may be required.
Function
ntdll!ZwWaitForMultipleObjects+a
kernel32!ReleaseSemaphore+6b
mscorwks!GetCLRFunction+81bd
mscorwks!GetCLRFunction+d8a9
mscorwks!CreateApplicationContext+222e9
mscorwks!CreateAssemblyNameObject+39edc
mscorwks!CompareAssemblyIdentity+9caf2
mscorwks!CorLaunchApplication+1c0e7
mscorwks!CertCreateAuthenticodeLicense+2289a8
mscorwks!TranslateSecurityAttributes+3d66
Microsoft_SharePoint_Publishing!Microsoft.SharePoint.Publishing.BlobCache.RewriteUrl(System.Object, System.EventArgs)+a6e
System_Web_ni!System.Web.HttpApplication+SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()+50
System_Web_ni!System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef)+ab
System_Web_ni!System.Web.HttpApplication+ApplicationStepManager.ResumeSteps(System.Exception)+1a5
System_Web_ni!System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object)+d3
System_Web_ni!System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)+1c4
System_Web_ni!System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest)+7c
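Since w3wp.exe is a single process, the CPU is consumed by its threads, and the report already prints user-mode and kernel-mode time per thread. A minimal sketch for ranking threads by accumulated CPU time follows. It assumes the report has been saved as plain text with the same line layout as the excerpt above; the function names are hypothetical and not part of DebugDiag.

```python
import re

THREAD_RE = re.compile(r"^Thread (\d+) - System ID (\d+)")
TIME_RE = re.compile(
    r"^Time spent in (user|kernel) mode (\d+) Days (\d+):(\d+):(\d+(?:\.\d+)?)"
)


def parse_thread_cpu(report_text):
    """Return {thread_number: {"user": seconds, "kernel": seconds}}."""
    threads = {}
    current = None
    for line in report_text.splitlines():
        line = line.strip()
        match = THREAD_RE.match(line)
        if match:
            current = int(match.group(1))
            threads[current] = {"user": 0.0, "kernel": 0.0}
            continue
        match = TIME_RE.match(line)
        if match and current is not None:
            mode, days, hours, minutes, seconds = match.groups()
            threads[current][mode] = (
                int(days) * 86400 + int(hours) * 3600
                + int(minutes) * 60 + float(seconds)
            )
    return threads


def rank_by_cpu(threads):
    """Sort threads by user + kernel seconds, busiest first."""
    return sorted(
        threads.items(),
        key=lambda item: item[1]["user"] + item[1]["kernel"],
        reverse=True,
    )
```

These times are cumulative since the thread was created, so a high total does not prove the thread is busy right now. Comparing two dumps taken a minute or so apart and looking at the difference per thread gives a better picture of what is burning CPU at that moment.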
Crazy Nick | MCTS | India
April 11th, 2011 6:37am
Update:
I have done the CPU analysis and found the following:
The following threads in 9256_110411_162412.dmp are calling an ISAPI Extension OWSSVR: (45 47 59)
3.70% of threads blocked
Crazy Nick | MCTS | India
April 11th, 2011 6:37am
Hi YajivAN,
Please take a look at these 2 threads.
http://social.technet.microsoft.com/Forums/en-US/sharepointadmin/thread/607a3950-0b8a-42ed-98cf-cf47077862ce/
http://social.technet.microsoft.com/forums/en-US/sharepointadmin/thread/866f931e-e964-4f95-a748-c11ccc9f9b77/
Best regards,
Emir
April 13th, 2011 10:36am
Thanks Emir.
I could not understand how the URLs mentioned above help with this issue.
Below are my latest findings:
Number of Hits during the peak hour - 210625
Successful Hits during the peak hour - 120000
Unauthenticated Hits during the peak hour - 100000
Server has sent - 2.44858GB of data to clients during the peak hour
Server has received - 0.82138GB of data from clients during the peak hour.
Per day transaction - 23.2919GB (both sent & received)
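To put the peak-hour figures above on a per-second footing, here is the arithmetic as a small Python sketch. It assumes 1 GB = 10**9 bytes and that load is spread evenly over the hour, which real traffic is not, so these are averages and short bursts can be much higher.

```python
# Peak-hour figures taken from the list above.
PEAK_HITS = 210_625
SENT_GB = 2.44858
RECEIVED_GB = 0.82138
SECONDS_PER_HOUR = 3600


def per_second(total, seconds=SECONDS_PER_HOUR):
    """Average rate per second over the given period."""
    return total / seconds


def gb_per_hour_to_mbit_per_s(gigabytes, seconds=SECONDS_PER_HOUR):
    """Convert a volume in GB (assumed 10**9 bytes) to an average rate in Mbit/s."""
    return gigabytes * 1e9 * 8 / 1e6 / seconds


requests_per_second = per_second(PEAK_HITS)
sent_mbps = gb_per_hour_to_mbit_per_s(SENT_GB)
received_mbps = gb_per_hour_to_mbit_per_s(RECEIVED_GB)
total_mbps = sent_mbps + received_mbps

print(f"{requests_per_second:.1f} requests/s")
print(f"{sent_mbps:.2f} Mbit/s sent, {received_mbps:.2f} Mbit/s received")
print(f"{total_mbps:.2f} Mbit/s combined")
```

This works out to about 58.5 requests per second and roughly 7.3 Mbit/s combined (about 5.4 sent and 1.8 received). On average that is a small fraction of even a 100 Mbit/s link, so the raw network volume alone does not look like the bottleneck, although this says nothing about bursts or about the link between the WFEs and SQL Server.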
Hundreds of pages take more than 10 seconds to load during the peak hour.
I do not know how to determine whether the number of hits or the disk/network I/O is causing the issue...
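One way to narrow this down from the IIS logs is to count responses per status code and list which URLs account for the slow requests. The sketch below assumes the logs are in W3C extended format and that the sc-status, cs-uri-stem and time-taken fields are enabled (time-taken is logged in milliseconds); the function name and the 10-second threshold are only for illustration.

```python
from collections import Counter


def analyze_iis_log(lines, slow_ms=10_000):
    """Return (hits per status code, slow-request count per URL) for W3C log lines."""
    fields = []
    status_counts = Counter()
    slow_pages = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns for the data lines that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip truncated or malformed lines
        row = dict(zip(fields, values))
        status_counts[row.get("sc-status", "?")] += 1
        time_taken = row.get("time-taken", "")
        if time_taken.isdigit() and int(time_taken) >= slow_ms:
            slow_pages[row.get("cs-uri-stem", "?")] += 1
    return status_counts, slow_pages
```

Passing an open log file to analyze_iis_log and then calling slow_pages.most_common(20) shows the worst URLs. If the slow requests cluster on a handful of pages, the cause is more likely page or query related; if they are spread evenly across all URLs during the peak, overall load or I/O becomes the more plausible suspect.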
Crazy Nick | MCTS | India
April 13th, 2011 11:06am