SQL Server - Timeout Exception
We are getting a "SQL Server Timed Out" exception every day at exactly 10:00 PM.

SharePoint topology:
1. 3 WFEs
2. 1 admin server
3. 1 index server
4. 1 query server
5. 3 SQL clusters (3 active SQL Servers + 3 passive failover SQL Servers)

Configuration:
1. The 3 WFEs are configured behind a common VIP.
2. Network teaming is configured on all 3 SQL Servers.

Applications:
1. 3 web applications (DB Server-01)
2. Central Administration (DB Server-03)
3. SSP (DB Server-03)
4. My Site (DB Server-02)

Scheduled jobs around 10:00 PM:
1. An incremental SQL transaction backup is configured to run hourly.
2. No other MOSS-specific timer jobs are scheduled during those hours, except Profile Sync, SSP Sync and a few other simple timer jobs that run hourly.
3. Incremental crawls are scheduled to run every 30 minutes.

Issue: Our monitoring application (Tivoli) raises an alert at exactly 10:00 PM every day, stating "SQL SERVER is failed to respond", "Timed out Exception" or "Connection failure". On the first day it reported that "SQL Server-01" was not available, on the second day "SQL Server-02", and on some days it is a mix.

DBA: The SQL team says all the SQL Servers are up and running during those hours.

Network: The NIC software is not updated to the latest version, but the probability of this causing the issue is considered low or nil.

I do not know whether this is a MOSS issue, a database issue, a network issue or something else. Please help me with the parameters that should be investigated, and share your experience if you have seen this before. Thanks in advance.
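To separate a network problem from a SQL Server problem, one option is to run an independent TCP probe against each SQL node around 10:00 PM and compare its log with the Tivoli alerts. Below is a minimal Python sketch, not a replacement for Tivoli's own check. The host names sqlserver01..03 are hypothetical placeholders, and port 1433 is only SQL Server's default; clustered or named instances may listen on a different static port.

```python
import socket
import time
from datetime import datetime

# Hypothetical targets: replace with your real SQL cluster virtual names or IPs and ports.
TARGETS = [("sqlserver01", 1433), ("sqlserver02", 1433), ("sqlserver03", 1433)]


def probe(host, port, timeout=5.0):
    """Attempt one TCP connection; return (ok, elapsed_seconds, error_text)."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connecting is enough; we close immediately
        return True, time.perf_counter() - start, ""
    except OSError as exc:  # refused, unreachable, DNS failure or timeout
        return False, time.perf_counter() - start, str(exc)


def run(targets, rounds, interval=1.0, timeout=5.0):
    """Probe every target `rounds` times, pausing `interval` seconds between rounds."""
    rows = []
    for _ in range(rounds):
        for host, port in targets:
            ok, elapsed, err = probe(host, port, timeout)
            stamp = datetime.now().isoformat(timespec="seconds")
            rows.append((stamp, host, port, ok, round(elapsed * 1000, 1), err))
        time.sleep(interval)
    return rows
```

For example, starting run(TARGETS, rounds=600) at 21:55 from a WFE and printing each row would record at least ten minutes of one-second probes. If TCP connects stay fast while Tivoli still reports a timeout, that would point away from basic network reachability, although it cannot rule out problems higher up (login, query blocking, etc.).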
February 19th, 2010 9:21pm
Hi Nick,

How many servers are crawling your content, and when and how often do you run a full crawl? In my experience the cause is rarely the SharePoint farm and usually SQL. However, the farm may be involved indirectly, depending on how many servers are crawling content at the same time.

How big are your content databases? Does the SQL cluster handle only SharePoint, or are there other applications? What do SQL disk I/O, queue wait times, etc. look like during this period? Is the SQL team taking snapshots?

-Ivan

Ivan Sanders, My LinkedIn Profile, My Blog, @iasanders.
February 20th, 2010 6:40am
Hello Ivan,

How many servers are crawling your content? All the WFEs.
When and how often do you run a full crawl? A full crawl is scheduled daily (but not at 10:00 PM).
How big are your content databases? They are huge.
Does the SQL cluster handle only SharePoint, or are there other applications? Only SharePoint applications.
Is the SQL team taking snapshots? No.
What do SQL disk I/O, queue wait times, etc. look like during this period? I have configured performance counters to monitor CPU, memory, disk and network, and the issue does not seem to be caused by resource unavailability or high usage.

So what else could be the reason? Any help would be appreciated.

-Nick
February 25th, 2010 10:21pm
I found that the error event ID is either 5586 or 3355. I could also see a few other database-related error event IDs (3351 and 3760) reported at different times. So what could be the reason? Any help would be appreciated.
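One way to check whether these events really cluster around 10:00 PM is to tally them per day, inside and outside a short window. The following is a minimal Python sketch, assuming the Application log entries have already been exported (for example from Event Viewer) and parsed into (timestamp, event_id) pairs; that export/parsing step and the +/-5 minute window are assumptions, not part of the original setup.

```python
from collections import Counter
from datetime import datetime, time, timedelta

# Event IDs mentioned in this thread.
IDS_OF_INTEREST = {5586, 3355, 3351, 3760}


def in_window(ts, center=time(22, 0), minutes=5):
    """True if ts is within +/- `minutes` of `center` on the same calendar day."""
    midpoint = datetime.combine(ts.date(), center)
    return abs(ts - midpoint) <= timedelta(minutes=minutes)


def tally(events, center=time(22, 0), minutes=5):
    """Count (date, event_id) pairs for the IDs of interest, split by inside/outside the window."""
    inside, outside = Counter(), Counter()
    for ts, event_id in events:
        if event_id not in IDS_OF_INTEREST:
            continue  # ignore unrelated events
        bucket = inside if in_window(ts, center, minutes) else outside
        bucket[(ts.date().isoformat(), event_id)] += 1
    return inside, outside
```

If nearly all 5586/3355 counts land inside the window, that supports a scheduled trigger at 10:00 PM; 3351/3760 counts that fall mostly outside it would suggest those belong to a separate problem.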
March 5th, 2010 2:53am
Hi Nick,

See http://technet.microsoft.com/en-us/library/cc748829.aspx. Bill Baer also has an excellent article: http://blogs.technet.com/wbaer/archive/2009/10/06/intermittent-database-server-connectivity-and-microsoft-sharepoint-products-and-technologies.aspx

Event ID: 5586
Rule ID: Microsoft.Windows.SharePoint.Services.3.0.Unknown_SQL_Exceptions
Description: This rule triggers an alert whenever any unknown SQL exception occurs. The details of the exception are displayed in the alert.
Event Source: Windows SharePoint Services 3.0

Event ID: 3355
Rule ID: Microsoft.Windows.SharePoint.Services.3.0.Cannot_connect_to_SQL_Server
Description: This alert indicates that Windows SharePoint Services could not connect to the SQL Server database.
Alert Type: Error
Event Source: Windows SharePoint Services 3.0

After you install Windows Server 2003 Service Pack 2 (SP2) or the Windows Server 2003 Scalable Networking Pack (SNP) on a computer that has a TCP/IP Offload-enabled network adapter, you may experience many network-related problems. Consider disabling SNP features on the front-end Web and application servers.

If you are using SQL Server connection aliases to complement SQL Server database mirroring, or to make the database instance portable for migrations or operational tasks, verify that Dynamically Determine Ports is not enabled in the SQL Server Client Network Utility. When no port number is stored for the alias entry, DBNETLIB tries to contact the server through a well-known UDP port (1434) to obtain the connection information; under certain conditions this can result in loss of connectivity.

I am assuming you have a large SAN supporting the SQL clusters, and that the SQL cluster nodes are x64 with 32-64 GB RAM and 4-8 processors, running SQL Server 2008 SP1? The WFEs each have 8-16 GB RAM and 2-4 processors, the application servers have 16-32 GB RAM and 2-4 processors, and you have a gigabit backbone with a separate backup network? Do you guys boot from SAN?
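To check the Dynamically Determine Ports point on each WFE and application server, the alias entries can be scanned for a missing port. The following is a minimal Python sketch, assuming aliases created with the Client Network Utility are stored as string values under HKLM\SOFTWARE\Microsoft\MSSQLServer\Client\ConnectTo in the form "DBMSSOCN,server,port" (DBMSSOCN being TCP/IP), and that a TCP alias with no third field means the port is determined dynamically. Treat those registry details as assumptions to verify on your own servers.

```python
def parse_alias(value):
    """Split an alias value such as 'DBMSSOCN,sqlserver01,1433' into (protocol, server, port)."""
    parts = [p.strip() for p in value.split(",")]
    protocol = parts[0].upper() if parts else ""
    server = parts[1] if len(parts) > 1 else ""
    port = int(parts[2]) if len(parts) > 2 and parts[2].isdigit() else None
    return protocol, server, port


def dynamic_port_aliases(aliases):
    """Return names of TCP/IP (DBMSSOCN) aliases that store no explicit port."""
    flagged = []
    for name, value in aliases.items():
        protocol, _, port = parse_alias(value)
        if protocol == "DBMSSOCN" and port is None:
            flagged.append(name)
    return sorted(flagged)


def read_aliases(path=r"SOFTWARE\Microsoft\MSSQLServer\Client\ConnectTo"):
    """Windows only: read alias name -> value pairs from HKLM (the key path is an assumption)."""
    import winreg  # imported here so the parsing helpers also work off Windows
    aliases = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        for i in range(winreg.QueryInfoKey(key)[1]):  # [1] = number of values
            name, data, _ = winreg.EnumValue(key, i)
            aliases[name] = data
    return aliases
```

On a server, dynamic_port_aliases(read_aliases()) would list the aliases to fix. On x64 machines it is probably worth checking the Wow6432Node copy of the key as well, since 32-bit clients read their aliases from there.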
TEC can be fun. Anyhow, the best advice is to look in the ULS logs and the event logs on all servers around 10 PM and trace the issue from there. I can only tell you in general terms what I have seen in the past.

Cheers,
-Ivan
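Following that advice across several servers and nights is easier with a small filter. The following is a minimal Python sketch, assuming MOSS 2007 ULS files are tab-separated with Timestamp, Process, TID, Area, Category, EventID, Level and Message columns, and timestamps like 02/19/2010 22:00:01.23. Check the header line of your own log files; the set of levels treated as interesting here is also an assumption.

```python
from datetime import datetime, timedelta

# ASSUMPTION: column order Timestamp, Process, TID, Area, Category, EventID, Level, Message[, ...]
TS_FORMAT = "%m/%d/%Y %H:%M:%S.%f"
LEVELS = {"Critical", "Unexpected", "High"}


def uls_entries_near(lines, center, minutes=5, levels=LEVELS):
    """Yield (timestamp, area, category, level, message) for entries within +/- minutes of center."""
    lo, hi = center - timedelta(minutes=minutes), center + timedelta(minutes=minutes)
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split("\t")]
        if len(fields) < 8:
            continue  # too few columns: malformed or wrapped line
        try:
            ts = datetime.strptime(fields[0], TS_FORMAT)
        except ValueError:
            continue  # header line or non-timestamp text
        if lo <= ts <= hi and fields[6] in levels:
            yield ts, fields[3], fields[4], fields[6], fields[7]
```

For example, open each server's log for that evening (adjusting the file encoding if needed) and print every hit from uls_entries_near(f, datetime(2010, 2, 19, 22, 0)). Lining the hits up across WFEs, the index server and the query server should show which process and category report the SQL failure first.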
April 3rd, 2010 11:31am