SharePoint 2007 Farm email alert failure
We have a strange problem with our SharePoint 2007 farm (servers: 1 SQL Server 2008, 1 WFE, 1 Central Admin, 1 Index, 1 Query, all on Windows Server 2008 64-bit). I've searched for 4 days for a solution without any luck. (I'm sure it doesn't help that I'm new to the care and feeding of SharePoint. I was initially reluctant to take on the responsibility because of my inexperience with the product and our consultants' recommendation for a dedicated resource, but "ya gotta do what ya gotta do"...)

The initial problem, reported on 4/5/2011, was that SharePoint had stopped sending email alerts when items were updated. This has been a recurring issue. The cause has not always been inside the SharePoint environment (in one notable case the email server was the problem), but many times it has originated within SharePoint.

As I checked into it, the problem this time appeared to be due to the timer job not running. I checked the timer service and, for good measure, restarted it on all the SP servers in the farm, since this has been reported as a solution. SharePoint still no happy, so I used the solution listed on Mel Lota's blog to reload the alert templates (this had worked previously to restart email alerts for us). No luck.

In the past I've also had luck kickstarting email alerts by quiescing the farm and resetting. But the request to quiesce the farm failed and threw this message back: "An unhandled exception occurred in the user interface.Exception Information: The 'fld' start tag on line 1 does not match the end tag of 'sFld'. Line 1, position 29475."

So that didn't solve the problem either. As I worked through the overall symptoms and the messages in the logs (5utx and 8xqx errors were reported), the problem seemed to match symptoms that others had reported resolving by flushing and reloading the configuration cache. So after office hours on 4/5, I followed the published procedure to reload the configuration cache, which had disastrous results.
The process of reloading the config cache failed to complete, and at that point our SharePoint site and the Central Admin site would only respond with an otherwise blank page containing "An unexpected error has occurred." OVER 9000! level fail. (Probably this should have been the thread title, but I digress.)

I managed to restore the site's functionality by copying a backup of the cache directory I had on hand to all the SP servers, replacing the contents of the incompletely reloaded cache(s?). That at least allowed me to get the site up for the next business day. We were still without email alerts, and I now noticed that searching for anything returned "The search request was unable to connect to the Search Service." An attempt to quiesce the farm returns the same error message as before.

Digging deeper, I found that the SP servers in the farm began logging the "An unhandled exception occurred in the user interface.Exception Information: The 'fld' start tag on line 1 does not match the end tag of 'sFld'. Line 1, position 29475." message to the Windows Application Log after business hours on 3/30/2011, 6 days before the farm failed. The message appears to come from the XML parser failing to match start and end tags, but I can't tell which of the 500+ config objects it's complaining about, and even if I could, I'm not sure how I would repair it. Of course several timer jobs have failed (no surprise), or in some cases have become frozen at the "Initializing" status.

Has anyone seen anything like this? I've used Bing and G00gle (gasp!) to search for email alert failure topics, config cache refresh topics, the specific tag mismatch error, general tag mismatch errors, timer job failures, and all the error messages I've seen relating to this, specific and non-specific.
(It's eaten up four days of regular work time along with my other duties, plus hours outside of work, to the point where they're announcing layoffs in three days' time and part of me is hoping I'm on that list.)

And I suppose more to the point, is there any way to repair it? I have regular daily backups of the farm databases from a scheduled task that runs the stsadm backup command and, as another hedge, SQL Server backups of the farm databases, but my understanding from what I've read is that I can't restore the config database, only the content databases. I've considered running the repair option, but since running the configuration wizard fails with the same "...The 'fld' start tag on line 1 does not match the end tag ..." error, I'm not very confident that this will help, and it may make my problems worse again, which is an outcome I'm trying to avoid.

This farm hasn't even been online for 12 months. We had a consulting company come in, design the farm, and install it in June 2010. There has been little in the way of customization, and as far as I know nothing beyond the OOB software has been installed. Also, as far as I know, none of the several other people who have the privileges to administer the farm and servers changed anything.

Thanks for ANY, and I mean ANY, assistance or suggestions. Even outrageous ones, 'cause I'm like, outta ideas, at least any that aren't alarmingly desperate.

I'm not sure if this means anything, but I got a list of the items in the config db's objects table (I'm assuming this roughly corresponds to the file system cache objects, based on the contents of the cache files and the properties column of the rows in the config db): 498 rows in the config db, 510 objects in the filesystem config cache.

C. Cunningham
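
A count mismatch like 498 vs. 510 only says that the two sides differ, not which objects differ. A rough Python sketch of comparing them by ID instead is below. The assumptions here are not confirmed anywhere in this thread: that each cache file is named <object GUID>.xml, and that the IDs from the config db's objects table have been exported to a plain list of strings. The function names are hypothetical, and the code only reads file names; it changes nothing.

```python
from pathlib import Path


def normalize_guid(text):
    """Lower-case a GUID string and drop surrounding whitespace and braces."""
    return text.strip().strip("{}").lower()


def compare_cache_to_db(cache_dir, db_ids):
    """Compare cache file names against a list of IDs exported from the db.

    ASSUMPTION: cache files are named <object GUID>.xml.
    Returns (ids only in the cache, ids only in the db), both sorted.
    """
    cache_ids = {normalize_guid(p.stem) for p in Path(cache_dir).glob("*.xml")}
    db_set = {normalize_guid(i) for i in db_ids if i.strip()}
    only_in_cache = sorted(cache_ids - db_set)
    only_in_db = sorted(db_set - cache_ids)
    return only_in_cache, only_in_db
```

Calling compare_cache_to_db with the cache directory and the exported ID list would give two lists: objects that exist only as cache files, and rows that have no cache file.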
April 8th, 2011 5:17pm

After more digging, I've checked the config db objects table against the config cache, and there are 13 objects in the config cache that don't exist in the db table. *CORRECTED BELOW: After inspecting the contents of one of the files, I discovered that one of the sFld entries is missing its terminating tag (at least, and maybe its data as well), which is the cause of the weird error, since the next field is a fld entry.* The object is an InfoPath object, and I'm not sure where the cache is loading it from, since it isn't in the objects table in the config database. I'm still not sure how to fix it, but at least there's some explanation for the error message. Hopefully I can find a solution tomorrow.

I will also point out that during the initial attempt at cache reloading, I read an article about using the configuration wizard to repair an install, and when I tried that it failed as well.

CORRECTION: When I originally wrote this, I had examined the object contents using IE to view the XML. As it turns out, IE just didn't display the rest of the field. When I examined the field in Notepad, it actually does appear to be terminated properly.
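
Since viewing the XML in IE turned out to be misleading, running the cache files through an actual XML parser may be a more reliable way to find one that really is malformed. A minimal Python sketch follows, assuming the cached objects are stored as individual *.xml files in one directory. The directory path you pass in and the function name are hypothetical, and the code only reads the files.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_malformed_xml(cache_dir):
    """Return (file name, parser message) for each *.xml file that fails to parse."""
    problems = []
    for path in sorted(Path(cache_dir).glob("*.xml")):
        try:
            # Parsing is read-only; a tag mismatch raises ParseError.
            ET.parse(str(path))
        except ET.ParseError as err:
            # The message includes the line and column the parser stopped at.
            problems.append((path.name, str(err)))
    return problems
```

Any file this reports could be compared against the "Line 1, position 29475" in the logged error. If every file parses cleanly, the mismatched tag is presumably coming from somewhere other than the file cache, but that is a guess, not something this check can confirm.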
April 9th, 2011 11:32pm


