Today I noticed that the drive containing the inboxes on my primary site had filled up. On further investigation, the bgb.box folder was full of .__M files and the inboxes\bgb.box\bad folder was full of .BOS files (BGB online status files). The bgbmgr.log file has many entries like the following:
Inbox changes detected. Sleep for 1 second and process inbox files SMS_NOTIFICATION_MANAGER 3/4/2014 2:30:55 PM 3240 (0x0CA8)
Begin to get file list to process SMS_NOTIFICATION_MANAGER 3/4/2014 2:30:56 PM 3240 (0x0CA8)
Begin to handle file 1dyj1xcg.BOS SMS_NOTIFICATION_MANAGER 3/4/2014 2:30:56 PM 3240 (0x0CA8)
Begin to process file D:\Program Files\Microsoft Configuration Manager\inboxes\bgb.box\1dyj1xcg.BOS SMS_NOTIFICATION_MANAGER 3/4/2014 2:30:56 PM 3240 (0x0CA8)
ERROR: Failed to parse online status file SMS_NOTIFICATION_MANAGER 3/4/2014 2:30:56 PM 3240 (0x0CA8)
ERROR: Failed to execute task class OnlineStatusParser SMS_NOTIFICATION_MANAGER 3/4/2014 2:30:56 PM 3240 (0x0CA8)
ERROR: Failed to parse file SMS_NOTIFICATION_MANAGER 3/4/2014 2:30:56 PM 3240 (0x0CA8)
ERROR: Failed to execute task class OnlineStatusProcessTask SMS_NOTIFICATION_MANAGER 3/4/2014 2:30:56 PM 3240 (0x0CA8)
WARNING: Failed to process file 1dyj1xcg.BOS, move it to bad inbox SMS_NOTIFICATION_MANAGER 3/4/2014 2:30:56 PM 3240 (0x0CA8)
Begin to move file from D:\Program Files\Microsoft Configuration Manager\inboxes\bgb.box\1dyj1xcg.BOS to D:\Program Files\Microsoft Configuration Manager\inboxes\bgb.box\bad\1dyj1xcg.BOS SMS_NOTIFICATION_MANAGER 3/4/2014 2:30:56 PM 3240 (0x0CA8)
BgbManager is waiting for file and registry change notification or timeout after 60 seconds SMS_NOTIFICATION_MANAGER 3/4/2014 2:30:56 PM 3240 (0x0CA8)
SQL>>>set quoted_identifier on;set ansi_warnings on;set ansi_padding on;set ansi_nulls on;set concat_null_yields_null on;set arithabort on;set numeric_roundabort off;set DATEFORMAT mdy; SMS_NOTIFICATION_MANAGER 3/4/2014 2:31:44 PM 3248 (0x0CB0)
SQL>>>>> Done. SMS_NOTIFICATION_MANAGER 3/4/2014 2:31:44 PM 3248 (0x0CB0)
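To see how many distinct files are being rejected, the log text above can be scanned for the "move it to bad inbox" warning. This is only a rough sketch in Python: the function name failed_files is made up for illustration, and it assumes the message wording matches the lines pasted here (it matches on the message text only, not on any particular log file layout).

```python
import re

# Matches the warning text seen in the bgbmgr.log excerpt above.
# The wording is taken from the pasted log; other builds may differ.
BAD_INBOX = re.compile(r"Failed to process file (\S+?), move it to bad inbox")

def failed_files(lines):
    """Return the file names that were moved to the bad inbox, in log order."""
    names = []
    for line in lines:
        match = BAD_INBOX.search(line)
        if match:
            names.append(match.group(1))
    return names

# Example use (the path is an assumption, adjust to the real log location):
# with open(r"D:\Program Files\Microsoft Configuration Manager\Logs\bgbmgr.log",
#           errors="replace") as log:
#     print(len(set(failed_files(log))), "distinct files rejected")
```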
I cleared out the outbox on the MP server and the inbox on the primary site server by moving the BOS files elsewhere. That slowed the flood, but as soon as a new BOS file is created the errors appear again. Looking inside the BOS files, the entries look consistent with each other, so I am not sure why the parser is failing.
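To keep an eye on whether the backlog is growing again after a cleanup, the files in an inbox folder can be counted by extension. A minimal sketch in Python; count_by_extension is a hypothetical helper, and the folder path in the usage comment is simply the one shown in the log above.

```python
from collections import Counter
from pathlib import Path

def count_by_extension(folder):
    """Count the files directly inside `folder`, grouped by upper-cased extension."""
    counts = Counter()
    for entry in Path(folder).iterdir():
        if entry.is_file():
            # suffix is the part after the last dot, e.g. ".BOS" or ".__M"
            counts[entry.suffix.upper()] += 1
    return counts

# Example use (path taken from the log excerpt; adjust as needed):
# print(count_by_extension(r"D:\Program Files\Microsoft Configuration Manager\inboxes\bgb.box\bad"))
```

Running it against both bgb.box and bgb.box\bad a few minutes apart should show whether .BOS or .__M files are still accumulating.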
Just to cover my bases on supplying info I ran the following query against the primary site server DB:
select B.ServerName, A.OnlineClients as 'Online Clients'
from Bgb_Server B
left join dbo.v_BgbServerCurrent A on A.ServerID = B.ServerID
order by 2
The query returned a single row: the FQDN of the MP server, with an online client count of 0.
This all seems to have started around the time I had Premier Support track down a corrupted status message SQL file that was preventing status messages from processing on the affected primary site. The other primaries are as happy as can be.