Azure Online Backup id 104 error after DPM UR5 update

This is the second weekend in a row with an error after initial Azure Online backup completed where this error "An unexpected error occurred while the job was running. (ID 104)" happens on a File Server that is 2008 R2.

DPM server is 2012 OS with DPM 2012 R2 UR5.

All other Azure online backups complete successfully. Mostly SQL database backups.

There is one VMM 2012 separate Azure online backup that is using a different vault and it is successful too.

When this error happens it seems to stall on the backup and then time-out, but its hard to figure out.

All I know is that it stops at the same time on Sunday at 2pm then restarts the job an hour later and never finishes.

Just started using Azure online backup after DPM UR5 with the File Server.  Its ~400GB in size with many files.

Can anyone point me to a way to troubleshoot this?  I did not want to configure other server backups until I can get this one resolved.

Thanks,

Denise

Type:    Online recovery point
Status:    Failed
Description:    An unexpected error occurred while the job was running. (ID 104)
    More information
End time:    3/15/2015 2:09:19 PM
Start time:    3/14/2015 6:30:10 PM
Time elapsed:    19:39:09
Data transferred:    7,061.84 MB
Cluster node    -
Source details:    S:\
Protection group members:    2
    Details
Protection group:    Filer

March 16th, 2015 1:09pm

Hi,

Most likely the online backup engine had a problem.  You can look in the C:\Program Files\Microsoft Azure Recovery Services Agent\Temp\CBEngineCurr.errlog for problems.  Search on:  Backup Progress  - to see what phase of backup failed and look for errors for the failed job.

Free Windows Admin Tool Kit Click here and download it now
March 16th, 2015 3:55pm

You are right Mike.

I don't know how to interpret this, but at the end of the file I see the following where it became unresponsive. This was the only backup that ran on these two days so it was easy to find in the logs.

Now I need help figuring out how to make it work or if I should try an Online Recovery Point.

It will take several hours since its so big, but maybe its worth a shot to see if it every completes?

1640    0EAC    03/15    21:08:54.001    69    CBEngineWcfClientHelper.cs(320)        8D5DCCDB-EFF7-41C6-8927-16367C5B9212    NORMAL    Setting RequestId header for outgoing request | Params: {RequestId = db3a8a94-94ee-4ddf-a7d3-eaf32f08a49e}{WorkitemId = 80a1c67b-1720-42bf-86ff-ca4aeee8c20a}{TaskId = 8d5dccdb-eff7-41c6-8927-16367c5b9212}{AgentVersion = 2.0.8703.0}
1640    0EAC    03/15    21:08:54.273    79    WcfClient.cs(965)        8D5DCCDB-EFF7-41C6-8927-16367C5B9212    NORMAL    Finished web service call | Params: {ServiceInterface = Microsoft.Internal.CloudBackup.Common.FileCatalog.Interface.IFileCatalogExternalChannel}{Description = GetStatus}{TargetEndpoint = https://pod01-fc1.wus.backup.windowsazure.com/FileCatalogExternalService.svc}
1640    0EAC    03/15    21:08:54.273    79    WcfClient.cs(812)        8D5DCCDB-EFF7-41C6-8927-16367C5B9212    WARNING    FMBlock: Executing retry policy | Params: {CurrentAttempt = 2}{MaxAttempts = 18000}{BackOffIntervalInSec = 4.8}{OperationCode = WcfProxyGenericAsyncOperationPollingCall}{Retry policy id = 7}
1640    1788    03/15    21:08:54.701    71    dscontext.cpp(152)    [000000001C5E2130]    8D5DCCDB-EFF7-41C6-8927-16367C5B9212    WARNING    Last completed state for Ds Id (1126075830818088307) is 5
1640    1788    03/15    21:08:56.702    71    dscontext.cpp(152)    [000000001C5E2130]    8D5DCCDB-EFF7-41C6-8927-16367C5B9212    WARNING    Last completed state for Ds Id (1126075830818088307) is 5
1640    0AE8    03/15    21:08:58.523    22    genericthreadpool.cpp(823)    [000000001A5E3F20]        NORMAL    Hr: = [0x80070002] CGenericThreadPool::m_dwMaximumNumberOfThreads[256]
1640    0AE8    03/15    21:08:58.523    22    genericthreadpool.cpp(344)    [000000001A5E3F20]        WARNING    CGenericThreadPool: From: monitoring ==> Thread id: 0x1b1c is unresponsive for 432895796634 time(in 100 ns)(thread timeout: 432000000000), taskId: {8D5DCCDB-EFF7-41C6-8927-16367C5B9212}
1640    0AE8    03/15    21:08:58.523    22    genericthreadpool.cpp(355)    [000000001A5E3F20]        WARNING    Initiating resolution action for unresponsiveness. All running tasks will be cancelled.
1640    0AE8    03/15    21:08:58.523    70    onlineruntime.cpp(198)    [000000001A4FC4D0]        WARNING    Aborting online runtime now
1640    0AE8    03/15    21:08:58.523    70    onlineruntime.cpp(199)            FATAL    Process Abort: Exp: = "hrReasonThisTask" Hr: = [0x809909c2]
1640    0AE8    03/15    21:08:58.541    22    watsonintegration.cpp(74)            NORMAL    Inside Watson Handler
1640    0AE8    03/15    21:08:58.541    22    watsonintegration.cpp(117)            CRITICAL    Raising Watson for process

March 16th, 2015 5:02pm

Recovery point does not finish.

Also tried registry setting from another post with similar error. "disableThreadTimeout"

It didn't break the other online backups, but can't get the File Server backup to complete.

Any ideas?

Free Windows Admin Tool Kit Click here and download it now
March 18th, 2015 5:15pm

Hi,

yes, something is hung and timing out - probably need to have the dump read.

Are there crash dumps for cbengine under C:\ProgramData\Microsoft\Windows\WER\ReportQueue - if so please open a support case to have them analyzed.

March 18th, 2015 5:33pm

Yes. Just found them for both weekends.

On the 15th it only had a single .wer file but on the 8th it has the complete dump files when it crashed.

I'll get a support case opened.

Thanks,

Denise

Free Windows Admin Tool Kit Click here and download it now
March 19th, 2015 12:40pm

Opening a support case costs $$.

Here's the description from the event logs:

Description:
Faulting application name: cbengine.exe, version: 2.0.8703.0, time stamp: 0x54d4c328
Faulting module name: KERNELBASE.dll, version: 6.2.9200.16864, time stamp: 0x531d34d8
Exception code: 0x809909c2
Fault offset: 0x0000000000047b8c
Faulting process id: 0xbe0
Faulting application start time: 0x01d056d17028e0c5
Faulting application path: E:\Microsoft Azure Recovery Services Agent\Microsoft Azure Recovery Services Agent\bin\cbengine.exe
Faulting module path: C:\Windows\system32\KERNELBASE.dll
Report Id: be09aa32-c5d8-11e4-9442-001b78be3f44
Faulting package full name:
Faulting package-relative application ID:

Tried a new PG with just a 2008 R2 Server. It got 26GB out of 37GB total and just stopped.

All SQL DB's backup online with no problem.

Seems like multiple files are the problem.

March 19th, 2015 7:04pm

Same exact problem here. After the initial replica was created, online recovery points failed with same errors in DPM and Windows Application logs. Did you get any resolution from MS?

SCDPM 2012 R2 CU 5

Log Name:      Application
Source:        Windows Error Reporting
Date:          3/28/2015 2:06:12 PM
Event ID:      1001
Task Category: None
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      DPM1.MDxHealth.local
Description:
Fault bucket , type 0
Event Name: APPCRASH
Response: Not available
Cab Id: 0

Problem signature:
P1: cbengine.exe
P2: 2.0.8703.0
P3: 54d4c328
P4: KERNELBASE.dll
P5: 6.3.9600.17278
P6: 53eebf2e
P7: 80070002
P8: 000000000000606c
P9: 
P10: 

Attached files:

These files may be available here:
C:\ProgramData\Microsoft\Windows\WER\ReportQueue\AppCrash_cbengine.exe_6565982b90bb3c2eb2ea644e62a374552dd29321_7e6563ac_72194c0a

Analysis symbol: 
Rechecking for solution: 0
Report Id: 424991ea-d58e-11e4-80cc-00155d0f643d
Report Status: 0
Hashed bucket: 
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Windows Error Reporting" />
    <EventID Qualifiers="0">1001</EventID>
    <Level>4</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2015-03-28T21:06:12.000000000Z" />
    <EventRecordID>7604</EventRecordID>
    <Channel>Application</Channel>
    <Computer>DPM1.MDxHealth.local</Computer>
    <Security />
  </System>
  <EventData>
    <Data>
    </Data>
    <Data>0</Data>
    <Data>APPCRASH</Data>
    <Data>Not available</Data>
    <Data>0</Data>
    <Data>cbengine.exe</Data>
    <Data>2.0.8703.0</Data>
    <Data>54d4c328</Data>
    <Data>KERNELBASE.dll</Data>
    <Data>6.3.9600.17278</Data>
    <Data>53eebf2e</Data>
    <Data>80070002</Data>
    <Data>000000000000606c</Data>
    <Data>
    </Data>
    <Data>
    </Data>
    <Data>
    </Data>
    <Data>C:\ProgramData\Microsoft\Windows\WER\ReportQueue\AppCrash_cbengine.exe_6565982b90bb3c2eb2ea644e62a374552dd29321_7e6563ac_72194c0a</Data>
    <Data>
    </Data>
    <Data>0</Data>
    <Data>424991ea-d58e-11e4-80cc-00155d0f643d</Data>
    <Data>0</Data>
    <Data>
    </Data>
  </EventData>
</Event>

0E7C 0E30 03/28 19:52:56.293 69 CBEngineWcfClientHelper.cs(320) 7AAF019E-D8F8-4AE6-A87C-9C554423711E NORMAL Setting RequestId header for outgoing request | Params: {RequestId = dd346919-a9fb-48d0-bb5f-13e50c13914f}{WorkitemId = c1babe3e-6672-4a78-8a3b-830589b07438}{TaskId = 7aaf019e-d8f8-4ae6-a87c-9c554423711e}{AgentVersion = 2.0.8703.0}
0E7C 1DB4 03/28 19:52:56.528 71 dscontext.cpp(152) [00000000220A4830] 7AAF019E-D8F8-4AE6-A87C-9C554423711E WARNING Last completed state for Ds Id (1126111013915291845) is 5
0E7C 1DB4 03/28 19:52:58.543 71 dscontext.cpp(152) [00000000220A4830] 7AAF019E-D8F8-4AE6-A87C-9C554423711E WARNING Last completed state for Ds Id (1126111013915291845) is 5
0E7C 1DB4 03/28 19:53:00.544 71 dscontext.cpp(152) [00000000220A4830] 7AAF019E-D8F8-4AE6-A87C-9C554423711E WARNING Last completed state for Ds Id (1126111013915291845) is 5
0E7C 1DB4 03/28 19:53:02.560 71 dscontext.cpp(152) [00000000220A4830] 7AAF019E-D8F8-4AE6-A87C-9C554423711E WARNING Last completed state for Ds Id (1126111013915291845) is 5
0E7C 1DB4 03/28 19:53:04.566 71 dscontext.cpp(152) [00000000220A4830] 7AAF019E-D8F8-4AE6-A87C-9C554423711E WARNING Last completed state for Ds Id (1126111013915291845) is 5
0E7C 1DB4 03/28 19:53:06.567 71 dscontext.cpp(152) [00000000220A4830] 7AAF019E-D8F8-4AE6-A87C-9C554423711E WARNING Last completed state for Ds Id (1126111013915291845) is 5
0E7C 1DB4 03/28 19:53:08.582 71 dscontext.cpp(152) [00000000220A4830] 7AAF019E-D8F8-4AE6-A87C-9C554423711E WARNING Last completed state for Ds Id (1126111013915291845) is 5
0E7C 1DB4 03/28 19:53:10.589 71 dscontext.cpp(152) [00000000220A4830] 7AAF019E-D8F8-4AE6-A87C-9C554423711E WARNING Last completed state for Ds Id (1126111013915291845) is 5
0E7C 1DB4 03/28 19:53:12.590 71 dscontext.cpp(152) [00000000220A4830] 7AAF019E-D8F8-4AE6-A87C-9C554423711E WARNING Last completed state for Ds Id (1126111013915291845) is 5
0E7C 1DB4 03/28 19:53:14.601 71 dscontext.cpp(152) [00000000220A4830] 7AAF019E-D8F8-4AE6-A87C-9C554423711E WARNING Last completed state for Ds Id (1126111013915291845) is 5
0E7C 1DB4 03/28 19:53:16.603 71 dscontext.cpp(152) [00000000220A4830] 7AAF019E-D8F8-4AE6-A87C-9C554423711E WARNING Last completed state for Ds Id (1126111013915291845) is 5
0E7C 1DB4 03/28 19:53:18.605 71 dscontext.cpp(152) [00000000220A4830] 7AAF019E-D8F8-4AE6-A87C-9C554423711E WARNING Last completed state for Ds Id (1126111013915291845) is 5
0E7C 1DB4 03/28 19:53:20.621 71 dscontext.cpp(152) [00000000220A4830] 7AAF019E-D8F8-4AE6-A87C-9C554423711E WARNING Last completed state for Ds Id (1126111013915291845) is 5
0E7C 1DB4 03/28 19:53:22.627 71 dscontext.cpp(152) [00000000220A4830] 7AAF019E-D8F8-4AE6-A87C-9C554423711E WARNING Last completed state for Ds Id (1126111013915291845) is 5
0E7C 1DB4 03/28 19:53:24.629 71 dscontext.cpp(152) [00000000220A4830] 7AAF019E-D8F8-4AE6-A87C-9C554423711E WARNING Last completed state for Ds Id (1126111013915291845) is 5
0E7C 1500 03/28 19:53:25.459 22 genericthreadpool.cpp(823) [000000001ADE55F0] NORMAL Hr: = [0x80070002] CGenericThreadPool::m_dwMaximumNumberOfThreads[256]
0E7C 1500 03/28 19:53:25.459 22 genericthreadpool.cpp(344) [000000001ADE55F0] WARNING CGenericThreadPool: From: monitoring ==> Thread id: 0x1c70 is unresponsive for 439791360416 time(in 100 ns)(thread timeout: 432000000000), taskId: {7AAF019E-D8F8-4AE6-A87C-9C554423711E}
0E7C 1500 03/28 19:53:25.459 22 genericthreadpool.cpp(355) [000000001ADE55F0] WARNING Initiating resolution action for unresponsiveness. All running tasks will be cancelled.
0E7C 1500 03/28 19:53:25.459 70 onlineruntime.cpp(198) [000000001ACAA270] WARNING Aborting online runtime now
0E7C 1500 03/28 19:53:25.459 70 onlineruntime.cpp(199) FATAL Process Abort: Exp: = "hrReasonThisTask" Hr: = [0x809909c2]
0E7C 1500 03/28 19:53:25.506 22 watsonintegration.cpp(74) NORMAL Inside Watson Handler
0E7C 1500 03/28 19:53:25.506 22 watsonintegration.cpp(117) CRITICAL Raising Watson for process

Free Windows Admin Tool Kit Click here and download it now
April 3rd, 2015 12:52pm

No I have not heard back from Microsoft.

I also have the problem with the Azure Scratch disk on the local drive using 110GB of space after the failed backups that I did try.

Was trying to figure out if I can just delete since I turned off Online backup to Azure on the servers that failed in DPM. I identified the GUIID for each that failed.

Right now I am not doing any Server backups to Azure because of this error. Just SQL DB backups because they work reliably.

April 3rd, 2015 1:58pm

Hi,

Please update the Azure backup agent to the latest available on the portal (2.0.8704.0) and if you continue to see crashes please open a support case to have the problem investigated.

Free Windows Admin Tool Kit Click here and download it now
April 6th, 2015 1:26pm

Hi Mike,

So far that update to 2.0.8704.0 works. Can create a recovery point on a volume drive and it completed a scheduled update last night.

We were on 2.0.8703.0 when it crashed.

Next test will be to confirm our Filer volumes work with the online agent.

Thanks for your help.

Denise

April 8th, 2015 10:50am

This is successful on Filer after updating Azure Online agent to 2.0.8704.0
Free Windows Admin Tool Kit Click here and download it now
April 12th, 2015 8:38am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics