VM replica on Hyper-V cluster host fails with ID 53 not implemented

I have a three-node Windows Server 2012 R2 Hyper-V failover cluster backing up to a Windows Server 2012 R2 machine running DPM 2012 R2 RU5.  All three cluster nodes are connected to an iSCSI SAN and use Cluster Shared Volumes (CSV).  Windows Server Backup is installed on the cluster nodes and the DPM server.  The Hyper-V role is also installed on the DPM server.  The VMs have the current integration services installed and enabled.

All of the Hyper-V hosts installed the DPM agent correctly and are backing up to the DPM server.  When I add one of the virtual machines to a protection group, the first attempted replica fails with this error:

"DPM failed to communicate with dpmserver.domain because of a communication error with the protection agent. (ID 53 Details: Not implemented (0x80000001))"

It fails after just under two minutes with 0 data transferred.  I noticed that this alert says it is failing to communicate with the DPM server and not the client agent.

From there the DPM server continues to attempt consistency checks that fail with this error:

"An unexpected error occurred while the job was running. (ID 104 Details: The RPC server is unavailable (0x800706BA))"

It also fails after just under two minutes with 0 data transferred.

I have disabled all firewalls and that hasn't helped.  If I watch the Hyper-V host, I can see a checkpoint get created for the VM and then the VM status shows "backing up".  After about two minutes it merges and deletes the checkpoint.  I also watched the processes and TCP connections and didn't see any failed attempts.

I'm adding the VM to the protection group on the DPM server by choosing to modify the protection group.  Then I expand the Hyper-V cluster, Hyper-V, and select the \online\VMName item.  It modifies the protection group and says it's creating the initial replica which fails with the errors above.

So, it looks like DPM is reaching in and checkpointing the VM, but it is not streaming it back to the DPM server.

Any insight into what I'm missing would be really appreciated.

February 17th, 2015 12:06am

Hi,

We recommend that you restart the protected computer after you apply the Update Rollup 5 Agent update.
If protected computers are not restarted after you apply Update Rollup 5, VM backup jobs can fail. On Hyper-V/Windows Server 2012 R2 infrastructure, restart the computer (Protected_Computer) or the other cluster nodes if VM backup jobs fail with either of the following error messages:


Protected_Computer needs to be restarted. This may be because the computer has not been restarted since the protection agent was installed. (ID 48 Details: The parameter is incorrect (0x80070057))


Data Protection Manager failed to communicate with Protected_Computer because of a communication error with the protection agent. (ID 53 Details: Not implemented (0x80000001)) 

February 17th, 2015 3:34am

Thanks for the suggestion.  I have read this elsewhere as a recommendation. 

All servers and clients have been restarted many times.  Also, this has been happening both before and after RU5.

February 19th, 2015 4:32am

Hi,

It still sounds like a binary mismatch between the agent files on the DPM server and those on the protected servers.  I suggest uninstalling the agents, reinstalling them, and making sure all agent versions match the DPM server version.
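One way to make the "all agent versions match" check explicit is a small comparison like the sketch below. It assumes you have already collected each host's agent version string by hand (for example from the DPM console); the function name and the host names and version numbers in the usage comment are hypothetical placeholders, not real DPM build numbers.

```python
def mismatched_agents(server_version, agent_versions):
    """Return {host: version} for agents whose version differs from the server's.

    Versions are dotted numeric strings and are compared field by field,
    so surrounding whitespace does not cause a false mismatch.
    """
    def parse(version):
        return tuple(int(part) for part in version.strip().split("."))

    expected = parse(server_version)
    return {
        host: version
        for host, version in agent_versions.items()
        if parse(version) != expected
    }


# Usage with placeholder values (hypothetical hosts and versions):
# mismatched_agents("1.2.3.0", {"hvnode1": "1.2.3.0", "hvnode2": "1.2.2.0"})
```

An empty result means every listed agent reports the same version as the server.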

February 19th, 2015 6:44pm

All versions were the same but I uninstalled, rebooted, reinstalled, and rebooted.  That did not resolve the issue.

However, the reinstall did recreate firewall rules for

dpmra, any<>any

dpmra_dcom_135, any<>any

That led me to investigate the communication of those two processes and ports.  Apparently, those two rules exist because CSV backups need unrestricted DPM agent communication between the cluster nodes themselves, not only between each node and the DPM server.  Previously, I had limited the rules to communication with the DPM server.  I opened them up to the other cluster nodes as well and the VM replicas completed.
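For anyone wanting to test for the same situation, a rough connectivity check like the sketch below could be run on each cluster node against the other nodes and the DPM server. It only tests whether TCP port 135 (the RPC/DCOM endpoint mapper, matching the dpmra_dcom_135 rule name) accepts a connection; it does not exercise the dpmra process itself or any other ports the agent may use, so a passing result does not prove the firewall scope is correct. The function names are hypothetical, as are the host names in the usage comment.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_peers(peers, port=135, timeout=3.0):
    """Map each peer host name to whether the given TCP port was reachable."""
    return {host: can_connect(host, port, timeout) for host in peers}


# Usage (hypothetical host names; run this on every cluster node in turn):
# for host, ok in check_peers(["hvnode2.domain", "hvnode3.domain", "dpmserver.domain"]).items():
#     print(host, "reachable" if ok else "blocked or unreachable")
```

Because the firewall rules are scoped per remote address, the check has to be repeated from each node; a result from the DPM server alone would not have revealed the node-to-node restriction described above.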

The DPM error message was misleading since it was claiming a communication error with the DPM server and not another node in the cluster.

Solved.

  • Marked as answer by pdumigan Friday, February 20, 2015 7:05 PM
February 20th, 2015 10:04pm

