Backups from this weekend (2014-02-16):
Disk backups:
First successful: 01:35:01
Last successful: 23:56:56
First failed: 19:10:53
Last failed: 23:56:09
Tape Backups - disk backups are only colocated every Saturday (15-2-14). Looking at the ones from this weekend:
First failed: 12:00:14
Last failed: 00:01:57 (ran over into 16-2-14)
As you can see, the first verification failure occurred after the last backup failure.
Every single one failed. A grand total of 0MB was transferred to tape that day.
From the DPM Alerts event log for 15-2-14: there are 144 events in that log from that day, and they are all either warnings or errors. Please see attached for an example of what they looked like at each stage outlined below.
Here's a summary of what happened when it tried to write backups to tape this weekend:
- First it couldn't reserve the drive: either it couldn't find it (likely, given that it had again lost the drive when I checked the management library in the DPM Administrator Console this morning) or it needed cleaning.
- It couldn't find valid recovery points for two of our production servers, even though the Protection screen shows them all as having 18 recovery points.
- It couldn't create a recovery point for a specific VM (not on one of the two above servers) because the data source was not available.
- It encountered a retryable VSS error on one of the other Hyper-V host servers. Running vssadmin list writers on that server today shows the Microsoft Hyper-V VSS Writer with a state of Failed and a last error of Retryable error.
- The drive tape library wasn't ready.
- It couldn't reserve the drive: either it couldn't find it or it needed cleaning.
- Part way through the last set of errors it found a job that required a tape that wasn't in the library.
- Then a lot of warnings that it couldn't verify tapes for this reason. I don't know why it was trying to do a verify on tapes it hadn't backed anything up to.
- "Backup of Non VSS Datasource Writer on TEST VM server2 FQDN cannot be completed. DPM could not find a valid recovery point on disk. (ID: 30126)" for a couple of servers. Again, all of the VMs on both of these servers had numerous recovery points.
- Got a datasource not available error when it tried to back up one specific VM.
- Complained again about a retryable VSS error on the same server it happened on above.
- The remaining 5 that went over to the next day all say either that the job requires a tape that is not available in the library or that a free tape is not available in the tape library.
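For checking the VSS writer state on the Hyper-V hosts without reading through the output by hand, something like the rough Python sketch below might help. It parses text shaped like vssadmin list writers output and reports any writer that isn't Stable or that has a last error. The sample text is illustrative only (placeholder IDs, not copied from the affected server), and the function names parse_writers and unhealthy_writers are made up for this example.

```python
import re

# Illustrative sample only: shaped like typical `vssadmin list writers` output,
# with placeholder IDs. It is NOT copied from the affected server.
SAMPLE_OUTPUT = """\
Writer name: 'System Writer'
   Writer Id: {00000000-0000-0000-0000-000000000001}
   State: [1] Stable
   Last error: No error

Writer name: 'Microsoft Hyper-V VSS Writer'
   Writer Id: {00000000-0000-0000-0000-000000000002}
   State: [8] Failed
   Last error: Retryable error
"""


def parse_writers(text):
    """Return one dict (name, state, last_error) per writer found in text."""
    writers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"Writer name: '(.*)'$", line)
        if match:
            current = {"name": match.group(1), "state": "", "last_error": ""}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            value = line[len("State:"):].strip()
            # Drop the numeric code, e.g. "[8] Failed" -> "Failed"
            current["state"] = re.sub(r"^\[\d+\]\s*", "", value)
        elif current is not None and line.startswith("Last error:"):
            current["last_error"] = line[len("Last error:"):].strip()
    return writers


def unhealthy_writers(text):
    """Writers that are not Stable or that report a last error."""
    return [
        w for w in parse_writers(text)
        if w["state"] != "Stable" or w["last_error"] != "No error"
    ]


if __name__ == "__main__":
    # On a real host the text could come from something like:
    #   subprocess.run(["vssadmin", "list", "writers"],
    #                  capture_output=True, text=True).stdout
    for w in unhealthy_writers(SAMPLE_OUTPUT):
        print(f"{w['name']}: state={w['state']}, last error={w['last_error']}")
```

Run on a schedule before the tape job starts, a check like this would at least flag a failed Hyper-V writer ahead of the backup window rather than after it.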
So there seem to be a few main issues here:
- It keeps losing connection with the drive in the tape library despite having a direct connection to it.
- It couldn't find valid recovery points for VMs listed as having numerous recovery points.
- We keep getting retryable VSS errors on Hyper-V host servers for the Microsoft Hyper-V VSS Writer. These keep cropping up for no apparent reason regardless of what backup software we use, and they cause disruption every time because of failed backups and the fact that we have to reboot the host server (meaning downtime for all the VMs hosted on there), as that's the only way we know of clearing the error.
- It still keeps trying to use tapes that aren't in the library.
Edit: hmm, annoyingly there doesn't seem to be a way to attach anything here and it's a bit lengthy for a post
- Edited by SysAdminITL, Monday, February 17, 2014 12:23 PM