DPM trying to verify data on tape marked as offsite ready

Hi,

Tape verify jobs should be scheduled to start after all the tape backup jobs finish.

When were the failed tape verify jobs originally scheduled to run, i.e. what was their start date and time?

Did they fail for the same tape labels that were used by the tape backups that finished?

February 12th, 2014 12:33pm

I'm running DPM 2012 R2, fully updated, on Windows Server 2012 R2, also fully updated. Yesterday it marked a few tapes as offsite ready, so I removed them from the tape library, ran an inventory to make sure they were no longer listed as present, and made sure it had a tape to co-locate that night's backups to. This morning I have a lot of 'Cannot verify tape data' warnings, and the event logs indicate that it was trying to use some of the tapes I removed yesterday for verification.

Why is it trying to use tapes that aren't there? What can I do to prevent it from doing this?

February 12th, 2014 2:30pm

>Tape verify jobs should be scheduled to start after all the tape backups jobs finish.

I don't understand what you mean; there are no such jobs in DPM.

There is no way to configure when the verification runs in the protection group settings; there is merely a tick box labelled 'Check backup for data integrity (time consuming operation)'. How could it run before the backup has finished if it's also set to co-locate to tape?

February 13th, 2014 5:54am

Hi,

You will need to create a custom job filter and select only the tape backup and tape verify job types. This will show you the order and times in which those jobs started.

Compare those times with when the tapes were pulled; I suspect verifications were still in progress.
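If the start times and tape labels from the filtered job view are copied out by hand, the comparison suggested above can be scripted. The following is a minimal Python sketch under that assumption; the `TapeJob` record, the job-type strings and the tape labels are hypothetical placeholders, not DPM's own data model. A verify job that started after the tapes were pulled was not "still running"; it was scheduled against a tape that was already gone.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TapeJob:
    job_type: str     # e.g. "Tape backup" or "Tape verify" (hypothetical labels)
    start: datetime   # job start time as shown in the filtered job list
    tape_label: str   # tape the job used or requested (hypothetical labels)


def jobs_on_pulled_tapes(jobs, pulled_labels, pulled_at):
    """List jobs that touched a pulled tape, oldest first, flagging whether
    each one started before the tapes were removed."""
    hits = []
    for job in sorted(jobs, key=lambda j: j.start):
        if job.tape_label in pulled_labels:
            hits.append((job, job.start < pulled_at))
    return hits


# Hypothetical sample data standing in for the filtered job list.
pulled_at = datetime(2014, 2, 11, 17, 0)
pulled = {"TAPE01", "TAPE02"}
jobs = [
    TapeJob("Tape verify", datetime(2014, 2, 12, 1, 30), "TAPE02"),
    TapeJob("Tape backup", datetime(2014, 2, 10, 20, 0), "TAPE01"),
    TapeJob("Tape verify", datetime(2014, 2, 12, 1, 0), "TAPE01"),
    TapeJob("Tape backup", datetime(2014, 2, 11, 20, 0), "TAPE09"),
]

report = jobs_on_pulled_tapes(jobs, pulled, pulled_at)
for job, before_pull in report:
    when = "before pull" if before_pull else "AFTER pull"
    print(job.start, job.job_type, job.tape_label, when)
```

Sorting by start time first keeps the output in the same chronological order the job filter is meant to reveal.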

February 13th, 2014 8:50am

>Compare those times with when tapes were pulled, I suspect verifications were still ongoing.

If that were the case, I'd expect the errors to consistently refer to the same tape, the one in the drive at the time, because our tape library can only use one tape at a time. However, I ejected the 3 tapes that were marked as ready for offsite storage, then ran a fast inventory in DPM and confirmed that their slots were marked as empty, as expected. Yet the events from that time reference multiple tapes, not just one.

I'm still getting 'Cannot verify tape data' errors in the GUI, although I haven't seen any events with event ID 3309 for about 3 days now.
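The single-drive argument above can be written as a quick check: verifications that were merely interrupted by pulling the tapes should name at most one tape. A minimal Python sketch, assuming the timestamp and tape label have been read off each 'Cannot verify tape data' event; the values and the helper name are hypothetical.

```python
from datetime import datetime

# Hypothetical (timestamp, tape label) pairs read from 'Cannot verify tape data' events.
verify_failures = [
    (datetime(2014, 2, 12, 1, 5), "TAPE01"),
    (datetime(2014, 2, 12, 1, 40), "TAPE02"),
    (datetime(2014, 2, 12, 2, 15), "TAPE03"),
]


def still_running_theory_fits(failures, pulled_at):
    """A single-drive library holds one tape at a time, so failures caused by
    pulling tapes mid-verify should reference at most one distinct tape."""
    labels = {label for when, label in failures if when >= pulled_at}
    return len(labels) <= 1, labels


fits, labels = still_running_theory_fits(verify_failures, datetime(2014, 2, 11, 17, 0))
print(fits, sorted(labels))  # False ['TAPE01', 'TAPE02', 'TAPE03']
```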

February 14th, 2014 7:15am

Did you create a custom job filter to look at the chronological order of the tape backup and verify jobs leading up to the "Cannot verify tape data" errors?
February 14th, 2014 10:43am

Backups from this weekend (2014-02-16):

Disk backups:

First successful: 01:35:01
Last successful: 23:56:56
First failed: 19:10:53
Last failed: 23:56:09

Tape backups (disk backups are only co-located to tape on Saturdays; this one ran on 15-2-14). Looking at the ones from this weekend:

First failed: 12:00:14
Last failed: 00:01:57 (ran over into 16-2-14)

As you can see, the first verification failure occurred after the last backup failure.

Every single one failed. A grand total of 0 MB was transferred to tape that day.
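Because the last tape job ran past midnight, the bare HH:MM:SS values above can't be ordered by comparing them directly. A small Python sketch that attaches the date before comparing, using the tape times from this post; the `on_day` helper is hypothetical.

```python
from datetime import date, datetime, time, timedelta


def on_day(day, hms, rolled_over=False):
    """Combine a bare HH:MM:SS string with its date; rolled_over marks a time
    that belongs to the following day (a job that ran past midnight)."""
    t = time.fromisoformat(hms)
    d = day + timedelta(days=1) if rolled_over else day
    return datetime.combine(d, t)


tape_day = date(2014, 2, 15)
first_failed = on_day(tape_day, "12:00:14")
last_failed = on_day(tape_day, "00:01:57", rolled_over=True)

print("00:01:57" < "12:00:14")     # True: bare times put the last failure first
print(last_failed > first_failed)  # True once the dates are attached
print(last_failed - first_failed)  # 12:01:43
```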

From the DPM Alerts event log for 15-2-14: there are 144 events from that day, all of them warnings or errors. Please see attached for an example of what they looked like at each stage outlined below.

Here's a summary of what happened when it tried to write backups to tape this weekend:

  • First it couldn't reserve the drive: either it couldn't find it (likely, given that it had again lost the drive when I checked the library under Management in the DPM Administrator Console this morning) or the drive needed cleaning.
  • It couldn't find valid recovery points for two of our production servers, even though the Protection screen shows all of them as having 18 recovery points.
  • It couldn't create a recovery point for a specific VM (not on either of the two servers above) because the data source was not available.
  • It encountered a retryable VSS error on one of the other Hyper-V host servers. Running vssadmin list writers on that server today shows the Microsoft Hyper-V VSS Writer with a State of Failed and a Last error of Retryable error.
  • The tape library drive wasn't ready.
  • It couldn't reserve the drive: either it couldn't find it or the drive needed cleaning.
  • Partway through the last set of errors, it found a job that required a tape that wasn't in the library.
  • Then came a lot of warnings that it couldn't verify tapes for this reason. I don't know why it was trying to verify tapes it hadn't backed anything up to.
  • 'Backup of Non VSS Datasource Writer on TEST VM server2 FQDN cannot be completed. DPM could not find a valid recovery point on disk. (ID: 30126)' for a couple of servers. Again, all of the VMs on both of these servers had numerous recovery points.
  • It got a datasource not available error when it tried to back up one specific VM.
  • It complained again about a retryable VSS error on the same server as above.
  • The remaining 5 that ran over into the next day all say either that the job requires a tape that is not available in the library or that a free tape is not available in the tape library.
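Checking the writer state on each host before the next tape run can be scripted. A minimal Python sketch that parses text captured from vssadmin list writers and lists writers in the Failed state; the sample text is illustrative (placeholder GUIDs, English-language layout), not real output from these servers, and real output is localized and may differ between Windows versions.

```python
import re

# Illustrative text in the English-language layout of `vssadmin list writers`;
# GUIDs are placeholders, not real writer IDs.
SAMPLE = """\
Writer name: 'Microsoft Hyper-V VSS Writer'
   Writer Id: {00000000-0000-0000-0000-000000000001}
   Writer Instance Id: {00000000-0000-0000-0000-000000000002}
   State: [8] Failed
   Last error: Retryable error

Writer name: 'System Writer'
   Writer Id: {00000000-0000-0000-0000-000000000003}
   Writer Instance Id: {00000000-0000-0000-0000-000000000004}
   State: [1] Stable
   Last error: No error
"""


def parse_writers(text):
    """Collect name, state and last error for each 'Writer name:' block."""
    writers = []
    for raw in text.splitlines():
        line = raw.strip()
        if m := re.match(r"Writer name: '(.+)'$", line):
            writers.append({"name": m.group(1)})
        elif writers and (m := re.match(r"State: \[\d+\] (.+)$", line)):
            writers[-1]["state"] = m.group(1)
        elif writers and line.startswith("Last error: "):
            writers[-1]["last_error"] = line[len("Last error: "):]
    return writers


failed = [w for w in parse_writers(SAMPLE) if w.get("state") == "Failed"]
for w in failed:
    print(f"{w['name']}: {w['state']} ({w.get('last_error', '?')})")
# Microsoft Hyper-V VSS Writer: Failed (Retryable error)
```

The state is matched on its text rather than its number because several numeric states are all displayed as "Failed".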

So there seem to be a few main issues here:

  1. It keeps losing its connection to the drive in the tape library despite having a direct connection to it.
  2. It couldn't find valid recovery points for VMs listed as having numerous recovery points.
  3. We keep getting retryable VSS errors on Hyper-V host servers from the Microsoft Hyper-V VSS Writer. These crop up for no apparent reason regardless of which backup software we use, and they cause disruption every time: backups fail, and we have to reboot the host server (meaning downtime for all the VMs hosted on it) because that's the only way we know of to clear the error.
  4. It still keeps trying to use tapes that aren't in the library.
Edit: hmm, annoyingly there doesn't seem to be a way to attach anything here, and it's a bit lengthy for a post.
February 17th, 2014 7:23am

  • Edited by SysAdminITL Monday, February 17, 2014 12:23 PM
February 17th, 2014 3:22pm


Hi,

Let me address these as best I can in a forum setting.

P=Problem  R=Reply

P1) It keeps losing connection with the drive in the tape library despite having a direct connection to it.
R1) This connectivity issue will need to be troubleshot with the vendor; however, triple-check the library, drive and controller firmware and drivers. Check cabling, termination, power and airflow. If you have multiple drives, try disabling one or more of them to reduce I/O over the bus and see if the connection stabilizes.

  • P2) It couldn't find valid recovery points for VMs listed as having numerous recovery points.
    R2) DPM will never copy the same disk-based recovery point to tape a second time, so this means DPM didn't find a new disk-based recovery point since the last successful tape-based recovery point. Double-check the time of the last successful disk-based recovery point.
  • P3) We keep getting retryable VSS errors on Hyper-V host servers from the Microsoft Hyper-V VSS Writer. These crop up for no apparent reason regardless of which backup software we use, and they cause disruption every time because backups fail and we have to reboot the host server (meaning downtime for all the VMs hosted on it), as that's the only way we know of to clear the error.
    R3) Make sure you have the following Windows Server 2012 R2 update rollup installed; it includes some VSS-related fixes.

    2887595 Windows RT 8.1, Windows 8.1, and Windows Server 2012 R2 update rollup: November 2013
    http://support.microsoft.com/kb/2887595/EN-US
  • P4) It still keeps trying to use tapes that aren't in the library.
    R4) Would need to troubleshoot this by looking at logs and job history - open a support case for root cause analysis. 
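The rule in R2 can be sketched as a selection step: only a disk recovery point newer than the one last copied to tape is eligible, and if none exists there is nothing to send. A minimal Python sketch; picking the newest eligible point is an assumption made for illustration, and the function name and timestamps are hypothetical.

```python
from datetime import datetime


def next_rp_for_tape(disk_rps, last_copied_to_tape):
    """Return the newest disk recovery point newer than the one last copied to
    tape, or None if nothing new exists (nothing eligible to send to tape)."""
    candidates = [rp for rp in disk_rps
                  if last_copied_to_tape is None or rp > last_copied_to_tape]
    return max(candidates, default=None)


# Hypothetical disk recovery point times for one datasource.
disk_rps = [datetime(2014, 2, 13, 20, 0), datetime(2014, 2, 14, 20, 0)]

print(next_rp_for_tape(disk_rps, datetime(2014, 2, 13, 20, 0)))  # 2014-02-14 20:00:00
print(next_rp_for_tape(disk_rps, datetime(2014, 2, 14, 20, 0)))  # None
```

This is why a datasource can show many recovery points on the Protection screen and still have none eligible for tape: every one of them may be older than, or the same as, the point already on tape.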

February 17th, 2014 5:21pm

> R1) This connectivity issue will need to be troubleshot with the vendor, however, triple check library/ drive and controller firmware and drivers.
I'm using a Quantum SuperLoader 3, and we don't get vendor support for it. I have updated the driver and enabled Microsoft Backup support in the device properties, so I should now at least be able to eject or load cartridges without having to do a rescan and refresh every time (why can't DPM eject tapes?).

Hopefully this'll help make it more reliable.

>R3)  Make sure you have the following Windows 2012 R2 rollup installed that includes some VSS related fixes.
I'm not sure whether you mean this should be installed on the backup server, the hosts being backed up, or both. Either way, even if some hosts aren't completely up to date, they aren't nearly that far behind, and as I mentioned, the backup server is fully updated.

>R4) Would need to troubleshoot this by looking at logs and job history - open a support case for root cause analysis.
Please realise that even though we're a Microsoft Gold Partner, we only get to ask 5 support questions a year across all products and environments, so we need to pick our battles wisely. This is about the only place I've found where you stand a chance of getting answers to DPM questions; on Experts Exchange you'll be lucky to see a tumbleweed blow through.

I'm aware there are a number of issues here. I've created a separate thread to look at P2 in more detail. I may have to open a call with Microsoft support for P4, but I'll see what further investigation I can do myself first. Thanks for all your help.
February 19th, 2014 12:59pm

Hi,

For R3, the VSS errors are occurring on the Hyper-V host when DPM tries to create the shadow copy it needs for a consistent backup, so the goal is to eliminate the VSS errors by applying the updates to every Hyper-V host experiencing them.
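Applying R3 across several hosts is essentially a triage step. A minimal Python sketch, assuming each host's Hyper-V VSS writer state and installed KB numbers have been collected by hand (for example from vssadmin list writers and the installed-updates list); the host names, states and KB sets are hypothetical placeholders.

```python
ROLLUP_KB = "KB2887595"  # November 2013 rollup for Windows Server 2012 R2, cited in R3

# Hypothetical inventory; host names, writer states and KB sets are placeholders.
hosts = {
    "HV01": {"hyperv_writer_state": "Failed", "installed_kbs": set()},
    "HV02": {"hyperv_writer_state": "Stable", "installed_kbs": {"KB2887595"}},
    "HV03": {"hyperv_writer_state": "Failed", "installed_kbs": {"KB2887595"}},
}


def triage(inventory, kb=ROLLUP_KB):
    """Split hosts with a failed Hyper-V VSS writer into those missing the
    rollup (patch first) and those already patched (investigate further)."""
    to_patch, investigate = [], []
    for host, info in sorted(inventory.items()):
        if info["hyperv_writer_state"] != "Failed":
            continue
        (investigate if kb in info["installed_kbs"] else to_patch).append(host)
    return to_patch, investigate


print(triage(hosts))  # (['HV01'], ['HV03'])
```

A host that still fails with the rollup installed is the case where the update alone has not helped and the errors need looking at separately.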

February 19th, 2014 1:41pm
