Hi Mandy, thanks for replying! The space is working fine again after the BSOD and the subsequent reboot of my two cluster nodes. Below is a chronological excerpt of the logs from one node.
The first two entries are the controller and disk errors, which repeated all the way up to the reboot. After the reboot the disk was missing and there were no further LSI or disk log entries. The disk has been replaced and my pool/space is healthy again. Sorry about the data overload. Please
let me know if there is a way to improve the formatting :-)
Log Name: System
Source: LSI_SAS2
Date: 24.03.2015 13:19:59
Event ID: 11
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: NODE1.domain.local
Description:
The driver detected a controller error on \Device\RaidPort1.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="LSI_SAS2" />
<EventID Qualifiers="49156">11</EventID>
<Level>2</Level>
<Task>0</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2015-03-24T12:19:59.662050100Z" />
<EventRecordID>94345</EventRecordID>
<Channel>System</Channel>
<Computer>NODE1.domain.local</Computer>
<Security />
</System>
<EventData>
<Data>\Device\RaidPort1</Data>
<Binary>0F00180001000000000000000B0004C01A01123100000000000000000000000000000000000000000000000000000000000000000B0004C00000000000000000</Binary>
</EventData>
</Event>
Log Name: System
Source: disk
Date: 24.03.2015 13:22:02
Event ID: 153
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: NODE1.domain.local
Description:
The IO operation at logical block address 0x1d7690 for Disk 62 (PDO name: \Device\00000096) was retried.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="disk" />
<EventID Qualifiers="32772">153</EventID>
<Level>3</Level>
<Task>0</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2015-03-24T12:22:02.057930400Z" />
<EventRecordID>94363</EventRecordID>
<Channel>System</Channel>
<Computer>NODE1.domain.local</Computer>
<Security />
</System>
<EventData>
<Data>\Device\Harddisk62\DR62</Data>
<Data>0x1d7690</Data>
<Data>62</Data>
<Data>\Device\00000096</Data>
<Binary>0F01040004002C0000000000990004800000000000000000000000000000000000000000000000000000122A</Binary>
</EventData>
</Event>
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 24.03.2015 13:27:13
Event ID: 1230
Task Category: Resource Control Manager
Level: Error
Keywords:
User: SYSTEM
Computer: NODE1.domain.local
Description:
A component on the server did not respond in a timely fashion. This caused the cluster resource 'Cluster Pool 1' (resource type 'Storage Pool', DLL 'clusres.dll') to exceed its time-out threshold. As part of cluster health detection, recovery actions will be
taken. The cluster will try to automatically recover by terminating and restarting the Resource Hosting Subsystem (RHS) process that is running this resource. Verify that the underlying infrastructure (such as storage, networking, or services) that are associated
with the resource are functioning correctly.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
<EventID>1230</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>3</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2015-03-24T12:27:13.582340500Z" />
<EventRecordID>94367</EventRecordID>
<Correlation />
<Execution ProcessID="1384" ThreadID="10660" />
<Channel>System</Channel>
<Computer>NODE1.domain.local</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="ResourceName">Cluster Pool 1</Data>
<Data Name="ResourceType">Storage Pool</Data>
<Data Name="ResTypeDll">clusres.dll</Data>
</EventData>
</Event>
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 24.03.2015 13:28:05
Event ID: 1230
Task Category: Resource Control Manager
Level: Error
Keywords:
User: SYSTEM
Computer: NODE1.domain.local
Description:
A component on the server did not respond in a timely fashion. This caused the cluster resource 'HyperQuorum01' (resource type 'Physical Disk', DLL 'clusres.dll') to exceed its time-out threshold. As part of cluster health detection, recovery actions will be
taken. The cluster will try to automatically recover by terminating and restarting the Resource Hosting Subsystem (RHS) process that is running this resource. Verify that the underlying infrastructure (such as storage, networking, or services) that are associated
with the resource are functioning correctly.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
<EventID>1230</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>3</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2015-03-24T12:28:05.019505000Z" />
<EventRecordID>94368</EventRecordID>
<Correlation />
<Execution ProcessID="1384" ThreadID="10660" />
<Channel>System</Channel>
<Computer>NODE1.domain.local</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="ResourceName">HyperQuorum01</Data>
<Data Name="ResourceType">Physical Disk</Data>
<Data Name="ResTypeDll">clusres.dll</Data>
</EventData>
</Event>
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 24.03.2015 13:31:13
Event ID: 1146
Task Category: Resource Control Manager
Level: Critical
Keywords:
User: SYSTEM
Computer: NODE1.domain.local
Description:
The cluster Resource Hosting Subsystem (RHS) process was terminated and will be restarted. This is typically associated with cluster health detection and recovery of a resource. Refer to the System event log to determine which resource and resource DLL is causing
the issue.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
<EventID>1146</EventID>
<Version>0</Version>
<Level>1</Level>
<Task>3</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2015-03-24T12:31:13.010887000Z" />
<EventRecordID>94370</EventRecordID>
<Correlation />
<Execution ProcessID="1384" ThreadID="3564" />
<Channel>System</Channel>
<Computer>NODE1.domain.local</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="NodeName">INFRA-STOR01</Data>
</EventData>
</Event>
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 24.03.2015 13:34:13
Event ID: 1069
Task Category: Resource Control Manager
Level: Error
Keywords:
User: SYSTEM
Computer: NODE1.domain.local
Description:
Cluster resource 'StorageQuorum' of type 'Physical Disk' in clustered role 'Cluster Group' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster
Manager or the Get-ClusterResource Windows PowerShell cmdlet.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
<EventID>1069</EventID>
<Version>1</Version>
<Level>2</Level>
<Task>3</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2015-03-24T12:34:13.084369900Z" />
<EventRecordID>94373</EventRecordID>
<Correlation />
<Execution ProcessID="1384" ThreadID="3564" />
<Channel>System</Channel>
<Computer>NODE1.domain.local</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="ResourceName">StorageQuorum</Data>
<Data Name="ResourceGroup">Cluster Group</Data>
<Data Name="ResTypeDll">Physical Disk</Data>
</EventData>
</Event>
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 24.03.2015 13:36:13
Event ID: 1230
Task Category: Resource Control Manager
Level: Error
Keywords:
User: SYSTEM
Computer: NODE1.domain.local
Description:
A component on the server did not respond in a timely fashion. This caused the cluster resource 'ClusterDisk01' (resource type 'Physical Disk', DLL 'clusres.dll') to exceed its time-out threshold. As part of cluster health detection, recovery actions will be
taken. The cluster will try to automatically recover by terminating and restarting the Resource Hosting Subsystem (RHS) process that is running this resource. Verify that the underlying infrastructure (such as storage, networking, or services) that are associated
with the resource are functioning correctly.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
<EventID>1230</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>3</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2015-03-24T12:36:13.242993800Z" />
<EventRecordID>94380</EventRecordID>
<Correlation />
<Execution ProcessID="1384" ThreadID="7264" />
<Channel>System</Channel>
<Computer>NODE1.domain.local</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="ResourceName">ClusterDisk01</Data>
<Data Name="ResourceType">Physical Disk</Data>
<Data Name="ResTypeDll">clusres.dll</Data>
</EventData>
</Event>
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 24.03.2015 13:37:13
Event ID: 1069
Task Category: Resource Control Manager
Level: Error
Keywords:
User: SYSTEM
Computer: NODE1.domain.local
Description:
Cluster resource 'HyperQuorum01' of type 'Physical Disk' in clustered role '3a6281f0-1b41-4176-8071-0e2aa35d9ee9' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster
Manager or the Get-ClusterResource Windows PowerShell cmdlet.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
<EventID>1069</EventID>
<Version>1</Version>
<Level>2</Level>
<Task>3</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2015-03-24T12:37:13.109674700Z" />
<EventRecordID>94381</EventRecordID>
<Correlation />
<Execution ProcessID="1384" ThreadID="3564" />
<Channel>System</Channel>
<Computer>NODE1.domain.local</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="ResourceName">HyperQuorum01</Data>
<Data Name="ResourceGroup">3a6281f0-1b41-4176-8071-0e2aa35d9ee9</Data>
<Data Name="ResTypeDll">Physical Disk</Data>
</EventData>
</Event>
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 24.03.2015 13:39:13
Event ID: 1230
Task Category: Resource Control Manager
Level: Error
Keywords:
User: SYSTEM
Computer: NODE1.domain.local
Description:
A component on the server did not respond in a timely fashion. This caused the cluster resource 'HyperQuorum01' (resource type 'Physical Disk', DLL 'clusres.dll') to exceed its time-out threshold. As part of cluster health detection, recovery actions will be
taken. The cluster will try to automatically recover by terminating and restarting the Resource Hosting Subsystem (RHS) process that is running this resource. Verify that the underlying infrastructure (such as storage, networking, or services) that are associated
with the resource are functioning correctly.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
<EventID>1230</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>3</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2015-03-24T12:39:13.127408000Z" />
<EventRecordID>94383</EventRecordID>
<Correlation />
<Execution ProcessID="1384" ThreadID="4740" />
<Channel>System</Channel>
<Computer>NODE1.domain.local</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="ResourceName">HyperQuorum01</Data>
<Data Name="ResourceType">Physical Disk</Data>
<Data Name="ResTypeDll">clusres.dll</Data>
</EventData>
</Event>
Log Name: System
Source: Microsoft-Windows-WER-SystemErrorReporting
Date: 24.03.2015 13:52:49
Event ID: 1001
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: INFRA-STOR01
Description:
The computer has rebooted from a bugcheck. The bugcheck was: 0x0000009e (0xffffe0011d522080, 0x00000000000004b0, 0x0000000000000005, 0x0000000000000000). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 032415-233875-01.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-WER-SystemErrorReporting" Guid="{ABCE23E7-DE45-4366-8631-84FA6C525952}" EventSourceName="BugCheck" />
<EventID Qualifiers="16384">1001</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2015-03-24T12:52:49.000000000Z" />
<EventRecordID>94394</EventRecordID>
<Correlation />
<Execution ProcessID="0" ThreadID="0" />
<Channel>System</Channel>
<Computer>INFRA-STOR01</Computer>
<Security />
</System>
<EventData>
<Data Name="param1">0x0000009e (0xffffe0011d522080, 0x00000000000004b0, 0x0000000000000005, 0x0000000000000000)</Data>
<Data Name="param2">C:\Windows\MEMORY.DMP</Data>
<Data Name="param3">032415-233875-01</Data>
</EventData>
</Event>
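In case it helps anyone reading along, here is a rough sketch of how the "Event Xml" blocks above could be condensed into a one-line-per-event timeline instead of pasting them in full. It is only an illustration under assumptions: the helper name `summarize_event` and the constants `NS` and `SAMPLE` are made up for this example, and `SAMPLE` is a trimmed copy of the 'Cluster Pool 1' event 1230 from above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace used by the Event Viewer XML shown in this post.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the 'Cluster Pool 1' event 1230 from the excerpt above.
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
<EventID>1230</EventID>
<Level>2</Level>
<TimeCreated SystemTime="2015-03-24T12:27:13.582340500Z" />
<Computer>NODE1.domain.local</Computer>
</System>
<EventData>
<Data Name="ResourceName">Cluster Pool 1</Data>
<Data Name="ResourceType">Storage Pool</Data>
<Data Name="ResTypeDll">clusres.dll</Data>
</EventData>
</Event>"""


def summarize_event(xml_text):
    """Hypothetical helper: pull the key fields out of one Event XML block."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    stamp = system.find("e:TimeCreated", NS).get("SystemTime")
    # SystemTime is UTC with nine fractional digits; keep whole seconds only.
    when = datetime.strptime(stamp[:19], "%Y-%m-%dT%H:%M:%S")
    when = when.replace(tzinfo=timezone.utc)
    return {
        "time_utc": when,
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.find("e:EventID", NS).text),
        "level": int(system.find("e:Level", NS).text),
        "data": [d.text for d in root.findall("e:EventData/e:Data", NS)],
    }
```

Calling `summarize_event(SAMPLE)` returns a small dictionary per event; sorting those by `time_utc` would give the same chronological order as the excerpt. Note that the times come out in UTC, one hour behind the local "Date:" lines above.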
Edited by Jorgen Fundingsrud, Thursday, March 26, 2015 11:04 AM (clarification after re-reading)