Good afternoon,
I've bumped into this issue, and have yet to find a good solution.
The environment
Catalyst 3560G running 15.0(1)SE with the following port config
switchport access vlan <some number>
switchport mode access
spanning-tree portfast
spanning-tree bpduguard enable
Catalyst 6509 w/ VS-SUP2T-10G running 15.1(2)SY with WS-X6148E-GE-45AT blades and the following port config
switchport
switchport access vlan <some number>
switchport mode access
logging event link-status
spanning-tree portfast edge
spanning-tree bpduguard enable
The vlan interface on the 6509 is configured as...
ip address 10.xx.xx.252 255.255.255.0
ip broadcast-address 10.xx.xx.255
ip helper-address <ip of DHCP server>
ip helper-address <ip of SCCM server>
no ip redirects
ip directed-broadcast
ip pim sparse-dense-mode
We are using ip helper on the switches. There is no 802.1x configuration that might be fiddling with port settings.
I've been testing this problem against both switches to rule out differences between them.
Monitor ports have been configured on the switches so we can watch the traffic to/from the workstations that are experiencing the DHCP timeouts.
The endpoints are workstations running Windows 7 with SP1.
The problem
We're seeing lots of NETLOGON 5719 errors on boot up. This is breaking group policy processing and a few other boot time processes. The root cause appears to be DHCP requests timing out; the timeouts are visible in the DHCP Operational Log as Event ID 50024. I need to find out why the requests time out and fix it so our endpoints start working as expected.
The tests performed
I've taken a sample of machines that consistently exhibit problems. Some have the Gigabyte GA-890GPA-UD3H and others a Gigabyte F2A88XM-D3H. Both systems use the onboard Realtek NIC. From the PCI IDs, they use the exact same NIC.
F2A88XM-D3H - PCI\VEN_10EC&DEV_8168&SUBSYS_E0001458
GA-890GPA-UD3H - PCI\VEN_10EC&DEV_8168&SUBSYS_E0001458
I've tested this with two driver versions from Realtek: 7.73.618.2013 and 7.92.115.2015 (current as of 2015-05-22).
I've already read up on and deployed the hotfix from KB2459530. Checking the file versions on stuff like dhcpcore.dll and friends confirms the hotfix is installed. KB2459530 also talks about manually tweaking the DhcpGlobalForceBroadcastFlag and DhcpConnForceBroadcastFlag values. I've made the required changes and confirmed via the monitor port that requests are leaving the workstation with the Broadcast flag set to 1, instead of Unicast (0).
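For anyone who wants to check the same bit in their own capture, here is a minimal Python sketch of how the Broadcast flag can be read from the raw BOOTP/DHCP payload (the UDP payload, not the whole frame). It relies on the standard BOOTP header layout, where the 2-byte flags field sits at offset 10 and the Broadcast flag is its most significant bit. The function and helper names are made up for illustration, and `sample_header` only builds the first 12 bytes of a header for demonstration.

```python
import struct

# BOOTP header layout: op, htype, hlen, hops (1 byte each),
# xid (4 bytes), secs (2 bytes), then flags (2 bytes) at offset 10.
BOOTP_FLAGS_OFFSET = 10
BROADCAST_BIT = 0x8000  # most significant bit of the flags field


def has_broadcast_flag(bootp_payload: bytes) -> bool:
    """Return True if the Broadcast flag is set in a BOOTP/DHCP payload."""
    if len(bootp_payload) < BOOTP_FLAGS_OFFSET + 2:
        raise ValueError("payload too short to contain the BOOTP flags field")
    (flags,) = struct.unpack_from("!H", bootp_payload, BOOTP_FLAGS_OFFSET)
    return bool(flags & BROADCAST_BIT)


def sample_header(flags: int) -> bytes:
    """Hypothetical helper: build only the first 12 bytes of a BOOTP request."""
    # op=1 (BOOTREQUEST), htype=1 (Ethernet), hlen=6, hops=0, arbitrary xid, secs=0
    return struct.pack("!BBBBIHH", 1, 1, 6, 0, 0x12345678, 0, flags)


if __name__ == "__main__":
    print(has_broadcast_flag(sample_header(0x8000)))
    print(has_broadcast_flag(sample_header(0x0000)))
```

Wireshark shows the same bit under the "Bootp flags" field, so this is only useful when scripting the check across many captures.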
All of that said, I am still seeing inconsistencies between the workstation DHCP operational event log and the captured traffic on the monitor port.
Here is an example that is consistent across all the test machines...
5/22/2015 1:00:26 PM 50044 Information Inform ack is received in the adapter 11.
5/22/2015 1:00:26 PM 50018 Information Inform is sent in the adapter 11. Status code is 0x0
5/22/2015 1:00:26 PM 50058 Information Your computer was successfully assigned an address from the network, and it can now connect to other computers.
5/22/2015 1:00:26 PM 50042 Information Dns registration has happened for the adapter 11. Status Code is 0x0. DNS Flag settings is 64.
5/22/2015 1:00:26 PM 50028 Information Address 10.40.250.2 is plumbed to the adapter 11. Status code is 0x0
5/22/2015 1:00:23 PM 50063 Information Dhcp has notified NLA for the configuration changes for the interface 11
5/22/2015 1:00:23 PM 50035 Information Routes are updated in the adapter 11. Status Code is 0x0
5/22/2015 1:00:23 PM 50059 Information Route is added with the values Dest = 0.0.0.0, DestMask = 0.0.0.0, NextHop = 10.40.250.254, Address = 10.40.250.2
5/22/2015 1:00:23 PM 60000 Information PERFTRACK (Request-Ack): Address confirmed for the adapter 11.Confirmed Address is 10.40.250.2.Server address is 10.0.10.21
5/22/2015 1:00:23 PM 60010 Information PERFTRACK (Request-Ack): Address confirmed for the adapter 11.Confirmed Address is 10.40.250.2.Server address is 10.0.10.21
5/22/2015 1:00:23 PM 50013 Information Ack is accepted in the adapter 11. Received Address is 10.40.250.2.Server address is 10.0.10.21
5/22/2015 1:00:23 PM 50012 Information Request is sent from the adapter 11. Status code is 0x0
5/22/2015 1:00:23 PM 50024 Warning Ack Receive Timeout has happened in the Interface Id 11
5/22/2015 1:00:20 PM 50012 Information Request is sent from the adapter 11. Status code is 0x0
5/22/2015 1:00:20 PM 50006 Information Request-Ack is initiated on the adapter with Interface Id 11
5/22/2015 1:00:20 PM 60018 Information PERFTRACK (DHCPv4): Media Connect on adapter 11
5/22/2015 1:00:20 PM 60019 Information PERFTRACK (DHCPv4): End of Media Connect on adapter 11
5/22/2015 1:00:20 PM 50025 Information Cancelling pending renewals on the adapter in the Interface Id 11
5/22/2015 1:00:20 PM 50033 Information An interface is added whose interface index is 11 and Status Code is 0x0.
5/22/2015 1:00:20 PM 50004 Information Dhcp is enabled on the adapter with Interface Id 11
5/22/2015 1:00:20 PM 50001 Information Media Connect notification received with Interface Id 11
5/22/2015 1:00:20 PM 50002 Information Media Disconnect notification received with Interface Id 11
5/22/2015 1:00:20 PM 50001 Information Media Connect notification received with Interface Id 1
The initial request (Event 50012) was sent at 1:00:20. The timeout is reached at 1:00:23 (Event 50024) and the request is subsequently resent (50012). The second request gets a response and the DHCP service binds the provided IP to the interface.
However, on the monitored port, Wireshark doesn't see ANY of the traffic from 1:00:20. The first DHCP Request we see on the wire is at 1:00:23. The rest of the conversation in Wireshark matches what is listed in the Event log.
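To line the event log up against a capture on more than one machine, the log can be scanned for every Request (50012) that ended in an Ack timeout (50024). The following is a rough Python sketch, assuming the log has been exported as text with one event per line in the "date time AM/PM id level message" layout shown above, newest entry first. The function names are hypothetical and the sample is three lines taken from the log above.

```python
import re
from datetime import datetime

# Assumed line layout: "<m/d/yyyy h:mm:ss AM|PM> <event id> <level> <message>"
LINE_RE = re.compile(r"^(\d+/\d+/\d+ \d+:\d+:\d+ [AP]M) (\d+) (\w+) (.*)$")

REQUEST_SENT = 50012
ACK_TIMEOUT = 50024

SAMPLE = """\
5/22/2015 1:00:23 PM 50012 Information Request is sent from the adapter 11. Status code is 0x0
5/22/2015 1:00:23 PM 50024 Warning Ack Receive Timeout has happened in the Interface Id 11
5/22/2015 1:00:20 PM 50012 Information Request is sent from the adapter 11. Status code is 0x0
"""


def parse(text):
    """Parse exported log text into (timestamp, event_id, message) tuples."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            when = datetime.strptime(match.group(1), "%m/%d/%Y %I:%M:%S %p")
            events.append((when, int(match.group(2)), match.group(4)))
    return events


def unanswered_requests(events):
    """Return (request_time, timeout_time) for each Request that timed out."""
    pairs = []
    last_request = None
    # The export is newest-first; reversing keeps the original order of
    # events that share the same one-second timestamp.
    for when, event_id, _message in reversed(events):
        if event_id == REQUEST_SENT:
            last_request = when
        elif event_id == ACK_TIMEOUT and last_request is not None:
            pairs.append((last_request, when))
            last_request = None
    return pairs


if __name__ == "__main__":
    for sent, timed_out in unanswered_requests(parse(SAMPLE)):
        print(sent, "->", timed_out)
```

Each request time it reports is a moment where a DHCP Request should also appear in the capture; in my case the 1:00:20 request is the one that never shows up on the wire.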
I have confirmed that the switches and workstations are pulling NTP from the same source, so the timestamps in Wireshark can be compared directly against the event log entries.
With the Realtek drivers, I have experimented with Energy Efficient Ethernet (EEE, 802.3az) and Green Ethernet with no change in results. They remain disabled while we continue testing.
So, although the symptoms match those described in KB2459530, that article addresses a case where the DHCP request is sent with the Broadcast flag set to 0 and the Windows Firewall drops the DHCP ACK. Since I don't even see the initial request on the wire, I do not think my problem is resolved by KB2459530.
Has anyone else seen problems like this? Any additional information would be helpful.
Thank you,
-nils