Need to reset ARP cache on Server 2008 after network loss (Network Steve Forum)

Hi, I have a network problem with Server 2008 that is really starting to be annoying. If the connection to one of our main routers (gateways) is lost, most of my Windows 2008 servers can't ping (reach) hosts on that network afterwards. I have to reset the ARP cache with netsh before I can ping those servers again. Does anyone know what the problem might be? I searched the forum and found cases similar to mine, but they were mostly about NLB, which I'm not using. This is specific to the Windows 2008 servers; all my 2003 servers work just fine after a network loss on one of the gateways.

April 19th, 2011 1:36pm

Hi necodemus,

Thanks for posting here.

> If one of our main router ( gateways) connection is lost, most of my Win 2008 servers can't ping ( reach) host on that network after.

Do you mean that you have also configured multiple default gateway entries on these servers? If this issue only occurs on Windows Server 2008 hosts, you might try temporarily disabling the built-in firewall on these hosts by following the procedure in the link below, and see whether the issue persists when the default gateway is down:

I Need to Disable Windows Firewall
http://technet.microsoft.com/en-us/library/cc766337(WS.10).aspx

Thanks.
Tiger Li

Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread.

April 20th, 2011 4:44am

The firewall is disabled on all servers. Let me explain my problem with an example. I have a main backup server in site A (172.17.40.32), and other backup servers in different sites connect to that server. Let's say that router 172.17.112.1 in site B goes down for 1 minute or more. When the network comes back in site B, the backup server in site A will not be able to reach the backup server in site B until I log on to the backup server in site A and delete the ARP cache table (netsh interface ip delete arpcache) or reboot it. This is starting to be a problem because my main server that monitors events on the network is now a 2008 server, and I get false data about network outages at our different sites.

April 20th, 2011 10:16am
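
The workaround described in the post above (flush the ARP cache when a remote host stops answering) could be scripted roughly as follows. This is only a sketch in Python under stated assumptions: the helper names `ping_cmd`, `is_reachable` and `flush_arp_if_unreachable` are hypothetical, the only commands used are the Windows `ping -n` and the `netsh interface ip delete arpcache` command quoted in the post, and the script would need to run from an elevated prompt. It automates the symptom relief and does not address the underlying cause.

```python
import subprocess

# The flush command quoted in the post; on Windows it requires admin rights.
FLUSH_ARP_CMD = ["netsh", "interface", "ip", "delete", "arpcache"]


def ping_cmd(host, count=2):
    """Build a Windows ping command (-n sets the number of echo requests)."""
    return ["ping", "-n", str(count), host]


def is_reachable(host, run=subprocess.run):
    """Return True if ping exits with status 0.

    The exit status is only a rough signal of reachability; checking the
    ping output text would be stricter.
    """
    result = run(ping_cmd(host), capture_output=True, text=True)
    return result.returncode == 0


def flush_arp_if_unreachable(host, run=subprocess.run):
    """Flush the ARP cache only when the host fails to answer pings.

    Returns True if a flush was attempted, False otherwise.
    The `run` parameter exists so the logic can be exercised without
    touching the network (an assumption made for testability).
    """
    if is_reachable(host, run=run):
        return False
    run(FLUSH_ARP_CMD, check=True)
    return True


if __name__ == "__main__":
    # 172.17.113.65 is the site B host named later in the thread.
    if flush_arp_if_unreachable("172.17.113.65"):
        print("Host unreachable: ARP cache flushed")
    else:
        print("Host reachable: nothing to do")
```

Run from a scheduled task, something like this would at least keep a monitoring server from reporting an outage long after the remote router has come back.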

Hi necodemus, Thanks for the update. Are both sites connected through the same router device? If not, how did you configure routing: static or dynamic? Does this issue only occur on host 172.17.40.32? What about other hosts in the same subnet? Have you tried troubleshooting with the “tracert” utility once the 172.17.112.1 router returns, and what was the result? Some other general troubleshooting methods can be found in the article below: http://support.microsoft.com/kb/314067 Thanks. Tiger Li

April 21st, 2011 5:36am

We use Cisco routers. I don't know the exact config since they are managed by our MPLS provider. It doesn't happen only on host 40.32: I have at least 2 other Windows 2008 servers in that subnet (172.17.40.x) that can't talk to other servers or devices if a router at a remote site goes down and comes back up. I did trace it once but I can't remember the result. If I remember correctly, 40.32 couldn't get anywhere past 40.1, but the backup server in site B (subnet 172.17.112.x) could ping 40.32. Next time I have the problem I will trace it and let you know the result. One of the routers resets at least once a week at one of the sites.

April 21st, 2011 1:27pm

Hi necodemus, Thanks for the update. OK, please confirm with your MPLS provider first. I suspect that a misconfiguration may be causing this issue. If there is any update, please keep posting here and let us know. Thanks. Tiger Li

April 25th, 2011 3:54am

Hmm, I could contact the provider and probably will, but what I don't get is that it only applies to Windows 2008 servers, and only to some of them. I have a WSUS server that doesn't have the problem, and we also have 3 Exchange servers (2 in a cluster) that don't seem to be affected by it.

April 25th, 2011 9:26am

OK, one of the routers went down last night. This morning my vCenter server (Server 2008) could not reach the ESXi host at the site where the router went down. I did a tracert from the vCenter server and got a timeout at the first hop. Same thing from the ESXi host. I reset the ARP cache on the vCenter server and everything started to work again.

April 27th, 2011 10:18am

OK, I finally had a chance to do more testing, so here are the trace results after a network loss.

From site B to site A (main router):

    Tracing route to cass-ap-wug01.global.pfleiderer.lan [172.17.40.240]
    over a maximum of 30 hops:

      1    <1 ms    <1 ms    <1 ms  172.17.112.6
      2    19 ms    18 ms    18 ms  143.159.192.41
      3    34 ms    34 ms    35 ms  172.17.4.2
      4     *        *        *     Request timed out.

From site A to site B:

    C:\Users\Administrator>tracert usmo-fp-01

    Tracing route to usmo-fp-01.global.pfleiderer.lan [172.17.113.65]
    over a maximum of 30 hops:

      1     *        *        *     Request timed out.

This is how a full trace should look for both sites:

    Tracing route to cass-ap-wug01.global.pfleiderer.lan [172.17.40.240]
    over a maximum of 30 hops:

      1    <1 ms    <1 ms    <1 ms  172.17.112.6
      2    19 ms    18 ms    18 ms  143.159.192.41
      3    34 ms    34 ms    35 ms  172.17.4.2
      4    34 ms    33 ms    34 ms  cass-ap-wug01.global.pfleiderer.lan [172.17.40.240]

    Trace complete.

    Tracing route to usmo-fp-01.global.pfleiderer.lan [172.17.113.65]
    over a maximum of 30 hops:

      1     6 ms     5 ms     1 ms  172.17.40.1
      2    <1 ms    <1 ms    <1 ms  172.17.4.1
      3     1 ms     1 ms     1 ms  10.1.1.5
      4    33 ms    33 ms    34 ms  143.159.192.42
      5    33 ms    33 ms    35 ms  143.159.192.42
      6    34 ms    34 ms    33 ms  usmo-fp-01.global.pfleiderer.lan [172.17.113.65]

    Trace complete.

I spent more time looking for a solution and wonder whether Cisco HSRP could be the problem here.

May 30th, 2011 3:02pm
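
The traces in the post above differ mainly in where the first timeout appears: hop 4 when tracing from site B, but hop 1 when tracing from site A. A small Python sketch can pull that hop number out of saved tracert output. The function name `first_timeout_hop` is hypothetical, and the parser assumes the English-language Windows tracert layout shown in the post.

```python
import re

# A hop line starts with the hop number, followed by the three probe
# results and then either an address or the "Request timed out." text.
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")


def first_timeout_hop(tracert_output):
    """Return the number of the first hop that timed out, or None."""
    for line in tracert_output.splitlines():
        match = HOP_RE.match(line)
        if match and "Request timed out" in match.group(2):
            return int(match.group(1))
    return None


# Failing trace from site A to site B, copied from the post above.
SITE_A_TO_B = """\
Tracing route to usmo-fp-01.global.pfleiderer.lan [172.17.113.65]
over a maximum of 30 hops:

  1     *        *        *     Request timed out.
"""

if __name__ == "__main__":
    print(first_timeout_hop(SITE_A_TO_B))
```

A timeout at hop 1 suggests the affected server is not even getting a reply from its own default gateway (172.17.40.1 in the healthy trace), which would point at something local to that server or its subnet rather than at the WAN path. That reading is an interpretation, not something the trace alone proves.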

I did more testing and the problem seems to be at my datacenter. I used Wireshark to capture traffic. If I ping one of the servers in site B from an affected server in the datacenter, then instead of forwarding the data to the gateway's MAC address, it forwards the traffic to our Cisco PIX... I reset the ARP cache using netsh and the pings are forwarded to the gateway again. This is getting weird.

June 6th, 2011 9:00am
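
One way to follow up on the Wireshark observation above without a packet capture is to compare the cached ARP entry for the gateway against the MAC address it is known to have when things work. The Python sketch below parses the text printed by the Windows `arp -a` command. The function names are hypothetical, the MAC addresses in the sample are made up for illustration, and whether the stale state actually lives in the ARP entry for the gateway is an assumption the thread does not confirm.

```python
import re

# Matches rows of Windows "arp -a" output such as:
#   172.17.40.1           00-11-22-33-44-55     dynamic
ARP_LINE_RE = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"((?:[0-9a-fA-F]{2}-){5}[0-9a-fA-F]{2})\s+(\w+)"
)


def parse_arp_table(arp_output):
    """Map IP address -> lowercase MAC address from "arp -a" text.

    If the same IP is listed under several interfaces, the last one wins.
    """
    table = {}
    for line in arp_output.splitlines():
        match = ARP_LINE_RE.match(line)
        if match:
            ip, mac, _entry_type = match.groups()
            table[ip] = mac.lower()
    return table


def gateway_mac_matches(arp_output, gateway_ip, expected_mac):
    """True if the cached MAC for gateway_ip equals expected_mac.

    Accepts the expected MAC with either ":" or "-" separators.
    Returns False when the gateway has no cached entry at all.
    """
    actual = parse_arp_table(arp_output).get(gateway_ip)
    wanted = expected_mac.lower().replace(":", "-")
    return actual is not None and actual == wanted


# Illustrative sample only: the MAC addresses are invented.
SAMPLE = """\
Interface: 172.17.40.32 --- 0xb
  Internet Address      Physical Address      Type
  172.17.40.1           00-11-22-33-44-55     dynamic
  172.17.40.240         00-aa-bb-cc-dd-ee     dynamic
"""

if __name__ == "__main__":
    print(gateway_mac_matches(SAMPLE, "172.17.40.1", "00:11:22:33:44:55"))
```

Capturing `arp -a` output before and after the next router outage, and again after the flush, would show whether the gateway entry is what changes.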

OK, more weird stuff. I have a server in our datacenter that monitors important devices and servers on our network (WhatsUp Gold). That monitoring software is installed on Windows Server 2008, and I had the issue that when a router in site X goes down, WUG (WhatsUp Gold) still can't reach devices after the network is back up at that site. The only solution is to log on to the WUG server and use netsh to delete the ARP cache table; once that is done, it starts reaching devices again. Now, last night the main router in our datacenter went up and down for, I don't know, maybe 15 to 30 seconds: a typical network glitch that happens sometimes, and nothing that a network can't handle. The weird part is that WUG couldn't reach the main gateway (172.17.40.1) anymore, and probably not the other servers that I monitor on the same subnet either. How can it fail to reach a router that is on the same subnet as itself? Deleting the ARP cache on the WUG server solves the issue, and everything is back to normal until the router goes up and down again. Does anyone have an idea?

June 14th, 2011 12:05pm

I am having exactly the same issue, and I can reproduce it at will by disconnecting the interface that my MPLS traffic flows through. Who provides your site connectivity? Ours is by AboveNet. -stephen

June 15th, 2011 5:08pm

Finally, someone with the same issue; now I know that I'm not crazy. My provider is Bell Canada. I'm not sure the problem is specific to MPLS. I think it's more a combination of factors (the way Windows 2008 now handles ARP requests, and the Cisco router config or IOS options).

June 16th, 2011 10:13am

This topic is archived. No further replies will be accepted.