Need to reset ARP cache on Server 2008 after network loss
Hi, I have a network problem with Server 2008 that is really starting to be annoying. If the connection to one of our main routers (gateways) is lost, most of my Windows 2008 servers can't ping (reach) hosts on that network afterwards. I need to reset the ARP cache with netsh to be able to ping those servers again.
Does anyone know what the problem might be? I searched the forum and found cases similar to mine, but they were mostly about NLB, which I'm not using. This is specific to the Windows 2008 servers; all my 2003 servers work just fine after a network loss on one of the gateways.
April 19th, 2011 8:30pm
Hi necodemus,
Thanks for posting here.
> If one of our main router ( gateways) connection is lost, most of my Win 2008 servers can't ping ( reach) host on that network after.
Do you mean that you have also set multiple default gateway entries on these servers?
If this issue only occurs on Windows Server 2008 hosts, you might try temporarily disabling the built-in firewall on these hosts by following the procedure in the link below, and see whether the issue persists when the default gateway goes down:
I Need to Disable Windows Firewall
http://technet.microsoft.com/en-us/library/cc766337(WS.10).aspx
Thanks.
Tiger Li
Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread.
April 20th, 2011 11:38am
The firewall is disabled on all servers.
Let me explain my problem with an example. I have a main backup server in site A (172.17.40.32), and other backup servers in different sites connect to that server. Let's say that router 172.17.112.1 in site B goes down for 1 minute or more. When the network comes back in site B, the backup server in site A will not be able to reach the backup server in site B until I log on to the backup server in site A and delete the ARP cache table (netsh interface ip delete arpcache) or reboot it.
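Until the root cause is found, the manual workaround above could in principle be automated. The following is only a rough Python sketch, not something from the original poster: the function names ping_ok and check_and_flush are hypothetical, it assumes a Windows host where ping and netsh are on the PATH, and it would presumably need to run with administrator rights. It checks for "TTL=" in the ping output because the exit code of Windows ping alone is not always a reliable sign of a real reply.

```python
import subprocess

# The same command quoted in the post above.
FLUSH_CMD = ["netsh", "interface", "ip", "delete", "arpcache"]


def ping_ok(host, run=subprocess.run):
    """Return True if one Windows ping to `host` gets a real echo reply."""
    result = run(
        ["ping", "-n", "1", "-w", "1000", host],  # 1 echo request, 1000 ms timeout
        capture_output=True,
        text=True,
    )
    # A genuine reply line contains "TTL="; "Request timed out." does not.
    return result.returncode == 0 and "TTL=" in result.stdout


def check_and_flush(host, run=subprocess.run):
    """Flush the ARP cache if `host` is unreachable. Returns True if flushed."""
    if ping_ok(host, run):
        return False
    run(FLUSH_CMD, capture_output=True, text=True)
    return True
```

Calling check_and_flush("172.17.112.1") from a scheduled task would mimic the manual fix, but it only hides the symptom and does not explain why the cache goes stale in the first place.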
This is starting to be a problem because my main server that monitors events on the network is now a 2008 server, and I get false data about network outages in the different sites we have.
April 20th, 2011 5:12pm
Hi necodemus,
Thanks for the update.
So both sites are connected through the same router device? If not, how did you configure routing: static or dynamic?
Does this issue only occur on host 172.17.40.32?
What about other hosts in the same subnet?
Have you tried troubleshooting with the "tracert" utility once the 172.17.112.1 router returns, and what was the result? Some other general troubleshooting methods can be found in the article below:
http://support.microsoft.com/kb/314067
Thanks.
Tiger Li
April 21st, 2011 12:31pm
We use Cisco routers. I don't know the exact config since it's managed by our MPLS provider. It doesn't happen only on host 40.32. I have at least 2 other Windows 2008 servers in that subnet (172.17.40.x) that can't talk to other servers/devices if a router at a remote site goes down and comes back up.
I did trace it once but I can't remember the result. If I remember correctly, 40.32 couldn't get anywhere past 40.1, but the backup server in site B (subnet 172.17.112.x) could ping 40.32.
Next time I have the problem I will trace it and let you know the result. One of the routers resets at least once a week at one of the sites.
April 21st, 2011 8:18pm
Hi necodemus,
Thanks for the update.
OK, please confirm with your MPLS provider first. I suspect that a misconfiguration may be causing this issue.
If you have any update, please keep posting here and let us know.
Thanks.
Tiger Li
April 25th, 2011 10:48am
Hmm, I could contact the provider and probably will, but what I don't get is that it only applies to Windows 2008 servers, and only some of them. I have a WSUS server that doesn't have that problem, and we also have 3 Exchange servers (2 in a cluster) that don't seem to be affected by it.
April 25th, 2011 4:20pm
OK, one of the routers went down last night. This morning my vCenter (Server 2008) could not reach the ESXi host at the site where the router went down. I did a tracert from the vCenter and got a timeout at the first hop. Same thing from the ESXi host. I reset the ARP cache on the vCenter and everything started to work again.
April 27th, 2011 5:12pm
OK, I finally had a chance to do more testing, so here are the trace results after a network loss:
From site B to site A (main router):
Tracing route to cass-ap-wug01.global.pfleiderer.lan [172.17.40.240]
over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms 172.17.112.6
2 19 ms 18 ms 18 ms 143.159.192.41
3 34 ms 34 ms 35 ms 172.17.4.2
4 * * * Request timed out.
From Site A to B:
C:\Users\Administrator>tracert usmo-fp-01
Tracing route to usmo-fp-01.global.pfleiderer.lan [172.17.113.65]
over a maximum of 30 hops:
1 * * * Request timed out.
This is how a full trace should look for both sites:
Tracing route to cass-ap-wug01.global.pfleiderer.lan [172.17.40.240]
over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms 172.17.112.6
2 19 ms 18 ms 18 ms 143.159.192.41
3 34 ms 34 ms 35 ms 172.17.4.2
4 34 ms 33 ms 34 ms cass-ap-wug01.global.pfleiderer.lan [172.17.40.240]
Trace complete.
Tracing route to usmo-fp-01.global.pfleiderer.lan [172.17.113.65]
over a maximum of 30 hops:
1 6 ms 5 ms 1 ms 172.17.40.1
2 <1 ms <1 ms <1 ms 172.17.4.1
3 1 ms 1 ms 1 ms 10.1.1.5
4 33 ms 33 ms 34 ms 143.159.192.42
5 33 ms 33 ms 35 ms 143.159.192.42
6 34 ms 34 ms 33 ms usmo-fp-01.global.pfleiderer.lan [172.17.113.65]
Trace Complete
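The difference between the failing and healthy traces above is the hop at which the first timeout appears. Below is a minimal Python sketch, not from the original poster, that picks out that hop from saved tracert output. The function name first_timeout_hop is hypothetical, and it assumes the English-language "Request timed out." wording shown in the traces above.

```python
import re

# A hop line starts with the hop number followed by whitespace.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def first_timeout_hop(tracert_output):
    """Return the number of the first hop that timed out, or None if no hop did."""
    for line in tracert_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header lines, blank lines, "Trace complete."
        hop, rest = int(match.group(1)), match.group(2)
        if "Request timed out" in rest:
            return hop
    return None
```

For the failing trace from site A this would report hop 1, meaning the packet never got past the server's own next hop, which fits a stale local entry better than a fault further along the MPLS path.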
I spent more time looking for a solution and wonder if HSRP from Cisco could be the problem here.
May 30th, 2011 9:55pm
I did more testing and the problem seems to be at my datacenter. I used Wireshark to capture traffic. If I ping one of the servers in site B from a server in the datacenter that has the problem, instead of forwarding the data to the gateway's MAC address, it forwards the traffic to our Cisco PIX... I reset the ARP cache using netsh and the pings are forwarded to the gateway again.
This is getting weird.
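A quicker check than a full Wireshark capture might be to compare the MAC address that the server's ARP table holds for the gateway against the MAC the gateway is known to have. The sketch below is only an illustration under assumptions: it assumes the wrong next hop shows up as a changed entry in the output of "arp -a" on Windows (the capture above does not prove that), and the function names parse_arp_table and gateway_mac_matches are hypothetical.

```python
import re

# Matches Windows "arp -a" rows such as:
#   172.17.40.1           00-1a-2b-3c-4d-5e     dynamic
ARP_LINE = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"([0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){5})\s+(\w+)"
)


def parse_arp_table(arp_output):
    """Map each IP address in `arp -a` output to its lower-cased MAC address."""
    table = {}
    for line in arp_output.splitlines():
        match = ARP_LINE.match(line)
        if match:
            # If an IP appears under several interfaces, the last one wins.
            table[match.group(1)] = match.group(2).lower()
    return table


def gateway_mac_matches(arp_output, gateway_ip, expected_mac):
    """Return True if the cached MAC for `gateway_ip` equals `expected_mac`."""
    return parse_arp_table(arp_output).get(gateway_ip) == expected_mac.lower()
```

Saving the output of "arp -a" while things work and again after an outage, then running both through gateway_mac_matches with 172.17.40.1 and the router's real MAC, would show whether the cached entry itself changed or whether the traffic is being steered to the PIX by something else.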
June 6th, 2011 3:52pm
OK, more weird stuff. I have a server in our datacenter that monitors important devices and servers on our network (WhatsUp Gold). That monitoring software is installed on Windows Server 2008, and I had the issue that when a router in site X goes down, WUG (WhatsUp Gold) still can't reach devices after the network is back at that site. The only solution is to log on to the WUG server and use netsh to delete the ARP cache table; once that is done it starts reaching devices again.
Now, last night the main router in our datacenter went up/down for, I don't know, maybe 15 to 30 seconds. A typical network glitch that happens sometimes and nothing that a network can't handle. The weird part is that WUG couldn't reach the main gateway (172.17.40.1) anymore, and probably other servers that I monitor on the same subnet. How can it not reach a router that is on the same subnet as itself? Deleting the ARP cache on the WUG server solved the issue and everything is back to normal until the router goes up/down again.
Does anyone have an idea?
June 14th, 2011 4:58pm
I am having the same issue. Exactly. And can reproduce at will by disconnecting the interface that my MPLS traffic flows through. Who provides your site connectivity? Ours is by AboveNet.
-stephen
June 16th, 2011 12:02am
Finally someone with the same issue; now I know that I'm not crazy. My provider is Bell Canada. I'm not sure the problem is specific to MPLS. I think it's more a combination of factors (the way Windows 2008 now handles ARP requests and the Cisco router config or IOS options).
June 16th, 2011 5:07pm