Exchange 2013 CU7 server performance/outage issues.

Hi Forum,

We constantly receive incident reports from users who lose their connection to the Exchange server. As a result, we hired a consultant to install a new Exchange 2013 environment based on the Microsoft, VMware and NetApp best practices.

We are still having performance problems. From the client perspective, the performance has not changed, weekly hangs are still happening and I'm at my wits' end.

This is our configuration:

2 Windows Server 2012 R2 CAS and DB

VM with 4 vCPU, 16 GB RAM, 1 VMXNET3 NIC, IPv6 disabled (the Microsoft way)

VMware 5.1 U1

Cluster without AAP

DB1 active on Server 1 (Datacenter 1 with 3 host cluster)

DB2 active on Server 2 (Datacenter 2 with 3 host cluster)

Veeam 8 Backup (Move-ActiveMailboxDatabase DB2 -ActivateOnServer Server1, back up server 2, then Move-ActiveMailboxDatabase back to Server2).

Exchange is in Online-Mode due to Citrix XD VDI clients.

1300 Mailboxes and 750 users.

NetApp with SATA disks. (1.5TB E: Vol for DB1/1.5TB F: Vol for DB2)

No archiving

Unlimited mailbox sizes.

The problems are:

-       If we migrate a non-Exchange VM in the same cluster or to/from the same host, this results in a 30 second to 5 minute Outlook outage;

-       If we make both DBs active on mail server 1 and reboot server 2, this results in a 30 second to 5 minute Outlook outage;

-       If we make both DBs active on mail server 1 and do a Veeam backup of server 2, sometimes one of the DBs goes back to server 1, on its own;

-       While monitoring, we see that w3wp.exe and Microsoft.Exchange.Store.Worker.exe are consuming most of the CPU and memory;

What can I do to solve the outages?

Tnx. Timotatty.

March 10th, 2015 12:18pm

I'm impressed that you've included lots of information but I'm afraid to say that you're asking for a lot in a forum like this.  Diagnosing this kind of problem could take hours when one has access to the server, and trying to tell you what's wrong from a post like this would take a lot of time itself and would be nothing but a complete guess.  Sorry.
March 11th, 2015 1:23am

Hi,

Firstly, you need to find the bottleneck. In Exchange 2013, you can check:

https://technet.microsoft.com/en-us/library/jj150524(v=exchg.150).aspx

Thanks,

Please remember to mark the replies as answers if they help and unmark them if they provide no help. If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com

March 11th, 2015 4:52am

Thanks anyway Ed.

This is what I see almost daily, and definitely weekly:

March 11th, 2015 5:40am

Thanks Simon,

some of these have been done already but I will try them all later today.

Tim

March 11th, 2015 5:42am

Hi Tim,

Any progress on this issue?


Thanks,

March 15th, 2015 10:41pm

Hi,

I had the same issue with a client. I managed to fix it by patching the Domain Controllers to the latest and greatest and then upgrading the firmware on the NIC (this will differ for your configuration, of course). Do give the DCs a go, though. I have also seen people mention tweaking their MaxConcurrentApi setting, but be cautious with this and make sure you do it correctly.
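
For anyone who wants to look before tweaking: the following is a read-only sketch, not a recommendation to change anything. It assumes the standard Netlogon registry location for MaxConcurrentApi and the Netlogon performance counters available on current Windows Server versions; run it on the Exchange servers and DCs.

```powershell
# Read the current MaxConcurrentApi value, if one has been set explicitly.
$path  = 'HKLM:\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters'
$entry = Get-ItemProperty -Path $path -Name MaxConcurrentApi -ErrorAction SilentlyContinue

if ($null -eq $entry) {
    "MaxConcurrentApi is not set; the operating system default applies."
} else {
    "MaxConcurrentApi = $($entry.MaxConcurrentApi)"
}

# Sample the Netlogon semaphore counters. Sustained waiters or any timeouts
# suggest NTLM authentication requests are queuing behind the secure channel.
Get-Counter -Counter @(
    '\Netlogon(_Total)\Semaphore Waiters',
    '\Netlogon(_Total)\Semaphore Timeouts'
) -SampleInterval 5 -MaxSamples 12 -ErrorAction SilentlyContinue
```

If the counters stay at zero during a busy period, the setting is unlikely to be the bottleneck.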

I hope that helps!

March 16th, 2015 11:24am

Hi Ray,

Patching the DCs seems a bit out there for an Exchange issue, but you never know. Since we are still in the process of updating/migrating from Windows Server 2003 to 2012 and raising the domain functional level, you might have a point. We have mixed DCs, but all roles are on the legacy 2003 PDC server.

Since my post we have had additional problems:

1. Veeam Backup 8 freezing the host to commit the snapshot = downtime;

2. Veeam backups taking 32 hours to complete, which runs into production time;

3. Kemp load balancer marking the server as unavailable if one service such as ECP has been flagged by Managed Availability as being offline;

Last weekend I figured out how to put an Exchange server in maintenance mode to do the backups. That helped, but then the inactive node was running at 100% CPU/memory :( So today we tried to resolve the above 3 points, and this evening we will further segregate the Exchange VMs by assigning each one to its own host, with no other virtual machines assigned to that host. They are already on different clusters, so hopefully this will shed some light on the performance issues.
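
For reference, a sketch of the maintenance-mode sequence for a DAG member, following the commonly documented Exchange 2013 procedure. This is not necessarily the exact sequence used here. "Server2" is the node being backed up, and the redirect target FQDN is a hypothetical placeholder; Suspend-ClusterNode and Resume-ClusterNode need the FailoverClusters module.

```powershell
$server = "Server2"                      # node going into maintenance
$target = "server1.example.local"        # hypothetical FQDN of the remaining node

# --- Enter maintenance mode ---
Set-ServerComponentState $server -Component HubTransport -State Draining -Requester Maintenance
Redirect-Message -Server $server -Target $target
Suspend-ClusterNode -Name $server
Set-MailboxServer $server -DatabaseCopyActivationDisabledAndMoveNow $true
Set-MailboxServer $server -DatabaseCopyAutoActivationPolicy Blocked
Set-ServerComponentState $server -Component ServerWideOffline -State Inactive -Requester Maintenance

# ... run the backup or other maintenance here ...

# --- Leave maintenance mode ---
Set-ServerComponentState $server -Component ServerWideOffline -State Active -Requester Maintenance
Resume-ClusterNode -Name $server
Set-MailboxServer $server -DatabaseCopyActivationDisabledAndMoveNow $false
Set-MailboxServer $server -DatabaseCopyAutoActivationPolicy Unrestricted
Set-ServerComponentState $server -Component HubTransport -State Active -Requester Maintenance
```

Blocking automatic activation on the node is the step most relevant to a database moving on its own during a backup.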

More on this tomorrow.

Tim

March 16th, 2015 1:10pm

Hi Tim,

Have you tried Performance Monitor and checked the counters below?

https://technet.microsoft.com/en-us/library/dd335215(v=exchg.141).aspx
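
A minimal sketch of sampling a few of these counters from PowerShell rather than the Performance Monitor GUI. The selection is only a starting point: the processor, memory and disk counters are standard Windows ones, while the LDAP counter name assumes the Exchange ADAccess counter set is installed on the server.

```powershell
# Counters to sample on each Exchange server.
$counters = @(
    '\Processor(_Total)\% Processor Time',
    '\Memory\Available MBytes',
    '\LogicalDisk(*)\Avg. Disk sec/Read',
    '\LogicalDisk(*)\Avg. Disk sec/Write',
    '\MSExchange ADAccess Domain Controllers(*)\LDAP Read Time'   # assumed present
)

# 20 samples, 15 seconds apart (about 5 minutes); run it during a slow period.
Get-Counter -Counter $counters -SampleInterval 15 -MaxSamples 20 |
    ForEach-Object { $_.CounterSamples } |
    Select-Object Timestamp, Path, CookedValue |
    Export-Csv -Path .\exchange-counters.csv -NoTypeInformation
```

The CSV can then be compared against the thresholds in the linked article.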

Thanks,

March 17th, 2015 7:32am

Hi,

Is there any update on this thread?


Thanks,
March 24th, 2015 9:43am

Hi Simon,

Apparently we are using basic authentication, which results in a FailingCode=401 as seen in the Event Log under Active Monitoring --> Probe Results from ECP and others. This became apparent after reading this blog: http://blogs.technet.com/b/ehlro/archive/2014/02/20/exchange-2013-managed-availability-healthset-troubleshooting.aspx

We are now overriding some of the monitors which require Forms Based Authentication.

Regarding the other 2 issues, we have changed the licensing model for Veeam 8 to allow full throughput, which will reduce the backup time and add compression.

The KEMP support team helped us by deselecting "Use HTTP/1.1" under View/Modify Services --> ECP > Modify --> Real Servers. The load balancer now only flags a single service (ECP, ActiveSync or OWA) as being down, instead of an entire server, should one component fail.

I am satisfied, but still not happy with the steps required to troubleshoot unhealthy health sets:

Invoke-MonitoringProbe always returns with: WARNING: Could not find assembly or object type associated with monitor identity '<Health Set>\<Probe>'. Please ensure that the given monitor identity exists on the server.

This makes it very difficult to troubleshoot Unhealthy Health Sets.
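
One possible way to inspect an unhealthy health set without Invoke-MonitoringProbe, sketched from the approach in the blog linked above. "Server1" and the "ECP" health set are just the examples from this thread, and the field names pulled from the event XML (ResultName, Error, ExecutionContext) are assumptions that may differ per build.

```powershell
$server = "Server1"

# List the monitors in the ECP health set that are not healthy.
Get-ServerHealth -Identity $server -HealthSet "ECP" |
    Where-Object { $_.AlertValue -ne "Healthy" } |
    Format-Table Name, TargetResource, AlertValue -AutoSize

# Read recent probe results from the Active Monitoring crimson channel
# and show the error text, for example the FailingCode=401 mentioned above.
Get-WinEvent -ComputerName $server `
             -LogName "Microsoft-Exchange-ActiveMonitoring/ProbeResult" `
             -MaxEvents 200 |
    ForEach-Object { ([xml]$_.ToXml()).Event.UserData.EventXml } |
    Where-Object { $_.ResultName -like "*Ecp*" } |
    Select-Object ResultName, Error, ExecutionContext |
    Format-List
```

The probe results usually carry the actual failure reason, which the health set status alone does not show.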

Regards,

Tim

April 7th, 2015 8:51am

LDAP is a very common issue. The side effect is slow Outlook performance. The guidance is 1 DC processor core for every 8 Exchange processor cores. This assumes dedicated DCs running a 64-bit OS.
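
Applied to the configuration at the top of this thread (2 Exchange servers with 4 vCPU each), the ratio works out as follows. This is simple arithmetic on the rule of thumb, not a sizing tool, and the variable names are only illustrative.

```powershell
# Rule of thumb: 1 DC core per 8 Exchange cores (dedicated 64-bit DCs).
$exchangeServers = 2    # from the original post
$vCpuPerServer   = 4    # from the original post

$exchangeCores   = $exchangeServers * $vCpuPerServer
$requiredDcCores = [math]::Ceiling($exchangeCores / 8)

"Exchange cores: $exchangeCores, minimum dedicated DC cores: $requiredDcCores"
```

So a single dedicated DC core covers this environment on paper. That only holds if the DCs are not shared with other workloads.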

Always troubleshoot with the load balancers out of the way, that helps to see if the issue is Exchange related or LB related.

Use EXPERFWIZ and run the results through PAL to quickly see issues.

http://experfwiz.codeplex.com/

http://pal.codeplex.com/

Also, ensure your 'sleepy NIC' is not set: http://blogs.technet.com/b/exchange/archive/2013/10/22/do-you-have-a-sleepy-nic.aspx Most likely not the issue, but eliminate easy possible issues up front.
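
A quick check for the NIC power-saving setting, assuming the NetAdapter module that ships with Windows Server 2012 and later; virtual adapters may simply report the setting as unsupported.

```powershell
# Show whether Windows is allowed to power down each network adapter.
Get-NetAdapterPowerManagement -Name "*" |
    Format-Table Name, AllowComputerToTurnOffDevice -AutoSize
```

Any adapter showing Enabled is a candidate for the fix described in the linked article.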

April 7th, 2015 11:52pm

This topic is archived. No further replies will be accepted.
