Dynamic Memory is not working all the time

We are in the process of moving our 2008R2 VMs from the 2008R2 Hyper-V servers to new Server 2012R2 hosts.

We shut down the VMs, copy the files and VHDs to the new CSVs, and import the VMs in Hyper-V Manager. Then we make them highly available in Failover Cluster Manager (Configure Role - Virtual Machine). We mount the integration tools and update the VM to version 6.3.9600.16384.

For a specific type of VM (mostly RDS host servers) we always had Dynamic Memory configured when they were hosted on the 2008R2 platform, so we are using the same settings on the 2012R2 platform. The memory settings were:

Startup memory: 1024 MB

Minimum memory: 1024 MB

Maximum memory: 12288 MB
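
As a quick sanity check on values like these, here is a minimal Python sketch. The class name `DynamicMemorySettings` and its fields are made up for illustration and are not a Hyper-V API; it only encodes the ordering rule that minimum must not exceed startup and startup must not exceed maximum.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DynamicMemorySettings:
    """Hypothetical container for a VM's Dynamic Memory values, in MB."""
    startup_mb: int
    minimum_mb: int
    maximum_mb: int

    def problems(self) -> List[str]:
        """Return readable problems; an empty list means the ordering is valid."""
        found = []
        if self.minimum_mb > self.startup_mb:
            found.append("minimum exceeds startup")
        if self.startup_mb > self.maximum_mb:
            found.append("startup exceeds maximum")
        return found

    def growth_room_mb(self) -> int:
        """Memory the VM may still gain above its startup amount."""
        return self.maximum_mb - self.startup_mb


# Values from the post above.
original = DynamicMemorySettings(startup_mb=1024, minimum_mb=1024, maximum_mb=12288)
print(original.problems(), original.growth_room_mb())
```

With these settings the ordering is valid, so the configuration itself does not explain a VM staying at 1024 MB.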

These VMs reboot every morning; this is done for specific reasons. But now, once in a while (once every week or two), we notice that the VMs are not using more than 1024 MB of memory while the demand is much higher. Rebooting the server helps most of the time, and live migrating to another host also helps. In the VM we see that memory usage in Task Manager is 99-100%, and after the move it immediately starts using more than the minimum configured amount.

Until the failover the memory usage was 1024 MB and it did not get any higher.

This happened several times. Last week we changed the Memory configuration to:

Startup memory: 2048 MB

Minimum memory: 2048 MB

Maximum memory: 12288 MB

But this morning we had a call about the performance of one of the VMs. We saw that it was only using 2 GB of memory while the demand was much higher. After live migrating it to another host it started using more memory immediately.

The 2012R2 hosts are not overcommitted; there is a lot of memory still available for the VMs. These VMs never had this problem on the 2008R2 Hyper-V platform.

Any idea why this happens?

Peter Camps


  • Edited by Peter Camps Monday, November 24, 2014 12:13 PM
November 24th, 2014 11:59am

As you have shifted these machines over to Server 2012, have you been updating the Integration Services? It almost seems to me the server never comes out of the startup state... Also, as another test, let's try a startup memory of 2048 MB and a minimum memory of 1024 MB. I'd be interested to see if the memory falls back to 1024 MB after the initial startup.

In the Hyper-V application logs in the Windows Event Viewer on the host, do you see any errors with dynamic memory?

In the guest, open perfmon and view these performance counters, if available:

  • Hyper-V Dynamic Memory > Guest Visible Memory
  • Hyper-V Dynamic Memory > Physical Memory

Also, in the Hyper-V console, does it show the memory status as OK, and what is the assigned memory vs. the memory demand?
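
The assigned-versus-demand comparison can also be done outside the console. The sketch below is Python and assumes the host data was exported as JSON, for example with something like `Get-VM | Select-Object Name, MemoryAssigned, MemoryDemand, MemoryStatus | ConvertTo-Json` on the host. The sample JSON is invented for illustration, and the byte units and field shapes are assumptions to verify against your own output.

```python
import json

MB = 1024 * 1024


def summarize_vm_memory(raw_json):
    """Turn exported VM records into rows with name, assigned MB, demand MB and status."""
    records = json.loads(raw_json)
    if isinstance(records, dict):  # a single VM may be exported as one object, not a list
        records = [records]
    rows = []
    for rec in records:
        demand = rec.get("MemoryDemand")
        rows.append({
            "name": rec["Name"],
            "assigned_mb": rec["MemoryAssigned"] // MB,
            # Keep a missing or empty demand as None instead of guessing a number.
            "demand_mb": demand // MB if isinstance(demand, int) else None,
            "status": rec.get("MemoryStatus") or None,
        })
    return rows


# Invented sample data, not taken from the thread.
sample = json.dumps([
    {"Name": "RDS01", "MemoryAssigned": 6144 * MB, "MemoryDemand": 5200 * MB, "MemoryStatus": "OK"},
    {"Name": "RDS02", "MemoryAssigned": 1024 * MB, "MemoryDemand": None, "MemoryStatus": ""},
])

for row in summarize_vm_memory(sample):
    print(row)
```

A row whose status or demand comes back empty is worth a closer look, since a healthy VM with Dynamic Memory should report both.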

November 24th, 2014 10:17pm

Hi,

Yes, all the VMs were upgraded with the latest Integration Tools (6.3.9600.16384).

On the hosts I don't see any errors in the event log concerning Dynamic Memory. In fact, the latest VM that had this issue rebooted at 04:00 yesterday morning, and in the Hyper-V event log of the host I see the following informational message:

The 'Microsoft Dynamic Memory Controller' device in 'VM' virtual machine is loaded and the protocol version is negotiated to the most recent version. (Virtual machine ID 330DF84E-118D-44A3-B57F-F27234E1A513)

All the other events are informational, and no errors show in the event log.

The performance counters you mentioned are only available on the host, not in the guest. Before I can give you any information about the memory status and these counters, I have to wait until this problem occurs again.

I have changed the memory settings of a few VMs to the values you asked for.

Now all we have to do is wait...

Grtx

Peter

November 25th, 2014 11:59am

Doh, you are right, that is a host-side monitor. I forgot about that.

Yeah, I think with those settings and some of the information in hand you should be able to catch the issue happening and possibly a log of it.

November 25th, 2014 4:05pm

Are you allowing the virtual machines to span NUMA nodes?
November 25th, 2014 11:49pm

Yes,

NUMA node spanning is enabled.

No problem today (so far)

November 26th, 2014 1:46pm

Hi Peter,

I'm sorry to say, but the provided screenshot doesn't match your description.

Based on my test, it seems that the VM's memory dropped after you live migrated it to another node. It looks more like an out-of-memory issue.

I would suggest you check the memory usage in that VM.

Best Regards

Elton Ji

November 27th, 2014 1:06am

Hi Elton Ji,

The screenshot shows the following: the customer called us because everything was very slow and work was almost impossible. One of our service desk people managed to log on and saw in Task Manager that there was only 1 GB of RAM available and (of course) completely used.

The high 100% line in Task Manager is from the time when the memory usage was 1 GB. Demand was much higher and no more RAM was given.

At that time I decided to move the VM to another host, and as if something was triggered, the VM was assigned more RAM. The line dropped, but the scale of the Task Manager graph also changed. The usage was still 1 GB (and climbing), but the scale had now changed to 0-10 GB.

So the RAM usage in the VM did not change, but the scale of the histogram did. It went from 0-1 GB (0 = 0%, 1 GB = 100%) to 0-10 GB (0 = 0%, 10 GB = 100%).
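
The arithmetic behind that scale change, as a small Python sketch. The function name is illustrative; the numbers are the ones from this post (about 1 GB in use, with the memory visible to the guest going from 1 GB to 10 GB).

```python
def usage_percent(used_gb, visible_gb):
    """In-use memory as a percentage of the memory the guest can see."""
    return 100.0 * used_gb / visible_gb


before = usage_percent(used_gb=1, visible_gb=1)   # 1 GB used of 1 GB visible
after = usage_percent(used_gb=1, visible_gb=10)   # same 1 GB used of 10 GB visible
print(f"before migration: {before:.0f}%, after migration: {after:.0f}%")
```

The same 1 GB of usage reads as 100% before the move and 10% after it, which is why the line in the graph drops even though the VM is not using less memory.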

I hope you understand my explanation.

Grtx

Peter

November 27th, 2014 9:38am

Hi Peter,

>>We mount the integration tools and update the VM to version 6.3.9600.16384

As far as I know, the newest Integration Services version is newer than 6.3.9600.16384. Please try updating the Hyper-V host and then the integration services, and check the result.

Best Regards

Elton Ji

December 2nd, 2014 8:37am

Hi Elton,

I recently (last Monday) updated a 4-node 2012R2 cluster with all applicable updates. The integration services version I mentioned earlier is still the current one on that cluster, so I have not been able to find a newer version of these integration services.

At the moment I am updating another 4-node 2012R2 cluster (where some of the problem VMs are placed), including the big update KB3000850, to see if this fixes the problem. Since I posted this problem here it has not occurred again, so we have to wait and see what happens.

Thanks so far

Grtx Peter

December 3rd, 2014 12:58pm

Hi,

It has taken quite a while, but this morning we had one RDS server that was only assigned 2 GB of RAM (the startup memory amount). Although it was configured with a maximum of 10 GB of RAM, it did not use more than 2 GB.

At that moment Hyper-V Manager showed the following:

Memory Status and Demand were blank. The memory usage in the VM had been 100% for more than one hour. I started Perfmon on that host and viewed the performance counters you mentioned before.

As you can see, all values are 2 GB, the amount that was assigned to the VM. There were no events in the Event Viewer on the host or in the VM itself. I live migrated the VM to another host in that cluster and the values changed immediately, as you can see below (the screenshots were made 30 minutes after the live migration).
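
Since the issue is rare and leaves no events behind, one option is to watch for the pattern itself. The following is a minimal Python sketch that flags a VM whose assigned memory stays at the startup value for a sustained period while the guest is busy. The function name, the sample format and the thresholds are all assumptions for illustration; how the samples are collected (for example from the host-side counters) is left open.

```python
def stuck_at_startup(samples, startup_mb, min_minutes=60, busy_pct=95):
    """Return True if assigned memory sat at the startup value for at least
    min_minutes of consecutive samples while the guest was busy.

    samples: iterable of (minute, assigned_mb, guest_used_pct), ordered by minute.
    """
    run_start = None
    for minute, assigned_mb, used_pct in samples:
        if assigned_mb == startup_mb and used_pct >= busy_pct:
            if run_start is None:
                run_start = minute  # first sample of a suspicious run
            if minute - run_start >= min_minutes:
                return True
        else:
            run_start = None  # memory grew or the guest went idle: reset the run
    return False


# Invented samples taken every 10 minutes; not measurements from the thread.
stuck_vm = [(m, 2048, 100) for m in range(0, 80, 10)]
healthy_vm = [(0, 2048, 100), (10, 4096, 80), (20, 6144, 75)]

print(stuck_at_startup(stuck_vm, startup_mb=2048))
print(stuck_at_startup(healthy_vm, startup_mb=2048))
```

A check like this could trigger an alert or a live migration before users start calling, but it only detects the symptom and does nothing about the cause.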

This cluster is completely up to date with all Windows Updates and HP SPP (firmware and drivers). I am unable to reproduce the issue, but it still happens once in a while.

Any further ideas are appreciated.

Grtx

Peter Camps

  • Edited by Peter Camps Monday, January 05, 2015 11:19 AM
January 5th, 2015 10:46am

Peter,

I think this is a bug of some sort. I say that because the components that make up dynamic memory are as follows:

Memory Balancer (host service; coordinates how memory changes are made). This is also what shows the memory demand counter, I believe.

Dynamic Memory Virtualization Service Provider (this is included in your VMWP.exe process, one per VM; essentially how the VM runs on the host. It listens to the Service Client for metrics.)

Dynamic Memory Virtualization Service Client (this is inside the VM and reports to the Dynamic Memory Virtualization Service Provider).

Live migrating the machine made dynamic memory work on the other host. This means the Service Client is running in the guest and shouldn't be the issue. The Memory Balancer is on the server side and shouldn't be the issue either, so that puts the "Dynamic Memory Virtualization Service Provider" in question. When you live migrate the machine, a new VMWP.exe process is created on the destination cluster node. So now the question is whether the host couldn't listen to the service, or the worker process skipped a beat and has a bug.

Out of curiosity, does it happen on both hosts? Also, have you profiled the servers to see how much memory they really require at startup? When you reboot the RDS servers, how many VMs do you reboot, and is it a staggered process?

January 5th, 2015 4:35pm

Hello,

We have seen it happen on several different VMs (with a 2008R2 OS in those VMs) and on several different hosts, even on different clusters (all Server 2012R2 hosts).

Edit (19-05-2015) **** We have also seen the problem happen in 2012R2 VMs ****

Because of this issue we have updated all our clusters, but that does not seem to have resolved it. Some of the VMs were migrated from our old 2008R2 clusters and some were newly installed on the 2012R2 cluster.

Most of the hosts have 2 to 3 VMs that reboot in the morning. Some reboot at 04:00 AM, others a bit later. The hosts normally have 25 to 50 GB of RAM available when all the VMs are running, so there is plenty of available RAM.
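
To put numbers on that headroom, here is a rough Python sketch. It assumes, purely for illustration, three VMs per host that each sit at 2048 MB assigned with a 12288 MB maximum (the settings mentioned earlier in the thread); the helper name is made up.

```python
def growth_needed_mb(vms):
    """Total MB needed for every VM to grow from its assigned to its maximum memory.

    vms: list of (assigned_mb, maximum_mb) tuples.
    """
    return sum(max(0, maximum - assigned) for assigned, maximum in vms)


# Assumption for illustration: three VMs, each at 2048 MB assigned with a 12288 MB maximum.
vms = [(2048, 12288)] * 3
needed = growth_needed_mb(vms)       # 3 * 10240 = 30720 MB
one_vm = growth_needed_mb(vms[:1])   # 10240 MB

for free_gb in (25, 50):
    free_mb = free_gb * 1024
    print(f"{free_gb} GB free: one VM to max fits={one_vm <= free_mb}, "
          f"all three to max fits={needed <= free_mb}")
```

Under these assumed numbers a single VM can always grow to its maximum, even at the low end of the free RAM range, so host memory pressure does not look like the reason a VM stays at its startup amount.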

We haven't profiled the servers to see what they require at startup, mainly because we have never seen this issue on our 2008R2 cluster (with the same settings), and it only happens occasionally.


January 6th, 2015 1:18pm

I'm curious if there is a lag or timing issue because of RAM constraints and the number of services at startup... To test a theory, could you build a VM and script some of these processes with a simple batch file that launches at system startup? For memory testing, I'm thinking of a load equal to or slightly greater than the startup memory value. See if you can get it to break consistently when you reboot. If yes, I think it would be a bug.

Sorry for the shot in the dark. If I recall, when the issue is happening all the Hyper-V services are started, correct?
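
A reboot test like this could be driven by a small harness. The Python sketch below shows only the loop logic: `reboot` and `read_assigned_mb` are hypothetical callables that the person running the test would have to supply (they are not part of any Hyper-V tooling), and the demo uses simulated readings so the sketch runs without a VM.

```python
def reboot_until_stuck(reboot, read_assigned_mb, startup_mb, attempts=20):
    """Reboot repeatedly and report the first attempt where memory never grows.

    reboot: callable that restarts the test VM and waits for the load script to run.
    read_assigned_mb: callable returning the assigned memory in MB after the load.
    Both are placeholders to be supplied by the caller.
    Returns the 1-based attempt number that got stuck, or None if none did.
    """
    for attempt in range(1, attempts + 1):
        reboot()
        if read_assigned_mb() <= startup_mb:
            return attempt
    return None


# Simulated stand-ins: the third boot never grows beyond the startup amount.
readings = iter([6144, 5120, 2048, 7168])
result = reboot_until_stuck(
    reboot=lambda: None,
    read_assigned_mb=lambda: next(readings),
    startup_mb=2048,
    attempts=4,
)
print(result)  # 3
```

Because the problem shows up roughly once every week or two on production VMs, a test like this may need many attempts before it hits a stuck boot, if it reproduces at all.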

January 6th, 2015 4:32pm

Hi,

I have the exact same issue with my setup. Is this a confirmed bug, or is there a solution?

May 15th, 2015 6:14am

Hello,

We have opened a case with Microsoft concerning this issue. We have also discovered that it happens with 2012R2 VMs; at first we thought it only happened with 2008R2 VMs, but this is not true.

At the moment we are collecting information and performance logs, and we have run several MSDT inventories on VMs and hosts. So far there is no solution yet, but I hope to have some positive news in the next few weeks.

As soon as I have a solution or important facts about this issue I will post them here. But at this moment I think this will take a while.

If somebody has any news or information, please share it with us. It has been a problem for quite a while now, and we would like to have this resolved.

Grtx

Peter Camps



May 19th, 2015 7:14am

This is looking to be a bug.

The MS product group is currently investigating... It is a timing issue around the Dynamic Memory driver on boot of the VM. We have sent them a couple of crash dumps, some ETL tracing and a copy of one of our VMs!

They are currently testing multiple reboots in their lab environments. Luckily a couple more people have raised cases about it recently, so it's getting some traction.

We have raised it as SEV B, so we are in conversation daily regarding the issue. As soon as we have a fix I'll post more info.

@ Peter Camps - Could it be a good idea for us to share case IDs? It sounds like we are quite a few steps ahead of you in the PSS case.

May 20th, 2015 11:40am

Hello MS Jim,

Great to hear from you. Our case ID is REG:115050712710411. Thus far I have not been able to reproduce it myself by rebooting the test server we created, so we have to wait until it happens again on one of the production servers.

@ Microsoft Mike - If you could give me your ID, then I can inform our support contact at Microsoft and let him know about your case.

Grtx

Peter Camps

May 21st, 2015 9:12am

Hello,

Yesterday we heard from our Microsoft support contact that this case (and many more) has been escalated to a higher level. It is recognized as a bug, and a specialized team is working on reproducing the problem and finding a definitive solution.

Hope to hear soon from them.

May 22nd, 2015 3:27am

This topic is archived. No further replies will be accepted.
