Sunday 1 March 2015

Hyper-V - Virtual machines randomly losing their network connections

I don't normally write about IT issues on my blog, as that's far too much like work, but I decided to make an exception in this case: several clients have employed us to resolve this issue of late, and I felt a write-up could be helpful.

I have based the write-up on a typical configuration we have seen this issue with: a brand new cluster of four HP DL360 Gen8 servers running Microsoft Hyper-V Server 2012 R2, with SMB storage on a DL380 Gen8 running Windows Server 2012 R2.

The Issue

Virtual machines suddenly lose network connectivity, normally just one at a time.  Moving the affected machine to another node in the cluster fixes the issue, and it will continue to work, even when moved back, until the next time it happens.  At first an issue with the physical switch was suspected, but this was quickly disproved.  We then looked at the Hyper-V virtual switch, but once again that seemed pretty easy to discount.  We were also briefly sent down the dead end of a MAC address clash.

The Cause

After a lot of research we finally got to the bottom of the issue.  It would appear there is a problem between various network adapters, in particular those using a Broadcom chip, and Microsoft's implementation of VMQ (Virtual Machine Queue).  The issue mainly seems to come to light if you are using NIC teaming, but we have seen it without.  The odd thing is the lack of consistency in what seems to trigger it, although having everything patched to the latest levels seems to help.
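If you want to check whether a host matches this profile, something along these lines should show the adapter chipsets, driver versions and any NIC teams in play.  This is only a rough sketch using the standard NetAdapter and NetLbfo cmdlets on Server 2012 R2; the columns chosen are my own selection, not a definitive diagnostic.

```powershell
# List physical adapters with their chipset description and driver details
Get-NetAdapter -Physical |
    Format-Table Name, InterfaceDescription, DriverVersion, DriverDate -AutoSize

# Show any NIC teams configured on this host and their member adapters
Get-NetLbfoTeam | Format-Table Name, Members, TeamingMode -AutoSize

# Show which adapters currently have VMQ enabled
Get-NetAdapterVmq | Format-Table Name, InterfaceDescription, Enabled -AutoSize
```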

The Solution?

Whilst there are solutions touted on the web, such as installing KB2887595-v2, we have found them all a bit flaky at best: in some cases they seem to introduce a real performance hit, in others they just don't work at all.  For now we are relying on a simple workaround.
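To see whether that update is already present on a host before deciding what to do, a quick check like the one below should work.  It assumes the update is registered under the ID KB2887595, and the output messages are purely illustrative.

```powershell
# Look for the update; returns nothing if it is not installed
$hotfix = Get-HotFix -Id "KB2887595" -ErrorAction SilentlyContinue

if ($hotfix) {
    # InstalledOn may be blank on some systems
    Write-Output "KB2887595 is installed (installed on: $($hotfix.InstalledOn))"
} else {
    Write-Output "KB2887595 is not installed on this host"
}
```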

The Workaround

The workaround (I won't call this a fix, as it involves disabling a useful feature) is to disable VMQ on the physical network adapters on your Hyper-V hosts.  The simplest way of doing this is with the following PowerShell, where ethernet1 is replaced with the name of the network adapter in question.

Disable-NetAdapterVmq -Name ethernet1
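If a host has several adapters, it can be easier to review them all and disable VMQ wherever it is switched on.  The following is a minimal sketch rather than a tested script; "ethernet1" is just a placeholder adapter name, and you should expect the adapter to restart briefly when the setting changes, so run it in a maintenance window or after draining the node.

```powershell
# Review the current VMQ state of every adapter on this host
Get-NetAdapterVmq | Format-Table Name, InterfaceDescription, Enabled -AutoSize

# Disable VMQ on every adapter that currently has it enabled
Get-NetAdapterVmq | Where-Object { $_.Enabled } | Disable-NetAdapterVmq

# Confirm the change
Get-NetAdapterVmq | Format-Table Name, Enabled -AutoSize

# To undo the workaround later on a given adapter (placeholder name):
# Enable-NetAdapterVmq -Name "ethernet1"
```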

You will also need to make sure that Enable Virtual Machine Queue is unticked in each guest VM's settings.
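The same guest setting can be changed from PowerShell instead of clicking through each VM.  As far as I am aware, setting the VMQ weight of a virtual network adapter to 0 is the equivalent of unticking the box; "VM01" below is a hypothetical VM name, and this is a sketch to adapt rather than something to paste blindly.

```powershell
# Show the current VMQ weight for every virtual network adapter on this host
# (0 means VMQ is disabled for that adapter)
Get-VM | Get-VMNetworkAdapter | Format-Table VMName, Name, VmqWeight -AutoSize

# Disable VMQ for a single VM ("VM01" is a placeholder)
Set-VMNetworkAdapter -VMName "VM01" -VmqWeight 0

# Or disable it for every VM on this host
Get-VM | Get-VMNetworkAdapter | Set-VMNetworkAdapter -VmqWeight 0
```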

Going Forward

Hopefully there will soon be a long-term solution to the issue.  In the meantime we are no longer specifying the HP NC365T adapters.  There do seem to be a few more options out there, but for now we will be sticking with disabling VMQ.
