Thursday, October 12, 2017

Fixed: ACK/RST intermittently connecting to RPC/Kerberos/LDAP ports on a Server 2012 R2 Domain Controller

I worked an interesting case this week. My customer had widespread outages on their Windows Server 2012 R2 domain controllers. A network trace showed that connections to the DCs were intermittently being refused with an ACK/RST (acknowledgement/reset) in response to the initial SYN packet (the TCP "hello").

It only happened under load.
It wasn't a firewall.
It wasn't the AV software.
It wasn't a laundry list of other things.
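
From a client's point of view, a SYN answered with a reset shows up as a refused connection. The following is a minimal Python sketch, not the tooling used in this case, of how one could tally refusals over many connection attempts to confirm that the failure is intermittent. The function name `probe` and its parameters are illustrative assumptions.

```python
import socket


def probe(host, port, attempts=100, timeout=2.0):
    """Open `attempts` TCP connections and tally how each one ended."""
    results = {"connected": 0, "refused": 0, "timeout": 0, "other": 0}
    for _ in range(attempts):
        try:
            # create_connection performs the full TCP three-way handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results["connected"] += 1
        except ConnectionRefusedError:
            # The peer answered the SYN with a reset instead of a SYN/ACK.
            results["refused"] += 1
        except socket.timeout:
            # No answer at all within `timeout` seconds.
            results["timeout"] += 1
        except OSError:
            # Anything else, e.g. name resolution or routing failures.
            results["other"] += 1
    return results
```

Calling something like `probe("dc01.example.com", 389, attempts=200)` (a hypothetical host name; 389 is LDAP, 88 is Kerberos, 135 is the RPC endpoint mapper) while the DC is under load would give a rough refusal rate to compare before and after a fix.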

Here is the odd bit.  In a netsh trace on the DC, we could see the SYN packet was getting to the DC, but it wasn't getting to the RPC endpoint mapper, lsass, or netlogon.

Why?

That took the weekend and Monday to figure out. We discovered that a TDI filter driver in the networking stack was, for reasons unknown, slowing down network connections to the DC.

TDI (Transport Driver Interface) is an older Windows technology that lets drivers hook in between the TCP/IP driver stack and applications.




Further digging led us to this KB article from Trend Micro, which indicated that a problem with the VMware NSX Network Introspection Driver (vnetflt.sys) was causing the trouble. This driver is installed with the VMware Tools package on the DCs, and it was several years out of date.

The solution from the KB worked.  We disabled that driver via the registry and the problem stopped immediately after a reboot.

Specifically, in the registry:
Path: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\vnetflt\
Value name: Start
Type: REG_DWORD
Data: 4 (disabled)
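
For reference, here is a small Python sketch of what the Start value means and how the same change could be expressed as a reg.exe command line. The helper names `describe_start` and `reg_add_command` are hypothetical; the sketch only builds the argument list and does not touch the registry.

```python
# Meanings of the Start value under a service's registry key.
START_TYPES = {
    0: "Boot",
    1: "System",
    2: "Automatic",
    3: "Manual (demand start)",
    4: "Disabled",
}


def describe_start(value):
    """Return a human-readable name for a service Start value."""
    return START_TYPES.get(value, "Unknown")


def reg_add_command(service, start):
    """Build the reg.exe argument list that sets a service's Start value."""
    if start not in START_TYPES:
        raise ValueError(f"unsupported Start value: {start}")
    key = "HKLM\\SYSTEM\\CurrentControlSet\\Services\\" + service
    return ["reg", "add", key,
            "/v", "Start", "/t", "REG_DWORD", "/d", str(start), "/f"]


if __name__ == "__main__":
    print(describe_start(4))
    print(" ".join(reg_add_command("vnetflt", 4)))
```

As in the manual steps, a change like this only takes effect after a reboot, and it is worth exporting the key first so the original Start value can be restored.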

Magic.

Interestingly, I haven't seen or heard of any similar issues. I assume it was an interaction between the ancient VMware Tools on the box and some of their security software or a software update, but I'm not sure. They wanted to do that root cause analysis independently in their lab. I asked them to cc me on the root cause, and I'll update this blog post if they do.

Lessons Learned:
  • You really should keep your VMware Tools updated.
    • As with all drivers, a fire-and-forget mentality can seriously bite you.
  • You can convert a netsh network trace ETL file to a big text file with this command:
    • netsh trace convert input="input.etl" output="output.log" 
  • You can capture a netsh network trace ETL and add in extra ETL providers, like the TCPIP driver messages, with this command:
    • netsh trace start capture=yes overwrite=yes maxsize=2048 tracefile=c:\output.etl scenario=netconnection provider={EB004A05-9B1A-11D4-9123-0050047759BC} keywords=0xffffffffffffffff level=0xff provider="Microsoft-Windows-TCPIP" keywords=0xffffffffffffffff level=0xff
  • The Potbelly Sandwich Shop in Normal, Illinois makes a fantastic grilled cheese sandwich.
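
The text file produced by the netsh trace convert command in the list above can be very large, so a small filter helps when hunting for specific events. This is a rough Python sketch that assumes only that the converted log is plain text. The keywords, the default encoding, and the function names are illustrative assumptions, not anything specific to the netsh output format.

```python
def matching_lines(lines, keywords):
    """Yield (line_number, text) for lines containing any keyword.

    Matching is case-insensitive and uses plain substring search.
    """
    lowered = [k.lower() for k in keywords]
    for number, line in enumerate(lines, start=1):
        text = line.lower()
        if any(k in text for k in lowered):
            yield number, line.rstrip("\n")


def search_file(path, keywords, encoding="utf-8"):
    """Stream a large log file from disk and yield its matching lines."""
    # The encoding is an assumption; adjust it if the output looks garbled.
    with open(path, encoding=encoding, errors="replace") as handle:
        yield from matching_lines(handle, keywords)
```

For example, `for n, text in search_file("output.log", ["vnetflt", "reset"]): print(n, text)` prints each matching line with its line number without loading the whole file into memory.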
