Showing posts with label Networking. Show all posts

Monday, April 28, 2008

Ran into an interesting Windows Server 2003 problem today: very high CPU utilization by lsass.exe and svchost.exe (NETWORK SERVICE), and an inability to create outbound network connections. The client has rebooted this server to clear the issue in the past, but this time wanted an explanation.

This client runs our application from that server using Terminal Services. Our application would not run and gave a cryptic message with the error code 10055. That five-digit code looked like a Microsoft TCP error to me, and sure enough it was. A quick Google search turned this up:

10055 WSAENOBUFS -- No buffer space available.
Since that is a TCP error, that would mean no network resources are available. I ran "netstat -n" but saw very few established sockets.
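For anyone who wants their own application to report this condition less cryptically, here is a minimal Python sketch. The function name is my own invention; the only hard facts in it are the Winsock code 10055 and its POSIX counterpart ENOBUFS.

```python
import errno

# Winsock's numeric code for "No buffer space available".
WSAENOBUFS = 10055


def is_buffer_exhaustion(exc: OSError) -> bool:
    """Return True if exc looks like a 'no buffer space' socket error.

    Hypothetical helper for illustration only.
    """
    # On Windows, Python exposes the Winsock code as exc.winerror;
    # on other platforms the equivalent value is errno.ENOBUFS.
    codes = {WSAENOBUFS, errno.ENOBUFS}
    return getattr(exc, "winerror", None) in codes or exc.errno in codes
```

Wrapped around a failing connect call, a check like this would let the application say "the server is out of socket buffer space" instead of just printing 10055.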

Task Manager showed high CPU usage by lsass.exe and one svchost.exe instance running as NETWORK SERVICE. I tried to download Process Explorer, but Internet Explorer could not get to any websites; the bottom of the page said there was a DNS error.

I opened a command prompt (CMD) and was able to ping www.yahoo.com, so DNS was working. I figured I would just FTP the file, so I ran FTP from their server to our web site and got another error:
> ftp: connect :No buffer space is supported
So, another error that points to TCP resources being unavailable.
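The ping-then-FTP test above boils down to checking name resolution and TCP connect separately. A rough Python sketch of that two-stage check follows; the function name and the returned strings are assumptions of mine, not anything from a Microsoft tool.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Report which stage of an outbound TCP connection fails, if any."""
    try:
        # Stage 1: name resolution (the part ping showed was working).
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"resolve failed: {exc}"
    family, socktype, proto, _canon, addr = infos[0]
    try:
        # Stage 2: create a socket and connect (the part FTP and IE needed).
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(addr)
    except OSError as exc:
        return f"connect failed: {exc}"
    return "ok"
```

On a server in the state described here, I would expect the first stage to pass and the second to fail with the buffer-space error, though I have not run this against an affected machine.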

I tried asking the great Google for answers about TCP resources and error 10055 but mostly found people who rebooted to make the problem go away. There were some Microsoft articles about increasing the maximum TCP/IP socket buffers, but this is not our server, so I did not want to make changes requiring a reboot without knowing whether they would even solve the problem.

Naturally, I examined Event Viewer and saw some error messages that suggested more socket errors. According to Microsoft's KB, they indicate a group policy that could not be applied. Probably not the root problem.

I decided to figure out which service was killing the CPU and see if it was also tying up the network resources.

LSASS hosts a bunch of services, including HTTP SSL, IPSEC, Kerberos, NetLogon, NT LM, Protected Storage, Security Accounts Manager, and maybe a couple of others. It seems to manage TCP sockets rather than use them, so although its CPU usage was high I figured I could safely ignore lsass.exe.

OK, svchost.exe is the process that hosts services, so I opened the Services console (services.msc, or Control Panel > Administrative Tools > Services) to examine them. I went through all the listed services and looked at the details for each. Where the command line included svchost.exe, I looked for the "-k NETWORK" to determine which svchost.exe services were running as NETWORK SERVICE. I restarted each one and watched Task Manager to see if the high-CPU instance of svchost.exe disappeared briefly. When I got to the "Server" service, both svchost.exe and lsass.exe freed up their resources. Restarting the Server service also restarted Net Logon, DFS, and Computer Browser.
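Clicking through every service by hand is slow. A quicker way to see which services live inside each svchost.exe process is to capture the output of "tasklist /svc /fo csv" and parse it. The Python sketch below assumes the CSV header names are Image Name, PID and Services, which is how I recall English-language Windows printing them; the sample rows, PIDs and groupings are made up for illustration.

```python
import csv
import io
from typing import Dict, List

# Hypothetical sample of `tasklist /svc /fo csv` output; PIDs and
# service groupings are invented, not taken from the client's server.
SAMPLE = """\
"Image Name","PID","Services"
"lsass.exe","500","Netlogon,PolicyAgent,ProtectedStorage,SamSs"
"svchost.exe","812","DcomLaunch"
"svchost.exe","960","Dnscache"
"svchost.exe","1020","lanmanserver,Browser,Schedule"
"""


def services_by_pid(csv_text: str, image: str = "svchost.exe") -> Dict[int, List[str]]:
    """Map each PID of the given image name to the services it hosts."""
    result: Dict[int, List[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Image Name"].lower() == image.lower():
            names = [s.strip() for s in row["Services"].split(",")]
            result[int(row["PID"])] = [s for s in names if s]
    return result
```

With the PID of the high-CPU instance from Task Manager in hand, a lookup in that dictionary narrows the list of services worth restarting.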

Ta-dah! I was able to browse the web, FTP, and of course our application worked again. Since this was not our Windows Server 2003 machine, I passed the information along. But that certainly beats a full reboot.

Sunday, August 05, 2007

Vista Wireless DHCP Problems with SonicWall TZ170w

The title bar link will not work for you if you do not have a SonicWall forum login. Nonetheless, here's an interesting problem I ran into and the solution.

I've been running Vista with various wireless access points just fine; Vista's WiFi stack seems OK to me. Then, we had a client running Vista who could not connect wirelessly to the firewall/access point we sold to them. That was a problem.

The Problem:
This Vista laptop was not able to obtain an IP address via DHCP from the SonicWall TZ 170w. It was able to associate with the access point (it showed up in the MAC/ARP table), but instead of getting an IP address it kept reporting an IP address conflict. The same Vista laptop acquired a DHCP address just fine when connected through wired Ethernet.

Other symptoms included the laptop's MAC address showing up multiple times in the DHCP lease table on the SonicWall, the event viewer recording DHCP errors on differing IP addresses all reporting conflicts, and finally the wireless NIC falling back to an automatically assigned private IP address. I think that, if our DHCP pool had been small, this single laptop would have used up every available IP address in it.

The Solution:
According to Microsoft, a network trace revealed that the Vista client was sending gratuitous ARPs while losing the IP address.

One of the uses of ARP is to provide duplicate IP address detection through the transmission of ARP Requests known as gratuitous ARPs. A gratuitous ARP is an ARP Request for a node’s own IP address. In the gratuitous ARP, the SPA (Sender Protocol Address) and the TPA (Target Protocol Address) are set to the same IP address.

If a node sends an ARP Request for its own IP address and no ARP Reply frames are received, the node can assume that its assigned IP address isn’t being used by other nodes. If a node sends an ARP Request for its own IP address and an ARP Reply frame is received, the node can determine that its assigned IP address is already being used by another node.
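To make the SPA/TPA description concrete, here is a small Python sketch that builds the 28-byte ARP Request payload for IPv4 over Ethernet with both fields set to the same address. The function name and the example MAC and IP are placeholders of mine; the sketch only builds the bytes and does not send anything on the wire.

```python
import socket
import struct


def build_gratuitous_arp(mac: bytes, ip: str) -> bytes:
    """Build the ARP Request payload a node sends for its own IP address."""
    if len(mac) != 6:
        raise ValueError("mac must be exactly 6 bytes")
    addr = socket.inet_aton(ip)
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: ARP Request
        mac,          # SHA: sender hardware address
        addr,         # SPA: sender protocol address
        b"\x00" * 6,  # THA: target hardware address, unknown so zeroed
        addr,         # TPA: same as SPA, which makes it gratuitous
    )


# Placeholder MAC and IP, for illustration only.
packet = build_gratuitous_arp(bytes.fromhex("020000000001"), "192.168.1.50")
```

If any other node answers a request like this, the sender knows the address is already taken.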

After obtaining an IP address from the SonicWall TZ 170w firewall, the Vista client issues an auto-ARP to ensure there is no conflict; in doing so it expects an answer from the DHCP server confirming the IP address. Without the confirmation, the client declines the IP address received via DHCP.

The core of the issue seems to be the ARP request sent out by the Vista client. The wireless Vista client sends a Version A ARP request with a source IP address of 0.0.0.0. This is non-standard behavior (whatever that means). Future SonicWall TZ 170w firmware will address the issue.
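If you have a packet capture handy, telling these request types apart is just a matter of reading the sender and target IP fields. The Python sketch below classifies a raw 28-byte ARP payload. The function name and labels are my own. A request with a sender IP of 0.0.0.0 is the form that RFC 5227 describes as an ARP Probe; whether that is exactly what the SonicWall firmware tripped over is an assumption on my part.

```python
import socket
import struct

ARP_FORMAT = "!HHBBH6s4s6s4s"  # IPv4-over-Ethernet ARP payload, 28 bytes


def classify_arp_request(payload: bytes) -> str:
    """Label an ARP payload by how its sender and target IPs are filled in."""
    if len(payload) < 28:
        raise ValueError("ARP payload for IPv4 over Ethernet is 28 bytes")
    fields = struct.unpack(ARP_FORMAT, payload[:28])
    oper, spa, tpa = fields[4], fields[6], fields[8]
    if oper != 1:
        return "not a request"
    if spa == socket.inet_aton("0.0.0.0"):
        # Sender IP left empty: the node is asking about an address
        # it does not yet claim to own.
        return "probe (sender IP 0.0.0.0)"
    if spa == tpa:
        return "gratuitous"
    return "ordinary request"
```

Feeding it the ARP payloads from a wireless trace of the Vista laptop should show which of the two conflict-detection styles the client is using.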

The Resolution:
The ArpRetryCount registry setting sets the number of times that a gratuitous ARP is sent when initializing IP for a specific IP address. If no ARP Reply is received after sending ArpRetryCount gratuitous ARPs, IP assumes the IP address is unique on the network segment.

In the meantime, run REGEDIT as administrator, go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, add a new REG_DWORD value named “ArpRetryCount” with a value of 0, and reboot.
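If several laptops need the same change, a .reg file saves some clicking. The Python sketch below generates the text of one for the key and value named above; the function name is mine, and I have not tested the resulting file on Vista, so treat it as a starting point. Importing it still requires administrator rights and a reboot.

```python
# Key path and value name as given in the workaround above.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def reg_file_for_dword(key: str, name: str, value: int) -> str:
    """Return .reg file text that sets one REG_DWORD value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"{name}"=dword:{value:08x}',  # DWORDs are written as 8 hex digits
        "",
    ]
    return "\r\n".join(lines)


reg_text = reg_file_for_dword(KEY, "ArpRetryCount", 0)
```

Saving reg_text to a file such as arpretry.reg (a file name I made up) gives something that can be double-clicked or imported from REGEDIT.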

Again, according to Jean-Marc of SonicWall (as of May 2007), a future firmware release will address the issue.