Monday, April 28, 2008

Ran into an interesting Windows Server 2003 problem today: very high CPU utilization by lsass.exe and svchost.exe (NETWORK SERVICE), and an inability to create outbound network connections. The client has previously rebooted this server to resolve the issue, but wanted an explanation.

This client runs our application from that server using Terminal Services. Our application would not run and gave a cryptic message with the error code 10055. That five-digit code looked like a Microsoft TCP error to me, and sure enough it was. A quick Google search turned this up:
10055 WSAENOBUFS -- No buffer space available.
Since that is a TCP error, that would mean no network resources are available. I ran "netstat -n" but saw very few established sockets.
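
Eyeballing netstat output gets tedious when the list is long. Here is a rough sketch of tallying TCP connections by state, assuming the usual four-column layout of "netstat -n" (protocol, local address, foreign address, state). The function name and the sample addresses are made up for illustration; they are not from the client's server.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state in `netstat -n` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Connection rows look like: TCP  local:port  remote:port  STATE
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states


# Made-up sample output, for illustration only.
sample = """
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.5:3389          10.0.0.20:51234        ESTABLISHED
  TCP    10.0.0.5:1042          10.0.0.9:445           TIME_WAIT
  TCP    10.0.0.5:1043          10.0.0.9:445           TIME_WAIT
"""

print(count_tcp_states(sample))
```

A pile of sockets in one state would point at a leak; a short list, like the one I saw, suggests the shortage is somewhere other than the visible connection table.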

Task Manager showed high CPU usage by lsass.exe and one svchost.exe running as NETWORK SERVICE. I tried to download Process Explorer but Internet Explorer could not get to any websites; the bottom of the page said there was a DNS error.

I opened a command prompt (CMD) and was able to ping www.yahoo.com, so DNS works. I figured I would just FTP the file, so I ran FTP from their server to our web site and got another error:
> ftp: connect :No buffer space is supported
So, another error that points to TCP resources being unavailable.
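
The ping-then-FTP test boils down to checking name resolution and TCP connection setup separately. A minimal sketch of that split, assuming Python is available on some test machine; check_outbound is a hypothetical helper name, not a standard tool, and the result layout is my own choice.

```python
import socket


def check_outbound(host, port, timeout=5.0):
    """Report whether name resolution and a TCP connect each succeed."""
    result = {"resolved": False, "connected": False, "error": None}
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        result["resolved"] = True
    except socket.gaierror as exc:
        result["error"] = f"DNS lookup failed: {exc}"
        return result

    # Try only the first resolved address; enough for a quick check.
    family, socktype, proto, _canonname, addr = infos[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(addr)
        result["connected"] = True
    except OSError as exc:
        # Socket creation or connect failed; on Windows the exception
        # would carry a Winsock code such as 10055.
        result["error"] = f"TCP connect failed: {exc}"
    return result
```

Calling something like check_outbound("www.yahoo.com", 80) on a server in this state would presumably come back resolved but not connected, which is the same picture the ping and FTP attempts gave.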

I tried asking the great Google for answers about TCP resources and error 10055 but mostly found people who rebooted to make the problem go away. There were some Microsoft articles about increasing the maximum TCP/IP socket buffers, but this is not our server, so I did not want to make changes requiring a reboot without knowing if it would even solve the problem.

Naturally, I examined Event Viewer and saw some error messages that suggested more socket errors, which Microsoft's KB attributes to a group policy not being able to execute. Probably not the root problem.

I decided to figure out which service was killing the CPU and see if it was also tying up the network resources.

LSASS hosts a bunch of services, including HTTP SSL, IPSEC, Kerberos, NetLogon, NT LM, Protected Storage, Security Accounts Manager, and maybe a couple of others. It seems to manage TCP sockets rather than use them, so although its CPU usage was high, I figured I could safely ignore LSASS.EXE.

OK, svchost.exe hosts services, so I ran Services.MMC (or go to Control Panel > Administrative Tools > Services) to examine them. I went through all the listed services and looked at the details for each. Where the command line included svchost.exe, I looked for the "-k NETWORK" to determine which services were running in the svchost.exe instance under NETWORK SERVICE. I restarted each one and watched Task Manager to see if the high-CPU instance of svchost.exe disappeared briefly. When I got to the "Server" service, both svchost.exe and lsass.exe freed up their resources. Restarting the Server service also restarted Net Logon, DFS, and Computer Browser.
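
Clicking through every service works, but it is slow. A sketch of a shortcut, assuming the output of "tasklist /svc /fo csv" has been saved or captured as text with the columns "Image Name", "PID", and "Services"; that column layout is my assumption about the output, and the function name and sample PIDs are invented for illustration. It maps each svchost.exe PID to the services it hosts, so the PID from Task Manager points straight at a short list of candidates.

```python
import csv
import io


def services_by_pid(tasklist_csv, image="svchost.exe"):
    """Map PID -> list of hosted service names for one image name."""
    mapping = {}
    reader = csv.DictReader(io.StringIO(tasklist_csv.strip()))
    for row in reader:
        if row["Image Name"].lower() != image.lower():
            continue
        # The Services column holds a comma-separated list of short names.
        names = [s.strip() for s in row["Services"].split(",") if s.strip()]
        mapping[int(row["PID"])] = names
    return mapping


# Made-up sample in the assumed CSV layout; PIDs are invented.
sample = '''"Image Name","PID","Services"
"lsass.exe","432","Netlogon,PolicyAgent,ProtectedStorage,SamSs"
"svchost.exe","884","Dnscache"
"svchost.exe","1060","Browser,lanmanserver,lanmanworkstation"
'''

for pid, names in services_by_pid(sample).items():
    print(pid, names)
```

With the busy PID in hand, only the services in that one list need restarting one at a time.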

Ta-dah! I was able to browse the web, FTP, and of course our application worked again. Since this was not our Windows Server 2003 machine, I passed the information along. Still, that certainly beats a full reboot.

1 comment:

  1. Thanks for this, gave me a few ideas for a similar, albeit unrelated, problem.
