High Print Server Utilization and Trouble with Microsoft Support

We have a print cluster that hosts approx. 180 printers for about 2200 client workstations.  In order to make printing somewhat useful I had to assign these servers 8 cores each, for a total of 24 GHz on each server.  The server that has the print service active sits at 100% utilization during on-hours.  Even during the late night/early morning period, when there are no users in the district, it still consumes 7 GHz+ of CPU constantly.  The print frequency on the servers is low.  If I had to estimate, it's around 10-15 jobs/minute, and none of them are huge.  Users are experiencing slow printing and having to print larger jobs during off hours, and it's causing our VM environment to become completely unbalanced.

I have a case open with Microsoft on this.  I can provide the case number if needed.  However, after weeks of debugging they are saying this type of utilization is normal.  There is no way that this is normal, especially since last year we had about 100 printers on a single server, with other roles installed on it as well, and it never went above 25% on a single core.  I've researched server scalability and we shouldn't even be close to the amount of resources these servers are using.

If anybody could help me out with this I'd greatly appreciate it.  I know that there are some internal Microsoft guys around here who might be able to throw out some ideas.  Really anything at this point would be helpful.

Thanks a bunch-

Rob

September 26th, 2013 5:33pm

If you have any WSD Ports installed on the cluster get rid of them.  WSD is the first path when creating a network port to the device but the underlying transport is not the best in cluster environments.  Most new printers support WSD so if one does not specifically select TCP/IP Device from the drop down list you can wind up with a WSD port. 

One other issue is with older versions of HP's Universal Print Driver.  HP did address the problem, but I know they have been fine-tuning the driver for better performance in a clustered spooler environment.

WSD ports can only be deleted when deleting a printer.  I typically create a Standard TCP/IP Port to the device on the ports tab of the shared printer thus disassociating the WSD port from the share.  Then I add a fake printer using the WSD Port.  Then I delete the fake printer which also removes the WSD Port.

September 27th, 2013 9:00pm

Hi Alan-

  Thanks for the suggestions.  We don't have any WSD ports and we are only a revision or so back from the latest HP UPD.  The majority of our printers are Kyoceras, for which we are on the latest driver.  We also have Konica Minolta copiers, which are on the latest as well.

  I pointed this out at the beginning of the case, but the client traffic seems to be the problem.  With 180 printers, approx. 2200 clients, and low print utilization, the server still sees about 300-400 Mbps of constant traffic throughout the day.

  After I posted this message MS support came back saying that the spooler service has about 1200 threads open and the issue is on the clients - many print open/print close packets.  Even with a single printer we can see that a client machine may have 10-15 connections open to the print service on the server.  We are running some procmon and xperf data gathering on the clients to attempt to determine what is causing all the client requests to be initiated.

I'd appreciate any more help you could provide.

Thanks-

Rob

September 30th, 2013 9:36pm

Hi

Can you isolate which computers open the most ports (like the top 10), and then determine which printers are installed for those users?  I suspect a bad driver that keeps the port open or communicates badly, but with over 180 printers it might be hell to find which one.
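
A minimal sketch of how that top-10 list could be built, assuming you can dump `netstat -an` style output on the print server.  The sample text, the IP addresses, and the choice of port 445 (SMB) are assumptions for illustration only; pass a different port if the spooler traffic turns out to use one.

```python
from collections import Counter

def top_clients(netstat_text, server_port=445, limit=10):
    """Count ESTABLISHED TCP connections per remote host to server_port."""
    counts = Counter()
    for line in netstat_text.splitlines():
        parts = line.split()
        # Expected shape: TCP  local_addr:port  remote_addr:port  STATE
        if len(parts) < 4 or parts[0] != "TCP" or parts[3] != "ESTABLISHED":
            continue
        local, remote = parts[1], parts[2]
        if local.rsplit(":", 1)[-1] != str(server_port):
            continue
        # Strip the ephemeral client port so connections group by host.
        counts[remote.rsplit(":", 1)[0]] += 1
    return counts.most_common(limit)

# Made-up sample lines in the layout netstat prints on Windows.
sample = """\
  TCP    10.0.0.5:445    10.0.1.20:50112    ESTABLISHED
  TCP    10.0.0.5:445    10.0.1.20:50113    ESTABLISHED
  TCP    10.0.0.5:445    10.0.1.31:49877    ESTABLISHED
  TCP    10.0.0.5:3389   10.0.1.99:51000    ESTABLISHED
  TCP    10.0.0.5:445    10.0.1.20:50114    TIME_WAIT
"""

for host, connections in top_clients(sample):
    print(host, connections)
```

The hosts at the top of the list are the ones whose installed printers and drivers are worth comparing first.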

October 1st, 2013 6:58am

I have tried to narrow that down already.  I tried installing only one printer at a time from each of the three vendors we use.  Each one resulted in basically the same number of connections - roughly 6 (+/-1).

Now there are systems that I have seen with 15+ open connections. Many systems have 5+ printers installed on them, but the connections do not seem to be completely cumulative.  I.e. Five printers will not result in 30 connections.

Also, regarding the traffic sent to the print server:  I have done packet captures from the client to the server, and the clients are sending approx. 100 packets per second to the print server, even while sitting idle with no jobs being sent.  The traffic appears to be the printer open/printer close requests that MS support has pointed to.  I think this, along with the open connections, is the root cause, but I'm unsure how to pinpoint what is causing it.
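
As a rough back-of-the-envelope check, here is what those numbers add up to if every one of the roughly 2200 clients behaves like the captured one.  That uniformity is an assumption, not something measured; the 300-400 Mbps range is the server-side throughput mentioned earlier in the thread.

```python
CLIENTS = 2200               # approximate workstation count from the first post
IDLE_PPS_PER_CLIENT = 100    # packets per second seen from one idle client
OBSERVED_MBPS = (300, 400)   # constant throughput seen on the server

# Aggregate packet rate if all clients match the sampled one (assumption).
total_pps = CLIENTS * IDLE_PPS_PER_CLIENT

def avg_packet_bytes(mbps, pps):
    """Average packet size implied by a bit rate and a packet rate."""
    return mbps * 1_000_000 / 8 / pps

print("aggregate packets/sec:", total_pps)
for mbps in OBSERVED_MBPS:
    print(mbps, "Mbps ->", round(avg_packet_bytes(mbps, total_pps)), "bytes/packet")
```

Under that assumption the server would be fielding on the order of 220,000 small packets per second, which would be in line with a flood of short open/close requests rather than print data.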

-Rob

October 1st, 2013 4:38pm

I'd appreciate any other suggestions anybody may have.  The case has been taken over by a different engineer out of the blue, and I feel it is going backwards: he is repeating items that were already covered previously.

-Rob

October 7th, 2013 4:02pm

Hi

Which ports are open when you monitor?  Can we see a filtered Wireshark capture?

Edit: please check this too:

Paging of the Executive

No

HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management DisablePagingExecutive=dword:00000001

In order to increase performance, kernel mode drivers and other system components can be configured so that they are not paged to disk. However, there must be enough memory available to hold these items or else the system will experience performance and stability issues.

and The Windows Server 2008 kernel allocates memory in pools. These pools are known as the paged pool and the non-paged pool. Performance degradation and server instability can result if the memory for these pools is exhausted. To avoid this situation, you can enable auto-tuning at server startup by editing the PagedPoolSize registry value in the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management registry subkey.   

Taken from: http://support.citrix.com/servlet/KbServlet/download/29413-102-705928/XA%20-%20Windows%202008%20R2%20Optimization%20Guide.pdf.pdf & http://www-01.ibm.com/support/docview.wss?uid=swg215912

October 9th, 2013 8:38pm

What OS version is the Server running?  I'm assuming 2008 R2.

What OS version are the connected clients running? 

If 2008 R2 and Windows XP, the connection to the spooler will be using Named Pipes over SMB over network transport.

If 2008 R2 and Windows 7, the connection to the spooler will be using Async RPC over network transport.

If 2003, then all client connections to the spooler will be using Named Pipes over SMB over network transport.
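
The cases above can be summed up as a small lookup.  This is only a restatement of the combinations listed in this post, written as a sketch; the function name and the "unknown" fallback are made up for illustration, and no other OS pairing is covered.

```python
NAMED_PIPES = "Named Pipes over SMB"
ASYNC_RPC = "Async RPC"

def spooler_transport(server_os, client_os):
    """Return the client-to-spooler transport for the cases listed above."""
    if server_os == "2003":
        # Per the post, every client of a 2003 server uses named pipes.
        return NAMED_PIPES
    if server_os == "2008 R2":
        if client_os == "Windows XP":
            return NAMED_PIPES
        if client_os == "Windows 7":
            return ASYNC_RPC
    # Combinations not mentioned in the post are deliberately left open.
    return "unknown"

print(spooler_transport("2008 R2", "Windows 7"))
print(spooler_transport("2008 R2", "Windows XP"))
```

So a capture from a Windows 7 client talking to a 2008 R2 spooler that shows mostly named-pipe traffic over SMB would not match the expected path.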

Do you have just ONE spooler resource hosted in the cluster?

Have your support person verify whether the open printer calls are succeeding or failing (and thus are being called again endlessly).

October 9th, 2013 9:04pm

DisablePagingExecutive is already set to 1.

The server is 2008 R2 SP1.

Clients are Windows 7 SP1. 98% are 32bit.

Yes, there is only one spooler resource in the cluster.

Yagmoth, I don't have a problem sending you a packet capture, but it'll be huge.  I can set a limit of 100k packets and it'll fill up in less than a second.  MS and I recently did a netmon capture and he let it run for about 5 minutes.  It attempted to capture 8 million+ packets.  I had to let it process the packets overnight, but it subsequently locked up the server.  Is there someplace I can upload the packet capture for you?

I'll also check with the support guy about the open printer calls succeeding or failing.  Would a packet trace show that?

Thanks for all your guys' help!

October 9th, 2013 9:17pm

Alan-

  I'm thinking about what you said: if the clients are Win7 then they should be using Async RPC over network transport.  This sparked something in my memory from past packet captures.  So I just grabbed another capture and I am seeing many packets over SMB2 with "STATUS_PIPE_NOT_AVAILABLE" and "Ioctl Request NAMED_PIPE Function0x0006".  Could something be making it use the wrong protocols?

-Rob

October 9th, 2013 9:42pm

Maybe SMB2 is disabled on the server but not on the workstations, or some similar scenario? http://support.microsoft.com/kb/2696547/en-us
October 9th, 2013 9:55pm

The SMB1/2 values don't exist on the server, so I'm assuming the default of Enabled applies here.  Same goes for the workstation.

-Rob

October 9th, 2013 10:03pm

OK, well, the error you mentioned seems to say that SMB is running out of available pipe instances; http://msdn.microsoft.com/en-us/library/ee441884.aspx : All instances of the designated named pipe are busy.

Check this document: http://social.technet.microsoft.com/wiki/contents/articles/4494.windows-server-troubleshooting-the-rpc-server-is-unavailable.aspx#TCP_Session_Establishment It shows some troubleshooting steps for session establishment (with a Wireshark filter to confirm).

October 9th, 2013 10:16pm

Error 6 is Invalid Handle

C:\>winerror 6
     6 ERROR_INVALID_HANDLE <--> 0x80090301 No Symbolic Name
     6 ERROR_INVALID_HANDLE <--> 0xc0000008 STATUS_INVALID_HANDLE

The first thing you need to do is open Devices and Printers on the nodes and delete any printer connections that you see targeting the clustered spooler resource.

Any time the thread count in the spooler goes above 550, I suspect a deadlock in win32spl.dll.  This is the client side of the spooler; it establishes an RPC thread pool of 512 threads, after which other threads wait for the next slice of the 512 pool.

We released an update for clustered spoolers for 2008 R2 RTM and these changes are included in SP1.

http://support.microsoft.com/kb/976571/en-us

Stability update for Windows Server 2008 R2 Failover Print Clusters

October 9th, 2013 10:35pm

As the server stands now there are 870 threads for the spoolsv.exe service on the server, though students here are starting to leave for the day.  We have seen this go up to 1200+.

The server is SP1 so those fixes should be included in there.  Also, we have deployed hotfixes KB2775511 and KB2977136 to the clients last week to update win32spl.dll, spoolsv.exe, etc (as listed on the KBs) to the latest versions available (as far as I know).

Now, when you say "The first thing you need to do is open Devices and Printers on the nodes and delete any printer connections that you see targeting the clustered spooler resource."  Are you saying that I need to delete all of the printers on all of the clients?

Lastly, in that short packet trace I took a bit ago (of 100,000 packets in 0.5 seconds, mind you), there was no RPC over TCP/IP traffic in it, as filtered by tcp.port==135 in wireshark.  Only SMB2, port 445 traffic regarding the named pipes, etc.  Not sure if this is correct or not.


October 9th, 2013 11:08pm

You can send your support person my way.  They probably know who I am already.
October 9th, 2013 11:10pm

Excellent.  Can I send you the case number or should I just request they send the case to you?

October 9th, 2013 11:13pm

I'm not in support, I do not have access to the tools they are using for support cases.  

No printer connections on clients need to be touched.  Just verify that the cluster nodes do not have a connection to any shares from the clustered spooler resource.  It's more of a best practice.  I think there was a QFE for Server 2003 on this.

Filtering on 445 or 135 is probably not the best plan.  That will only capture the SMB traffic when you are more interested in Async RPC traffic and that will be a different TCP/IP endpoint each time the spooler is started.
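
One way to act on this, as a rough sketch: list the TCP ports the spooler process is currently listening on and build a Wireshark display filter from them.  The sample text imitates `netstat -ano` output (the last column is the owning PID); the PID 1234 and the port numbers are made up, and you would substitute the real PID of spoolsv.exe as shown in Task Manager.

```python
def listening_ports(netstat_ano_text, pid):
    """Return the sorted TCP ports that the given PID is listening on."""
    ports = set()
    for line in netstat_ano_text.splitlines():
        parts = line.split()
        # Expected shape: TCP  local_addr:port  foreign_addr:port  LISTENING  PID
        if len(parts) != 5 or parts[0] != "TCP":
            continue
        if parts[3] != "LISTENING" or parts[4] != str(pid):
            continue
        # rsplit also handles IPv6 forms such as [::]:49155
        ports.add(int(parts[1].rsplit(":", 1)[-1]))
    return sorted(ports)

def wireshark_filter(ports):
    """Build a display filter that matches any of the given TCP ports."""
    return " || ".join(f"tcp.port == {port}" for port in ports)

# Made-up sample; 1234 stands in for the spoolsv.exe PID.
sample = """\
  TCP    0.0.0.0:135      0.0.0.0:0    LISTENING    820
  TCP    0.0.0.0:445      0.0.0.0:0    LISTENING    4
  TCP    0.0.0.0:49155    0.0.0.0:0    LISTENING    1234
  TCP    [::]:49155       [::]:0       LISTENING    1234
"""

print(wireshark_filter(listening_ports(sample, 1234)))
```

Because the endpoint changes each time the spooler starts, the filter would need to be rebuilt after every spooler restart or failover.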

October 9th, 2013 11:45pm

111
October 10th, 2013 1:57am

And I'd like to confirm that you did not attempt to disable Async RPC on the cluster nodes.

Are you running the latest spooler components on the cluster nodes?

October 10th, 2013 1:59am

Hi

Please continue with Alan's diagnostics, but just a small thing: does that server have dual NICs with load balancing?  I tend to always configure the NIC software for failover; with balancing, packets sometimes don't use the correct route and the server has problems as a result.

October 10th, 2013 6:30am

That is my case number.

No, Async RPC is not disabled on the nodes.  I did briefly try that on a few clients, however, but I removed that key shortly after.  I thought I read that the key was not valid for 2008 R2, so I didn't try it there.

The spooler components should be the latest:

spoolsv.exe - 6.1.7601.22149
winprint.dll - 6.1.7601.17514
win32spl.dll - 6.1.7601.22311

The only questionable one is spoolss.dll which is version 6.1.7600.16385.

Cluster nodes do not have a connection to any shares from the spooler resource.

Yagmoth, yes the cluster is set for failover.

October 10th, 2013 4:05pm

Work with the CSS support guy on this. He'll have some suggestions for you.

I'm expecting localspl.dll with version 7601.21687 or greater.

On my SP1 cluster spoolss.dll is 6.1.7600.16385.

You are not hitting the 512 threadpool limit in win32spl.

October 10th, 2013 8:37pm

Hi Alan-

  localspl.dll is currently 6.1.7601.22137 on the server and client.  Support suggested installing hotfix KB2526028; however, the versions on our servers and clients are already the same as or newer than what is listed in the hotfix.

Server
File                KB2526028 Ver.        Current Ver.
Splwow64.exe        6.1.7601.21687        6.1.7601.22268
Localspl.dll        6.1.7601.21687        6.1.7601.22137
Winprint.dll        6.1.7601.17514        6.1.7601.17514

Client
File                KB2526028 Ver.        Current Ver.
Localspl.dll        6.1.7601.21687        6.1.7601.22137
Winprint.dll        6.1.7601.17514        6.1.7601.17514
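
The same comparison can be done mechanically, which helps when checking many machines.  A minimal sketch, assuming plain dotted version strings; the helper names are made up for illustration, and the values are the server-side ones listed above.

```python
def parse_version(text):
    """Turn '6.1.7601.22137' into (6, 1, 7601, 22137) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))

def needs_hotfix(current, hotfix):
    """True when the installed file is older than the hotfix's version."""
    return parse_version(current) < parse_version(hotfix)

# file name -> (KB2526028 version, currently installed version) on the server
server_files = {
    "Splwow64.exe": ("6.1.7601.21687", "6.1.7601.22268"),
    "Localspl.dll": ("6.1.7601.21687", "6.1.7601.22137"),
    "Winprint.dll": ("6.1.7601.17514", "6.1.7601.17514"),
}

outdated = [name for name, (hotfix, current) in server_files.items()
            if needs_hotfix(current, hotfix)]
print("files older than the hotfix:", outdated)
```

Comparing tuples of integers avoids the trap of comparing the strings directly, where "6.1.7601.9999" would sort after "6.1.7601.22137".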


October 10th, 2013 9:06pm

Glad he's already contacted you today.

I'm assuming the clustered spooler resource name is a simple 5 letter word starting with P.

I do not think the client version will really matter.

I don't suppose you renamed the spooler resource at one point.   

October 10th, 2013 9:23pm

Correct. And, no, I don't remember ever renaming the spooler resource when I was configuring it.

October 10th, 2013 9:34pm

Hi Alan-
  When you mention that the client version won't really matter, are there no known issues on the clients at the moment, or do you think it's strictly server related?

  I ask because I think this is a client issue for two reasons:

1.  The number of connections we are seeing the client opening with the server - along with the amount of network traffic going to the server.

2.  I can move ONE widely used print queue over to a different server and that one queue will cause that server to sit at 100% CPU during the day.  This queue is not heavily used, but it is installed on most workstations.

I'm willing to bet that if I create a "fake" queue on a server and install it on all workstations, it will cause the server to be fully utilized even with no jobs being sent to it.

Support did come back with something last night, but I'm not sure what solution he is proposing.  He mentioned this as the possible problem:

* pFullPrinterName = 0x00000000`00000058 "--- memory read error at address 0x00000000`00000058
October 11th, 2013 12:40pm

I think it's server related due to the clustered spooler resource.  I've not seen any issue like the one you are reporting with the version of localspl.dll you have.

I told him this was a concern.  I'm not really sure that's the information he should be updating you with.  Did he ask about the shares the clients were calling?   If so do they exist? 

Are all your shares on a clustered spooler resource or are you having the same issue on a standalone machine?

October 11th, 2013 7:53pm

This topic is archived. No further replies will be accepted.
