High Print Server Utilization and trouble with Microsoft Support
We have a print cluster that hosts approx. 180 printers for about 2,200 client workstations. To make printing usable I had to assign these servers 8 cores each, for a total of 24 GHz on each server. On the server that has the print service active,
it will sit at 100% utilization during on-hours. Even during the late night/early morning period, when there are no users in the district, it will still constantly consume 7 GHz+ of CPU. The print frequency on the servers is low; if I had
to estimate, it's around 10-15 jobs/minute, and nothing huge either. Users are experiencing slow printing, having to print larger jobs during off hours, and it's causing
our VM environment to become completely unbalanced.
I have a case open with Microsoft on this. I can provide the case number if needed. However, after weeks of debugging they are saying this type of utilization is normal. There is no way this is normal, especially since last
year we had about 100 printers on a single server, with other roles installed on it as well, and it never went above 25% on a single core. I've researched server scalability and we shouldn't even be close to the amount of resources these servers are using.
If anybody could help me out with this I'd greatly appreciate it. I know there are some internal Microsoft guys around here that might be able to throw out some ideas. Really, anything at this point would be helpful.
Thanks a bunch-
September 26th, 2013 5:33pm
If you have any WSD ports installed on the cluster, get rid of them. WSD is the default path when creating a network port to the device, but the underlying transport is not the best in cluster environments. Most new printers support WSD, so if one
does not specifically select TCP/IP Device from the drop-down list, you can wind up with a WSD port.
One other issue is with older versions of HP's Universal Print Driver. HP did address the problem, but I know they have been fine-tuning the driver for better performance in a clustered spooler environment.
WSD ports can only be deleted when deleting a printer. I typically create a Standard TCP/IP port to the device on the Ports tab of the shared printer, thus disassociating the WSD port from the share. Then I add a fake printer using the WSD port,
and delete the fake printer, which also removes the WSD port.
September 27th, 2013 9:00pm
Thanks for the suggestions. We don't have any WSD ports, and we are only a revision or so back from the latest HP UPD. The majority of our printers are Kyoceras, for which we are on the latest driver. We also have Konica Minolta copiers, which
are on the latest as well.
I pointed this out at the beginning of the case, but the client traffic seems to be the problem. With 180 printers, approx. 2,200 clients, and low print utilization, the server still sees a constant 300-400 Mbps throughout the day.
After I posted this message, MS support came back saying that the spooler service has about 1,200 threads open and the issue is on the clients - many printer open/printer close packets. Even with a single printer, we can see that a client machine may
have 10-15 connections open to the print service on the server. We are running some Procmon and Xperf data gathering on the clients to attempt to determine what is causing all the client requests to be initiated.
I'd appreciate any more help you could provide.
September 30th, 2013 9:36pm
Can you isolate which computers open the most ports (like the top 10), and then identify which printers are installed for those users? I'm suspecting a bad driver that keeps the port open or communicates badly, but with over 180 printers it might be hell to troubleshoot.
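If it helps, here's a rough sketch of that top-10 count, assuming you can dump `netstat -n` output from the print server to a file. The sample lines and IPs below are made up for illustration:

```python
from collections import Counter

# Made-up sample of "netstat -n" output from the print server;
# in practice you would read the real command output from a file.
netstat_output = """\
TCP  10.0.0.5:445   10.1.2.30:49201  ESTABLISHED
TCP  10.0.0.5:445   10.1.2.30:49202  ESTABLISHED
TCP  10.0.0.5:445   10.1.2.31:49300  ESTABLISHED
TCP  10.0.0.5:445   10.1.2.30:49203  ESTABLISHED
"""

def top_talkers(netstat_text, n=10):
    """Count ESTABLISHED connections to port 445 per remote client IP."""
    counts = Counter()
    for line in netstat_text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[3] == "ESTABLISHED" and parts[1].endswith(":445"):
            remote_ip = parts[2].rsplit(":", 1)[0]
            counts[remote_ip] += 1
    return counts.most_common(n)

print(top_talkers(netstat_output))
```

Once you have the top clients, check which print connections they have installed.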
October 1st, 2013 6:58am
I have tried to narrow that down already. I tried installing only one printer at a time from each of the three vendors we use. Each one resulted in basically the same number of connections - roughly 6 (+/- 1).
Now, there are systems that I have seen with 15+ open connections. Many systems have 5+ printers installed on them, but the connections do not seem to be completely cumulative; i.e., five printers will not result in 30 connections.
Also, regarding the traffic sent to the print server: I have done packet captures from the client to the server, and the clients are sending approx. 100 packets per second to the print server, even while sitting idle with no jobs being sent. The
traffic appears to be those printer open/printer close requests that MS support has suggested. I think this, along with the open connections, is the root cause, but I'm unsure how to pinpoint what is causing it.
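For what it's worth, here is the kind of quick summary I can run over a capture, assuming it's exported from Wireshark as CSV with Time and Source columns (the rows below are made up for illustration):

```python
import csv
import io
from collections import Counter

# Made-up excerpt of a Wireshark CSV export (Time in seconds, Source IP).
capture_csv = """Time,Source
0.001,10.1.2.30
0.012,10.1.2.30
0.020,10.1.2.31
0.450,10.1.2.30
"""

def packets_per_second(csv_text):
    """Return average packets/sec per source IP over the capture window."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    duration = float(rows[-1]["Time"]) - float(rows[0]["Time"])
    if duration <= 0:
        duration = 1.0  # avoid division by zero on tiny captures
    counts = Counter(r["Source"] for r in rows)
    return {src: n / duration for src, n in counts.items()}

rates = packets_per_second(capture_csv)
print(rates)
```

That makes it easy to see which clients are generating the bulk of the idle traffic.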
October 1st, 2013 4:38pm
I'd appreciate any other suggestions anybody may have. The case has been taken over by a different engineer out of the blue, and I feel this case is going backwards - he is repeating items that were already covered previously.
October 7th, 2013 4:02pm
What ports are open when you monitor? Can we see a filtered Wireshark capture?
Edited: please check this too.
Paging of the Executive
HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management DisablePagingExecutive=dword:00000001
In order to increase performance, kernel-mode drivers and other system components can be configured so that they are not paged to disk. However, there must be enough memory available to hold these items, or else the system will experience performance and stability problems.
The Windows Server 2008 kernel allocates memory in pools, known as the paged pool and the non-paged pool. Performance degradation and server instability can result if the memory for these pools is exhausted. To avoid this situation, you
can enable auto-tuning at server startup by editing the PagedPoolSize registry value in the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management registry subkey.
Get this from;
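For reference, the two values described above would look like this as a .reg fragment. The value 0xffffffff is the commonly documented sentinel that asks the system to compute the maximum paged pool size at boot - verify against the KB before applying anything:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]
"DisablePagingExecutive"=dword:00000001
"PagedPoolSize"=dword:ffffffff
```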
October 9th, 2013 8:38pm
What OS version is the Server running? I'm assuming 2008 R2.
What OS version are the connected clients running?
If 2008 R2 and Windows XP, the connection to the spooler will be using Named Pipes over SMB over the network transport.
If 2008 R2 and Windows 7, the connection to the spooler will be using Async RPC over the network transport.
If 2003, then all client connections to the spooler will be using Named Pipes over SMB over the network transport.
Do you have just ONE spooler resource hosted in the cluster?
Have your support person verify whether the open printer calls are succeeding or failing (and thus are called again endlessly).
October 9th, 2013 9:04pm
DisablePagingExecutive is already set to 1.
The server is 2008 R2 SP1.
Clients are Windows 7 SP1. 98% are 32bit.
Yes, there is only one spooler resource in the cluster.
Yagmoth, I don't have a problem sending you a packet capture, but it'll be huge. I can set a limit of 100k packets and it'll fill up in less than a second. MS and I recently
did a Netmon capture, and he let it run for about 5 minutes. It attempted to capture 8 million+ packets. I had to let it process the packets overnight, but it subsequently locked the server. Is there someplace I can upload the packet capture to?
I'll also check with the support guy about the open printer calls succeeding or failing. Would a packet trace show that?
Thanks for all your guys' help!
October 9th, 2013 9:17pm
I'm thinking about what you said: if the clients are Win7, they should be using Async RPC over the network transport. This sparked something in my memory from past packet captures, so I just grabbed another capture, and I am seeing many
packets over SMB2 with "STATUS_PIPE_NOT_AVAILABLE" and "Ioctl Request NAMED_PIPE Function: 0x0006". Could something be making it use the wrong protocols?
October 9th, 2013 9:42pm
The SMB1/2 values don't exist on the server, so I'm assuming the default of Enabled applies here. Same goes for the workstation.
October 9th, 2013 10:03pm
Error 6 is Invalid Handle
6 ERROR_INVALID_HANDLE <--> 0x80090301 No Symbolic Name
6 ERROR_INVALID_HANDLE <--> 0xc0000008 STATUS_INVALID_HANDLE
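The mapping above can be sanity-checked with a small lookup; the numeric values below come from winerror.h and ntstatus.h (I've only included the codes discussed in this thread):

```python
# Win32 error 6 and the NTSTATUS codes seen in the SMB2 traces
# (values from winerror.h and ntstatus.h).
WIN32_ERRORS = {6: "ERROR_INVALID_HANDLE"}
NT_STATUS = {
    0xC0000008: "STATUS_INVALID_HANDLE",
    0xC00000AC: "STATUS_PIPE_NOT_AVAILABLE",
}

def describe(code):
    """Map a numeric status code to its symbolic name, if known."""
    if code in WIN32_ERRORS:
        return WIN32_ERRORS[code]
    return NT_STATUS.get(code, "unknown")

print(describe(0xC00000AC))
```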
The first thing you need to do is open Devices and Printers on the nodes and delete any printer connections that you see targeting the clustered spooler resource.
Anytime the thread count in the spooler goes above 550, I suspect a deadlock in win32spl.dll (this is the client side of the spooler); it establishes an RPC thread pool of 512 threads, after which other threads wait for the next slice of the 512-thread pool.
We released an update for clustered spoolers for 2008 R2 RTM and these changes are included in SP1.
Stability update for Windows Server 2008 R2 Failover Print Clusters
October 9th, 2013 10:35pm
As the server stands now, there are 870 threads for the spoolsv.exe service on the server, though students here are starting to leave for the day. We have seen this go up to 1,200+.
The server is SP1, so those fixes should be included. Also, we deployed hotfixes KB2775511 and KB2977136 to the clients last week to update win32spl.dll, spoolsv.exe, etc. (as listed in the KBs) to the latest versions available (as far
as I know).
Now, when you say "The first thing you need to do is open Devices and Printers on the nodes and delete
any printer connections that you see targeting the clustered spooler resource."
Are you saying that I need to delete all of the printers on all of the clients?
Lastly, in that short packet trace I took a bit ago (of 100,000 packets in 0.5 seconds, mind you), there was no RPC over TCP/IP traffic, as filtered by tcp.port==135 in Wireshark.
Only SMB2, port 445 traffic regarding the named pipes, etc. Not sure if this is correct or not.
October 9th, 2013 11:08pm
You can send your support person my way. They probably know who I am already.
October 9th, 2013 11:10pm
Excellent. Can I send you the case number or should I just request they send the case to you?
October 9th, 2013 11:13pm
I'm not in support, I do not have access to the tools they are using for support cases.
No printer connections on clients need to be touched, verify that the cluster nodes do not have a connection to any shares from the clustered spooler resource. It's more a best practice. I think there was a QFE for Server 2003 on this.
Filtering on 445 or 135 is probably not the best plan. That will only capture the SMB traffic, when you are more interested in the Async RPC traffic, which will use a different TCP/IP endpoint each time the spooler is started.
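As a sketch, a protocol-based Wireshark display filter (rather than a port-based one) might look like the following; traffic on dynamically allocated RPC endpoints may still need "Decode As" before the dissector picks it up:

```
dcerpc || (smb2 && smb2.filename contains "spoolss")
```

That way you catch the RPC conversations regardless of which ephemeral port the spooler registered.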
October 9th, 2013 11:45pm
October 10th, 2013 1:57am
And I'd like to confirm that you did not attempt to disable Async RPC on the cluster nodes.
Are you running the latest spooler components on the cluster nodes?
October 10th, 2013 1:59am
Please continue with Alan's diagnostics, but just a small thing: do you have a dual NIC with load balancing on that server? I tend to always configure the NIC software for failover; with load balancing, packets sometimes don't use the correct route, and
the server has problems with it.
October 10th, 2013 6:30am
That is my case number.
No, Async RPC is not disabled on the nodes. I did briefly try that on a few clients, but I removed the key shortly after. I thought I read that the key was not valid for 2008 R2, so I didn't try it on the nodes.
The spooler components should be the latest:
spoolsv.exe - 6.1.7601.22149
winprint.dll - 6.1.7601.17514
win32spl.dll - 6.1.7601.22311
The only questionable one is spoolss.dll which is version 6.1.7600.16385.
The cluster nodes do not have a connection to any shares from the spooler resource.
Yagmoth, yes, the cluster is set for failover.
October 10th, 2013 4:05pm
Work with the CSS support guy on this. He'll have some suggestions for you.
I'm expecting localspl.dll with version 7601.21687 or greater.
On my SP1 cluster, spoolss.dll is 6.1.7600.16385.
You are not hitting the 512 threadpool limit in win32spl.
October 10th, 2013 8:37pm
localspl.dll is currently 6.1.7601.22137 on the server and client. Support suggested installing hotfix KB2526028; however, the versions on our servers and clients are already the same as or newer than what is listed in the hotfix.
File          KB2526028 Ver.   Current Ver.
Splwow64.exe  6.1.7601.21687   6.1.7601.22268
Localspl.dll  6.1.7601.21687   6.1.7601.22137
Winprint.dll  6.1.7601.17514   6.1.7601.17514
Localspl.dll  6.1.7601.21687   6.1.7601.22137
Winprint.dll  6.1.7601.17514   6.1.7601.17514
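A quick sanity check on a table like that, assuming standard dotted four-part file versions (the comparison logic is generic, not tied to any particular hotfix):

```python
def ver(s):
    """Parse a dotted file version like '6.1.7601.22137' into a comparable tuple."""
    return tuple(int(p) for p in s.split("."))

# (hotfix-required version, currently installed version) per file, from the table.
files = {
    "Splwow64.exe": ("6.1.7601.21687", "6.1.7601.22268"),
    "Localspl.dll": ("6.1.7601.21687", "6.1.7601.22137"),
    "Winprint.dll": ("6.1.7601.17514", "6.1.7601.17514"),
}

needs_update = [name for name, (required, current) in files.items()
                if ver(current) < ver(required)]
print(needs_update)  # empty: everything is already at or above the hotfix level
```

Note that string comparison would get this wrong (e.g. "9" > "17514" lexically), which is why the versions are parsed into integer tuples first.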
October 10th, 2013 9:06pm
Glad he's already contacted you today.
I'm assuming the clustered spooler resource name is a simple 5 letter word starting with P.
I do not think the client version will really matter.
I don't suppose you renamed the spooler resource at one point.
October 10th, 2013 9:23pm
Correct. And no, I don't remember ever renaming the spooler resource when I was configuring it.
October 10th, 2013 9:34pm
When you mention that the client version won't really matter - are there no known issues on the clients at the moment, or do you think it's strictly server related?
I ask because I think this is a client issue for two reasons:
1. The number of connections we are seeing the clients open with the server - along with the amount of network traffic going to the server.
2. I can move ONE widely used print queue over to a different server, and that one queue will cause that server to sit at 100% CPU during the day. This queue is not heavily used, but it is installed on most workstations.
I'm willing to bet that if I create a "fake" queue on a server and install it on all workstations, it will cause the server to be fully utilized even with no jobs being sent to it.
Support did come back with something last night, but I'm not sure what solution he is proposing. He mentioned this as the possible problem:
* pFullPrinterName = 0x00000000`00000058 "--- memory read error at address 0x00000000`00000058
October 11th, 2013 12:40pm
I think it's server related due to the clustered spooler resource. I've not seen any issue like the one you are reporting with the version of localspl.dll you have.
I told him this was a concern. I'm not really sure that's the information he should be updating you with. Did he ask about the shares the clients were calling? If so do they exist?
Are all your shares on a clustered spooler resource or are you having the same issue on a standalone machine?
October 11th, 2013 7:53pm