Hi all !!
So I have the following scenario: a print server in Sydney, Australia, and Ricoh MP 5054 printers at a number of sites in Singapore. At one site, for all four printers, users were complaining that a large print job (26 MB) was taking up to 3 minutes to be sent from Sydney before it started printing. The same model of printer at other sites took on the order of 20-30 seconds.
Latency is about 100 ms between Sydney and SG. I have a NetShark in Sydney and so ran captures to all the 5054 printers. According to both Wireshark and Riverbed Packet Analyser, all four printers at the slow site were advertising a maximum TCP window size of 16 KB. The other sites showed a TCP RWIN of 196 KB, thus explaining the dramatic difference in job transfer times. These values are completely different from the printer I/O buffer settings on the device itself, which seem to bear no resemblance to what I am actually seeing on the wire ;-)
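As a rough sanity check on those numbers: when the receive window is the bottleneck, a sender can have at most one window of data in flight per round trip, so throughput is capped at about window / RTT. The sketch below applies that to the figures above. It assumes 1 KB = 1024 bytes, treats the 100 ms latency as the round-trip time, and ignores slow start, protocol overhead and spooling, so it is a lower bound rather than a prediction. The function and variable names are only illustrative.

```python
def transfer_time_seconds(job_bytes: int, window_bytes: int, rtt_seconds: float) -> float:
    """Lower bound on transfer time when the TCP receive window is the limit.

    At most one full window can be in flight per round trip, so the
    throughput ceiling is window_bytes / rtt_seconds.
    """
    throughput_bytes_per_s = window_bytes / rtt_seconds
    return job_bytes / throughput_bytes_per_s


JOB_BYTES = 26 * 1024 * 1024   # the 26 MB print job
RTT_SECONDS = 0.100            # assumption: ~100 ms is the round-trip time

slow = transfer_time_seconds(JOB_BYTES, 16 * 1024, RTT_SECONDS)    # 16 KB window
fast = transfer_time_seconds(JOB_BYTES, 196 * 1024, RTT_SECONDS)   # 196 KB window

print(f"16 KB window:  at least {slow:.0f} s")
print(f"196 KB window: at least {fast:.0f} s")
```

That works out to roughly 166 s for the 16 KB window and roughly 14 s for the 196 KB window, which is in the same ballpark as the "up to 3 minutes" and "20-30 seconds" the users see.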
Before I go asking the remote team to start shifting a huge printer from one site to the other to prove my point, and making me look stupid if it doesn't fix the problem, one basic question... Can anything in the network change the advertised TCP window size that I am seeing from the printer in the captures?
I captured the 3-way handshake, so scaling is not the issue; WS on the printers is 1. Or is this really a bad coincidence and a case of a whole bad batch of printers at this site? All firmware and settings are identical across all sites.
The TCP receive window size in a packet capture, provided scaling is catered for, doesn't lie, right? ;-)
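For anyone checking the "provided scaling is catered for" part by hand, here is a minimal sketch of how the advertised window is derived from the 16-bit window field and the window scale shift count seen in the handshake (RFC 7323). The function name and the example values are hypothetical, not taken from my captures. Note that Wireshark shows both the shift count and the resulting multiplier, so a multiplier of 1 corresponds to a shift count of 0.

```python
def effective_window(raw_window: int, shift_count: int, scaling_negotiated: bool = True) -> int:
    """Return the advertised receive window in bytes.

    raw_window         -- 16-bit window field from the TCP header
    shift_count        -- window scale shift from the sender's SYN or SYN/ACK (0..14)
    scaling_negotiated -- True only if BOTH sides sent the window scale option
    """
    if not 0 <= raw_window <= 0xFFFF:
        raise ValueError("window field is 16 bits")
    if not 0 <= shift_count <= 14:
        raise ValueError("RFC 7323 limits the shift count to 14")
    if not scaling_negotiated:
        return raw_window
    return raw_window << shift_count


# Hypothetical examples:
print(effective_window(16384, 0))   # shift 0 (multiply by 1): 16384 bytes = 16 KB
print(effective_window(50176, 2))   # shift 2 (multiply by 4): 200704 bytes = 196 KB
```

If the handshake is missing from the capture, the shift count is unknown and the raw field alone can badly understate the real window, which is why I made sure to capture the SYN and SYN/ACK.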
asked 08 Oct '17, 14:17
edited 09 Oct '17, 03:27
I agree that the TCP window size seems to be the problem here. Do you have captures close to the printers, or only captures taken at the print server site? A capture near the printer would be required to prove that some device in the path messes with the receive window size. Or, to answer your question: yes, there may be devices that change the window size, e.g. load balancers, traffic shapers and other black boxes.
My recommendation would be to try to get a capture on the printer side, as close as possible to the printer itself (e.g. TAP on the cable to the printer, or SPAN on the same switch) to see if the device itself sends such a low window size.
answered 08 Oct '17, 15:14
So I discovered the answer ;-) After performing packet captures on a mirrored port of a printer at the 'fast' site, I saw something interesting in the delta between the SYN/ACK from the printer and the ACK from the 'print server'. The delta was 0.07 ms!! With an RTT between Sydney and Singapore of 150 ms, that's impossible, right?! ;-) So something on the LAN at the fast site was answering on behalf of the print server. That something happened to be WANx!! I thought of WANx initially but was looking at things the wrong way round: I eliminated WANx from the equation because the slow site didn't have WANx (I know, my bad). ALL the printers in fact have a TCP receive window of only 16 KB, but at the sites that have it, WANx was effectively masking/fixing this inherent problem. Short of rolling out WANx to every single site just to fix print times, which is not cost effective for me, I have to go back to the vendor to persuade them to fix their issue. Looks like these printers are designed for local printing only and are extremely inefficient over high-speed, high-latency links.
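The check I ended up doing by eye can be sketched as below: in a capture taken next to the printer, compare the time between the printer's SYN/ACK and the returning ACK with the known WAN round-trip time. If the ACK comes back far faster than the WAN allows, something local is terminating the connection. The function name and the 50% threshold are my own assumptions, not anything from Wireshark or the WANx box.

```python
def looks_locally_answered(handshake_delta_s: float, wan_rtt_s: float,
                           fraction: float = 0.5) -> bool:
    """Flag a handshake whose final ACK arrived too quickly to have crossed the WAN.

    handshake_delta_s -- time from the printer's SYN/ACK to the returning ACK,
                         measured in a capture taken close to the printer
    wan_rtt_s         -- known round-trip time to the real server
    fraction          -- assumed threshold: anything below this share of the
                         WAN RTT is treated as answered by a local device
    """
    return handshake_delta_s < wan_rtt_s * fraction


# Fast site: 0.07 ms delta against a 150 ms Sydney-Singapore RTT
print(looks_locally_answered(0.00007, 0.150))   # True: a local device answered

# A delta close to the WAN RTT is what a genuine end-to-end handshake looks like
print(looks_locally_answered(0.151, 0.150))     # False
```

This only works with a capture taken near the printer. From the Sydney side the handshake timing looks perfectly normal, which is why I missed it at first.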
answered 12 Oct '17, 20:58