This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Weird stall in TCP on PS3

The PS3 TCP stack is a particularly fragile piece of software, and it often drops incoming packets, resulting in stalled connections. However, here I have a very strange TCP phenomenon that I cannot quite understand. It happens fairly frequently, and I have a script that can make it happen at will. I am making an HTTP request for 1000000 bytes to a server some 40ms RTT away. The funny thing is that after some data (some 31k) has been received and all of it acked, the server pauses, and after some 10ms proceeds to send a bunch of duplicate acks, then pauses again before retransmitting. I see this very regularly. What is going on? See http://www.cloudshark.org/captures/d46dcdc803c1?filter=tcp.stream%20eq%2012 (the problem starts with packet 456).

Stream 12 recovers, but others do not, e.g. stream 19, where we time out. In stream 19, the retransmit is acked, and yet the server just stops sending data.

The only thing I can imagine going on is that all the ACK packets from the PS3 to the server are being lost somehow upstream. But I don't see this problem occurring for a PC doing the same thing. The only difference I can think of is the PS3's relatively small receive window size of 64k.

asked 29 Jun '13, 05:19

KristjanValur


One Answer:

It looks like the server is having a problem with the TSval of 0. It starts sending packets with a TSecr of 4, which it shouldn't (as it never received a TSval of 4). Since it has 4 in its TCP control block, all the ACKs from the client are rejected (as they have a TSval of 0, which is lower than the 4 in its control block). Therefore it considers the data it has sent as unacknowledged, and after the (3 second) retransmission timer goes off, it starts retransmitting. Now things go well, because by that time the TSval has increased to 12.
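To make that sequence concrete, here is a minimal Python sketch of the behavior described above. It is only a toy model, not the server's actual code: the function name `filter_acks` is made up, the intermediate TSval values are invented for illustration (only 0, 4 and 12 come from the capture discussion), and timestamp wraparound is ignored.

```python
def filter_acks(ts_recent, ack_tsvals):
    """Split incoming ACK TSvals into (accepted, dropped).

    Assumed model: the receiver drops any segment whose TSval is lower
    than the value stored in its control block, and stores the TSval of
    every segment it accepts. Wraparound is deliberately ignored.
    """
    accepted, dropped = [], []
    for tsval in ack_tsvals:
        if tsval < ts_recent:
            dropped.append(tsval)      # looks like an old duplicate
        else:
            accepted.append(tsval)
            ts_recent = tsval          # remember the newest timestamp seen
    return accepted, dropped


# Server holds 4; the client's clock starts at 0 and has reached 12
# by the time the retransmission is acked (values in between are made up).
accepted, dropped = filter_acks(4, [0, 0, 1, 2, 12, 12])
print(accepted)  # [12, 12]
print(dropped)   # [0, 0, 1, 2]
```

Under this model every ACK is discarded until the client's clock passes the bogus value, which matches the stall followed by recovery in stream 12.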

The question is whether a tsval of 0 is allowed. RFC 1323 is not clear on this. It only states that TSval should be 0 when the field is not supposed to be valid, so one might consider it an invalid value.

answered 29 Jun '13, 06:35

SYN-bit ♦♦

Interesting! It looks as though I'll have to file a defect with Sony to fix their stuff. It also seems to be server-related, so I need to figure out what the CDN provider is running and check if it can be upgraded. Thank you.

(29 Jun '13, 07:26) KristjanValur

Well, I'm not sure if it is faulty behavior per se, but the server does not seem to cope with it very well. It wouldn't hurt to bring it to their attention though :-)

(29 Jun '13, 13:48) SYN-bit ♦♦

On looking more closely, I don't think the PS3 is at fault. It simply has a slow-running clock, and 0 should be a valid value. See stream 19 as an example. Soon, the PS3 is sending a TSval of 2. However, the server's data segments all contain a TSecr of 18, which, again, has never been seen. This time round, when the retransmit occurs, things don't correct themselves, because the PS3's ACK clock has only advanced to 12. This definitely looks like a faulty server-side TCP stack to me. Thanks again for bringing my attention to the timestamp options; there seem to be endless nuances to the TCP protocol.

(01 Jul '13, 02:37) KristjanValur

Btw, I don't think that the RFC specifies that incoming segments with unexpected TSval values should be dropped, as seems to be the case here. The RTTE should be a wholly separate thing, run as an add-on to regular stream flow control, and should not affect packet processing in other ways, right?

(01 Jul '13, 02:44) KristjanValur

OK, I did not look at the other sessions. But I think you're right that the server has a faulty TCP implementation, as it is echoing timestamps it has not seen. If you see this behavior with other sites too, you might also want to check any intermediate devices, as they might be altering the timestamps (although I have not seen any device do that myself).

The RFC 1323 does say the following in par 4.2:

  "The basic idea
  is that a segment can be discarded as an old duplicate if it is
  received with a timestamp SEG.TSval less than some timestamp
  recently received on this connection."

So I do think the server drops the ACK packets. This seems to be backed up by the fact that it starts retransmitting from the start of the session.
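As a rough illustration of that check, the comparison can be sketched in Python as below. The helper names `ts_less_than` and `paws_rejects` are hypothetical, and the sketch assumes the 32-bit modular comparison that RFC 1323 describes for timestamps (s is "less than" t when 0 < (t - s) < 2**31); it says nothing about what this particular server actually implements.

```python
MASK32 = 0xFFFFFFFF


def ts_less_than(s, t):
    """True if 32-bit timestamp s is older than t, allowing for wraparound."""
    return 0 < ((t - s) & MASK32) < 2**31


def paws_rejects(seg_tsval, ts_recent):
    """True if a segment would be discarded as an old duplicate."""
    return ts_less_than(seg_tsval, ts_recent)


print(paws_rejects(0, 4))    # True:  stream 12, ACKs with TSval 0 vs. stored 4
print(paws_rejects(12, 4))   # False: stream 12 after the retransmission timeout
print(paws_rejects(12, 18))  # True:  stream 19, clock only reached 12 vs. stored 18
```

If the server really holds 4 or 18 as its most recent timestamp, this check alone would explain why stream 12 recovers and stream 19 does not.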

(01 Jul '13, 03:23) SYN-bit ♦♦

Hey, when testing this from a PC, I found that the latest Windows clients don't use TCP timestamps for connections they initiate. This can be changed with a netsh option, however. I'm curious, why is this? RFC 1323 introduces timestamps as a means of RTTE, but the Wikipedia article on TCP mentions their primary use being PAWS. Surely both are important? Any idea why Windows isn't using them by default?

(02 Jul '13, 07:02) KristjanValur