This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

How to identify the end of a file transfer from server side?

The scenario is quite straightforward: a client is downloading a file from a server using HTTP GET.

I can run Wireshark at the server end, but not at the client end.

What I want to do is calculate the download throughput. I'm trying to use Wireshark to see the beginning and end timestamps of the file download, but I'm having trouble identifying the latter. (The beginning time is when the server receives the HTTP GET packet.)

As far as I can see, lots of TCP ACKs arrive at the server after the HTTP 200 OK packet has been sent from the server to the client, which makes sense as the client is ack'ing the data segments.

My questions are:

  • How do I identify the end of a file transfer? Is it the last ACK received from the client?
  • If I want to do this in real time (using tcpdump, for example), I won't be able to know which ACK is the last one. So is there any way I can "track" the packets and know it's the last one, e.g., through seq/ack numbers?

asked 26 Jun '16, 03:25

Chang


One Answer:

First of all, if we talk about network throughput, the beginning of the transfer is not when the server receives the "HTTP GET packet", and not even when the server receives the last packet of the HTTP GET (which may occupy more than a single packet in some cases). The beginning of the transfer is when the server sends the first packet with non-zero payload size.

The end of the transfer is when all parts of the file have reached the client. If there is no packet loss, it is the moment when the last packet with non-zero payload size arrives at the client. Here, "last" means the data packet that is followed by at least one packet from the server with zero payload length, which may be a FIN, a RST, or simply an ACK to the first packet of a subsequent GET sent over the same TCP session.
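A server-side capture cannot see the moment a packet arrives at the client, so the closest observable event is the client ACK that covers the last payload byte. The following is only a minimal sketch of that idea, not a definitive implementation. The `Pkt` record and the `transfer_window` function are hypothetical names introduced for illustration; the sketch assumes relative sequence numbers, no sequence number wrap-around, and a capture that contains a single response.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Pkt(NamedTuple):
    """Hypothetical, simplified view of one captured TCP packet."""
    ts: float          # capture timestamp in seconds
    from_server: bool  # True if the server sent the packet
    seq: int           # relative TCP sequence number
    ack: int           # relative TCP acknowledgment number
    payload_len: int   # TCP payload length in bytes


def transfer_window(pkts: Sequence[Pkt]) -> Optional[Tuple[float, float]]:
    """Return (start_ts, end_ts) of the transfer, or None if it is incomplete.

    start_ts: first server packet with non-zero payload.
    end_ts:   first client packet whose ACK number covers the last payload byte.
    """
    data = [p for p in pkts if p.from_server and p.payload_len > 0]
    if not data:
        return None
    start_ts = data[0].ts
    # Sequence number just past the last payload byte sent by the server.
    last_byte = max(p.seq + p.payload_len for p in data)
    for p in pkts:
        if not p.from_server and p.ack >= last_byte:
            return start_ts, p.ts
    return None


# Hypothetical capture: one GET, two data segments, two client ACKs.
capture = [
    Pkt(0.00, False, 1, 1, 100),     # HTTP GET from the client
    Pkt(0.10, True, 1, 101, 1000),   # first data segment
    Pkt(0.20, True, 1001, 101, 500), # last data segment
    Pkt(0.25, False, 101, 1001, 0),  # ACK for the first segment only
    Pkt(0.30, False, 101, 1501, 0),  # ACK covering the last byte
]
print(transfer_window(capture))
```

The ACK timestamp includes the return trip from the client, so it slightly overestimates the transfer time. In real time, the same comparison can be applied packet by packet once the highest sequence number sent so far is tracked.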

The trouble (from the perspective of your task) is that a typical browser does not close a TCP session immediately after finishing the transfer of a single file, but keeps it open for a while and may reuse it if it needs to transfer another file from the same server. So if you open a web page which, for example, contains several pictures stored on the same server as the base HTML file, you'll see several GETs and their responses in a single TCP session.

What might help you is that Wireshark, and therefore also tshark, normally reassembles the payload, so the last packet of a file is the one marked as HTTP, while all the previous ones are only marked as TCP. This won't work with tcpdump, which does not reassemble application protocols. But using this feature for your online throughput analysis requires that you take the HTTP GET as the beginning of the file transfer, which introduces some error into your bandwidth calculation (the request processing time at the server).

answered 26 Jun '16, 04:23

sindy

edited 26 Jun '16, 04:27

Thanks a lot for your comment!

Can you please elaborate a bit more on (last paragraph) "Wireshark ... the last packet of a file is the one which is marked as HTTP"?

When I look at the wireshark trace, I can see that there is always a "HTTP 200 OK" near the end of a download - is this the one you are referring to?

(26 Jun '16, 12:19) Chang

Btw I forgot to mention that I'm looking to estimate the throughput by best effort, so the error caused by using HTTP GET as the beginning of a transfer is acceptable in this case.

(26 Jun '16, 12:21) Chang

I can see that there is always a "HTTP 200 OK" near the end of a download - is this the one you are referring to?

Yes, exactly. An HTTP PDU, especially one carrying a file as its payload, often spans several packets (sometimes thousands of packets), and thus Wireshark (as well as the actual recipient) can only process it properly once it has been received completely. So although the string "200 OK" is physically present in the first packet of the response, tshark shows it in the "reassembled data" of the last one, and therefore marks only the last packet of the response as an HTTP one.
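To make the reassembly idea concrete, here is a simplified model, not Wireshark's actual dissector code. It assumes the response carries a `Content-Length` header (no chunked transfer encoding) and that the TCP payloads are already in order. The function name `completing_segment` is hypothetical.

```python
from typing import Optional, Sequence


def completing_segment(segments: Sequence[bytes]) -> Optional[int]:
    """Return the index of the TCP payload that completes the HTTP response.

    Simplified model: headers end at the first blank line, and the body
    length is taken from the Content-Length header.
    """
    buf = b""
    for i, seg in enumerate(segments):
        buf += seg
        head, sep, body = buf.partition(b"\r\n\r\n")
        if not sep:
            continue  # the header block is not complete yet
        length = None
        for line in head.split(b"\r\n")[1:]:  # skip the status line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is not None and len(body) >= length:
            return i  # this packet would be the one shown as HTTP
    return None


# Hypothetical response split across three TCP payloads.
payloads = [
    b"HTTP/1.1 200 OK\r\nContent-Length: 10\r\n\r\nabc",
    b"defg",
    b"hij",
]
print(completing_segment(payloads))
```

The status line sits in the first payload, yet the response only becomes complete with the third one, which matches the behaviour described above.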

The file you transfer is actually a payload of the HTTP 200 OK message, i.e. there is some overhead added to the file size.

There are also several methods of encoding the file contents for transfer, so the number of bytes needed to transfer the file may differ significantly from its actual size. It may be bigger but also smaller: some methods encode binary data using only byte values that represent printable characters (so the result occupies more bytes), while others compress the data and use all 8 bits to transport the result.
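As a rough illustration of both directions, the sketch below uses Base64 as an example of a printable-character encoding and gzip as an example of compression. Both are my choice for illustration only; which encoding a given server actually applies depends on its configuration and on the request headers.

```python
import base64
import gzip

# 10000 bytes of repetitive sample data (hypothetical file contents).
raw = b"wireshark " * 1000

b64 = base64.b64encode(raw)  # printable characters only, so it grows
gz = gzip.compress(raw)      # compressed, uses all byte values

print("raw bytes:   ", len(raw))
print("base64 bytes:", len(b64))
print("gzip bytes:  ", len(gz))
```

Base64 emits 4 output bytes for every 3 input bytes, so the encoded form is about a third larger. The gzip result depends heavily on how compressible the data is; this repetitive sample shrinks a lot, while already compressed files barely shrink at all.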

So your throughput measurement will give you the raw figures, i.e. the bytes actually transferred per second, including all overheads (HTTP, TCP, IP, Ethernet) plus the encoded size of the transferred file.
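The arithmetic for such a raw figure can be sketched as follows. The frame lengths and timestamps are made-up values, and `throughput_bps` is a hypothetical helper; in practice the inputs would come from the capture (the on-wire length of each server-to-client frame plus the chosen start and end timestamps).

```python
from typing import Iterable


def throughput_bps(frame_lengths: Iterable[int], start_ts: float, end_ts: float) -> float:
    """Raw throughput in bits per second over the interval [start_ts, end_ts]."""
    duration = end_ts - start_ts
    if duration <= 0:
        raise ValueError("end timestamp must be later than start timestamp")
    return sum(frame_lengths) * 8 / duration


# Hypothetical download: 100 full-size Ethernet frames plus one shorter frame,
# sent within half a second.
frames = [1514] * 100 + [614]
rate = throughput_bps(frames, start_ts=10.0, end_ts=10.5)
print(f"{rate / 1e6:.2f} Mbit/s")
```

Because the frame lengths include the Ethernet, IP, TCP and HTTP headers, the result is the raw on-wire rate, not the rate at which file contents arrive.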

Off topic: on this site, the purpose of the "thumbs up" icon next to an Answer is to allow other users than the author of the Question to vote for those Answers which they consider more useful than other ones. To mark an Answer as useful, the author of the Question should use the checkmark icon. Doing so changes the appearance of the Question in the list in order to indicate to others coming to ask a similar Question that the existing one has been answered usefully.

(26 Jun '16, 14:07) sindy

This has been super helpful, thanks again!

(26 Jun '16, 15:18) Chang