BDP = throughput?

Question

Hi guys! Window size: if I understood correctly, TCP uses it for flow control.

Is BDP = throughput?

Please correct me if I'm wrong.

Accepted Answer

The bandwidth-delay product (BDP) indicates the window size (WS) needed to let a single TCP stream utilize the full bandwidth of the connection.

If you turn the calculation around, one TCP session can only send one window of payload per round trip, as it needs an ACK to confirm that more data can be sent. If you add the protocol overhead of all the other layers, you get the theoretical maximum bandwidth per TCP session. For an Ethernet network, you get:

BW = WS*8 * (1000/RTT) * 1518/1460 / 1000000
BW = Bandwidth in megabits per second
WS = Window size in bytes
RTT = Roundtrip time in milliseconds
1518 = Ethernet frame length in bytes (14-byte header + 1500-byte IP packet + 4-byte FCS)
1460 = TCP payload size in bytes, so assuming no extra IP or TCP options
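
The formula above can be written as a small Python function. This is a sketch that keeps the answer's assumptions (1518-byte frames carrying 1460 bytes of TCP payload, no IP or TCP options); the function name is hypothetical.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float,
                            frame_bytes: int = 1518,
                            payload_bytes: int = 1460) -> float:
    """Upper bound on one TCP stream's rate on the link, in Mbit/s.

    One window of payload can be sent per round trip; the
    frame/payload ratio adds the Ethernet/IP/TCP overhead.
    """
    bits_per_rtt = window_bytes * 8
    rtts_per_second = 1000 / rtt_ms
    overhead = frame_bytes / payload_bytes
    return bits_per_rtt * rtts_per_second * overhead / 1_000_000


# 64K window and 10 ms RTT, the case discussed next
print(f"{max_tcp_throughput_mbps(65535, 10):.1f} Mbit/s")  # prints: 54.5 Mbit/s
```

Note that the link speed does not appear in the function at all: with a fixed window, only the RTT limits the rate (as long as the result stays below the link speed).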

So for a 64K WS on a 1 Gbit/s connection with a 10 ms RTT, you get a maximum of ~54.5 Mbit/s for one TCP connection. If you want that one TCP connection to use the full 1 Gbit/s bandwidth, you can calculate the other way around:

  WS = BW*1000000/8 * (RTT/1000) * (1460/1518)

That's where the name comes from: the required window is the product of the BW and the delay (RTT). Any WS equal to or higher than the calculated value ensures the connection is not stalled by the RTT.
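
The reverse calculation as a Python sketch, under the same assumptions (1518-byte frames, 1460-byte payload, no options); the function name is hypothetical.

```python
def required_window_bytes(bw_mbps: float, rtt_ms: float,
                          frame_bytes: int = 1518,
                          payload_bytes: int = 1460) -> float:
    """Window (in bytes of TCP payload) needed to fill the link.

    Bandwidth-delay product, scaled by payload/frame because only
    the TCP payload counts against the receive window.
    """
    bytes_per_second = bw_mbps * 1_000_000 / 8
    rtt_seconds = rtt_ms / 1000
    return bytes_per_second * rtt_seconds * payload_bytes / frame_bytes


ws = required_window_bytes(1000, 10)   # 1 Gbit/s, 10 ms RTT
print(f"{ws:,.0f} bytes")              # prints: 1,202,240 bytes
```

So filling 1 Gbit/s at 10 ms RTT needs a window of roughly 1.2 MB, about 18 times the 64K window from the example.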

answered 14 Mar ‘15, 03:32

SYN-bit ♦♦

Thank you, this made it easier for me to understand.

But in practice, the WS will probably always be lower than the calculated BDP?

So, let's say we have a backup running over a Gigabit Ethernet network with a large RTT, and the window size actually advertised is much lower than the calculated BDP. Could this be one of the reasons why the transfer rate is low? Because we would send less data and have to acknowledge it more often, and considering the RTT, this would make the rate go down, yes?

(14 Mar ‘15, 03:56) adasko

(Please use comments instead of new answers; that's how this site works best. See the FAQ for more details.)

In practice, the WS is usually smaller than the BDP on a line with both a high RTT and a high BW. This is usually not a problem, because multiple clients use multiple TCP streams, and combined they can consume the BW of the link.

If a single system needs to utilize the full BW of the link with one TCP stream (as in your backup scenario), you need to tune the WS of the receiving end so that the sender can keep sending data until the first part of the data is acknowledged (after one RTT).

So there are not more acknowledgements; they just take longer to get back to the sender, so the WS needs to be big enough that the sender can keep sending while the ACKs are on their way back.

And yes, this is a very common problem when doing backups or data replication over WAN links.
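
To make this concrete, here is a simplified model (an assumption that ignores protocol overhead, slow start and packet loss): per round trip the sender can put one window on the wire, while the link needs BW × RTT bytes in flight to stay busy. The fraction of each RTT the sender spends transmitting, rather than waiting for ACKs, is then window / BDP, capped at 1. The function name is hypothetical.

```python
def sender_busy_fraction(window_bytes: int, bw_bps: float, rtt_s: float) -> float:
    """Fraction of each round trip the sender can keep the link busy.

    Simplified model: no protocol overhead, slow start or loss.
    """
    bdp_bytes = bw_bps * rtt_s / 8          # bytes needed "in the pipe"
    return min(1.0, window_bytes / bdp_bytes)


# 64K window vs. a window equal to the BDP, on 1 Gbit/s with 10 ms RTT
for window in (65_535, 1_250_000):
    busy = sender_busy_fraction(window, 1e9, 0.010)
    print(f"window {window:>9,} B: busy {busy:.0%} of each RTT")
```

With the 64K window the sender transmits for only about 5% of each round trip and spends the rest waiting for the ACKs to come back.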

(14 Mar ‘15, 04:03) SYN-bit ♦♦

Thank you again, and sorry for using the "answer" option.

So if the receive window is not big enough, isn't there a reason why the receiving host advertises that value (instead of a larger one)?

Is it possible that the receiving host is acknowledging data faster than it can pass it out of the buffer to the upper layer? (I don't know, maybe the application is not writing the data fast enough to disk, tape, etc.)
Might the reason be that the receiving host is "busy" in terms of CPU/RAM usage?
Might it be a badly written application?

So should these things be taken into account first, before tuning the window size of the receiving host? (If yes, there are probably many more possible reasons.)

And what is your experience with running backups within a LAN? I think there shouldn't be a high RTT or packet loss at all. Is it possible in a LAN topology that the WS is still much lower than the calculated BDP?

I know it's a lot, but so far I have found your answers much clearer than most of the articles out there :)

(14 Mar ‘15, 04:17) adasko

If you are troubleshooting a performance issue, then yes, you need to take all factors into account. But it is best to work through them one by one.

Make a trace file (at the sending side) and look for large bursts of data interrupted by a delay of a few ms, after which an ACK comes in and the sender starts sending again. You can also see this as a "staircase" in the TCP stream graph. Also notice that the "bytes in flight" counter goes up to (almost) the WS of the receiver. In that case you need to increase the WS, as the RTT of the network is the problem.

If the application cannot read data from the TCP buffers fast enough, you will see the WS go down slowly, and if the condition continues, it will drop below 1 MSS, which prevents the sender from sending a full-sized frame. Usually the sender will then pause until the WS goes up, or, after a timer expires, it will send a smaller frame that fills up the receiver's receive buffer (the delay is used to avoid silly window syndrome).
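
A much-simplified Python sketch of the sender-side behaviour described above. It is loosely modelled on sender silly window syndrome avoidance (RFC 1122, section 4.2.3.4) but is not a faithful implementation of it: send a full segment when the usable window allows, otherwise wait, and only send a smaller segment once a timer has expired. The function name and parameters are hypothetical.

```python
def bytes_to_send(usable_window: int, queued: int, mss: int,
                  override_timer_expired: bool) -> int:
    """Return how many bytes to send now (0 means wait).

    Simplified sender-side silly window syndrome avoidance.
    """
    if queued == 0 or usable_window == 0:
        return 0
    full = min(mss, queued)
    if usable_window >= full:
        # Window allows a full-sized segment (or all remaining data).
        return full
    if override_timer_expired:
        # Stop waiting: send a small segment that fills the window.
        return usable_window
    # Window is below 1 MSS: pause until it opens up or the timer fires.
    return 0


print(bytes_to_send(64_000, 100_000, 1460, False))  # 1460: full segment
print(bytes_to_send(500, 100_000, 1460, False))     # 0: wait
print(bytes_to_send(500, 100_000, 1460, True))      # 500: small segment
```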

So, since this is a Wireshark Q&A site: make a trace (at least at the sending side, but preferably on both sides, which is also a good learning experience, as things look really different at the two ends of the connection). With the above information you should be able to determine whether you are experiencing delays due to RTT/WS issues or due to server performance issues.
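
The two patterns above could be told apart with a rough heuristic like the following Python sketch. It assumes you have already paired each data packet's bytes in flight with the most recent window advertised by the receiver (for example from Wireshark's `tcp.analysis.bytes_in_flight` and `tcp.window_size` fields). The `Sample` type, the `diagnose` function and both thresholds are illustrative assumptions, not established values.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One point from a sender-side trace (hypothetical format)."""
    advertised_window: int   # receiver's window in bytes, after scaling
    bytes_in_flight: int     # unacknowledged bytes at this moment


def diagnose(samples: list[Sample], mss: int = 1460,
             full_ratio: float = 0.9) -> str:
    """Rough heuristic separating receiver-limited from window/RTT-limited."""
    if not samples:
        return "no data"
    # Receiver not draining its buffer: window shrinks below one MSS.
    if min(s.advertised_window for s in samples) < mss:
        return "receiver too slow: window shrank below 1 MSS"
    # Sender keeps filling the whole window: RTT/WS is the bottleneck.
    at_limit = sum(s.bytes_in_flight >= full_ratio * s.advertised_window
                   for s in samples)
    if at_limit / len(samples) >= 0.5:
        return "window/RTT limited: bytes in flight keep hitting the window"
    return "neither pattern: look at other causes"


print(diagnose([Sample(65_535, 64_000), Sample(65_535, 65_535)]))
print(diagnose([Sample(65_535, 30_000), Sample(8_000, 8_000), Sample(900, 900)]))
```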

(14 Mar ‘15, 04:32) SYN-bit ♦♦

"Bytes in flight": this refers to the data that has been sent but not yet acknowledged, yes?

Increasing the WS: would this probably have to be done in the registry on the receiving host? I also came across something called auto-tuning, if I'm not wrong, but as far as I understood it causes more issues; I will have to read up on that later. But just for my understanding: first I would have to calculate the optimum window size and set it on the receiving host, and no matter what, this will always be both the sender's send window and the receiver's receive window, yes?

Thank you again. You are a real champ. I hope to have the same knowledge as you one day :(

(14 Mar ‘15, 04:43) adasko

Yes, BiF refers to the data that has not been acknowledged yet.
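
A minimal Python sketch of how bytes in flight follows from a sender-side trace: one past the highest byte sent, minus the highest cumulative ACK received. The event format and function name are hypothetical, and the sketch assumes relative sequence numbers starting at 1 (as Wireshark shows them after the handshake) and ignores sequence-number wraparound, retransmissions and SACK.

```python
from collections.abc import Iterable, Iterator


def track_bytes_in_flight(events: Iterable[tuple]) -> Iterator[int]:
    """Yield bytes in flight after each event.

    events: ("data", seq, length) sent by the sender, or ("ack", ack)
    received from the receiver.
    """
    next_seq = 1   # one past the highest byte sent so far
    acked = 1      # highest cumulative ACK received so far
    for event in events:
        if event[0] == "data":
            _, seq, length = event
            next_seq = max(next_seq, seq + length)
        else:
            _, ack = event
            acked = max(acked, ack)
        yield next_seq - acked


trace = [("data", 1, 1460), ("data", 1461, 1460),
         ("ack", 1461), ("data", 2921, 1460)]
print(list(track_bytes_in_flight(trace)))  # [1460, 2920, 1460, 2920]
```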

Usually you configure a maximum window size; the system starts with a smaller value and increases it when needed, up to the configured maximum. Autotuning should do this automagically. Please note that this is local to each system, so the sender's buffers and the receiver's buffers are independent of one another.
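
A toy Python model of the behaviour described above: start small, grow, and never exceed the configured maximum. It is not the algorithm any particular operating system uses; the doubling rule and the function name are assumptions for illustration only.

```python
def autotune(initial: int, maximum: int, rounds: int, bdp: int) -> list[int]:
    """Window size after each round trip in a toy autotuning model.

    Assumption: the window doubles each round trip while it is still
    smaller than the BDP (so the sender can fill it), capped at the
    configured maximum.
    """
    window = initial
    history = [window]
    for _ in range(rounds):
        if window < bdp:
            window = min(window * 2, maximum)
        history.append(window)
    return history


bdp = 1_250_000  # 1 Gbit/s x 10 ms, in bytes
print(autotune(65_536, 4 * 1024 * 1024, 6, bdp))  # grows past the BDP
print(autotune(65_536, 524_288, 5, bdp))          # stuck at the maximum
```

The second call shows why the configured maximum matters: if it is below the BDP, the window can never grow large enough to fill the link, no matter how well autotuning works.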

You will learn, bit by bit, just like I did :-)

(14 Mar ‘15, 05:05) SYN-bit ♦♦
