TCP Sucks
Nick Weaver gives me a shout-out.
there have been a couple of cases where the application vendors concluded they were causing too much damage and therefore started making changes. BitTorrent is the classic example. It is shifting to delay-based congestion control specifically to: (a) be friendlier to TCP because most of the data carried by BitTorrent really is lower-priority stuff; and (b) mitigate the “you can’t run BitTorrent and Warcraft at the same time” problem. So, there’s some hope.
It’s true. We occasionally take a break from drinking moonshine and shooting beer bottles to do real engineering.
Of course, I’ve always used TCP using exactly the API it provides, and even before I understood how TCP worked under the hood gone through great pains to use the minimum number of TCP connections used to the number which will reliably saturate the net connection and provide good piece diffusion. If TCP doesn’t handle that well, it isn’t my fault.
Now the intelligentsia have a plan, called RED to how to fix the internet, because uTP (using the LEDBAT congestion control algorithm), coming from the likes of me, can’t be the real solution. (By the way, I’d like to thank Stanislav Shulanov for being the real brains behind uTP.) I don’t believe this is a good idea, for several reasons.
First is it’s just plain unproven. It’s been years since RED was proposed, and to date noone’s come up with something where they can say ‘go ahead and deploy this, it’s mature’ with a straight face. Given that very smart people have worked on this, it stands to reason that the problems are just plain hard.
Second is it ain’t gonna happen. Deploying RED involves upgrading routers. To rely on it requires upgrading the entire infrastructure of the internet. The marketing plan is that because router vendors are unwilling to say ‘has less memory!’ as a marketing tactic, maybe they’d be willing to say ‘drops more packets!’ instead. That seems implausible.
Finally, RED is in an apples-to-apples sense a much cruder technique than uTP. With classic internet routing, a router will either pass along a packet immediately if it can, or add it to a queue to send later if it can’t. If the queue has become full, it drops the packet. With RED the router will instead have some probability of dropping the packet based on the size of the queue, going up to 100% if the queue is full. (Yes I know there are other schemes where packets already on the queue are dropped, I’m going to view all those things as variants on the same basic principle.) Since TCP only uses dropped packets as a signal to back off, this uses early packet dropping as a way of giving some information to TCP stacks that they need to back off before the queue gets full. The only information in use here is the size of the queue and the size of the buffer, with the size of the buffer becoming increasingly irrelevant due to buffer bloat, making its value be essentially ‘far too big’. RED makes dropped packets convey a little more meaning by having statistical gradations instead of a full/not full signal. uTP by contrast uses one way delays to get information for when to back off, which allows it to get very precise information about the size of the queue with every packet, with no packet loss happening under normal circumstances. That’s simply more information. You could in fact implement a ‘pretend the router’s using RED’ algorithm for uTP, with no router upgrades necessary.
Given that uTP can be implemented today, by any application, with no upgrades to any internet hardware being necessary, and that it solves that whole bufferbloat/latency problem, I think we should view the end to end approach as the solution to bufferbloat and just forget about changing router behavior.
We’ve already rolled out uTP as the default transfer algorithm for BitTorrent, which has changed the behavior of the internet so much that it’s changed how and whether ISPs need to upgrade their infrastructure.
‘But game theory!’ the naysayers will say ‘Game theory says that TCP will always win!’ Narrowly speaking this is true. TCP as a big giant SUV is very good at playing chicken. Whenever TCP goes up against a congestion control algorithm which makes an actual attempt to not have a completely full buffer, TCP will always fill the buffer and crowd out the other one. Of course, it will then stick the end user with all the latency of a completely full buffer, regardless of how bloated the buffer is, sometimes going into the seconds. For the end user to complain about how big the buffer is would be like them complaining to their credit card company for offering too high of a limit. ‘You should have known I’d spend too much!’ The solution is for the end user to intervene, and tell all their applications to not be such pigs, and use uTP instead of TCP. Then they’ll have the same transfer rates they started out with, plus have low latency when browsing the web and teleconferencing, and not screw up their ISP when they’re doing bulk data transfers. Even within a regime of everyone using uTP, it’s possible to have bulk transfers take lower priority than more important data by giving them a lower target delay, say 50 milliseconds instead of 100 milliseconds.
If you really want to design routers to give more information to end nodes, the big problem for them to fix is the one where everyone is attempting to do congestion control based on one way delays, but noone can get an accurate base delay because the queue is always full, so one way delays are always exactly the same and it looks like there’s no queue but high non-congestive packet loss. The best way to solve that is for a router to notice when the queue has too much data in it for too long, and respond by summarily dropping all data in the queue. That will allow the next bunch of packets let through to establish accurate minimum one way delays, and everything will get fixed. Of course, I’ve never seen that proposed anywhere…