Update: Microsoft has sent out a clarification that the only thing they’ve centralized is handshake negotiation, which is a good thing, but doesn’t involve building any of the features I’ve listed below. Consider this article to be about how you could do things better.
Skype (aka Microsoft) decided to proxy all connections and to stop running its peer directory on untrusted peers, citing ‘security’. One might wonder how and if this improves security. The most immediate security benefits are that peers are no longer given the entire peer directory, and no longer accept direct connections from untrusted peers, which can possibly send exploits. Applications written in real languages don’t have such problems, but anything which does audio decoding is going to have to have a significant C component. On the flip side, compressed audio isn’t sanitized or reencoded by proxies, so exploits are still possible, but then again the server can easily add checks for new exploits, sanitize them, and use them to develop an MO for catching spammers. Of course, the central server, by having the ability to view everything, can do a much better job of stopping non-security-related spam as well. Big brother protects you from spammers. It’s a fact of life.
The deeper advantage of going through a central proxy is that it avoids giving away IP addresses. If an attacker who runs a countrywide firewall wants to know who the dissidents in its country are, it can place a call to a journalist known to be talking to dissidents to find out that journalist’s IP address, then record all IP addresses within the country forming direct connections to the journalist’s IP address, and have a very short list of potential dissidents. While this attack is extremely crude, it’s about what most countries are able to pull off given their level of sophistication and the huge amount of traffic they have to deal with, and idiot journalists have gotten people killed this way. Thanks, assholes. By proxying everything through their servers, Microsoft has basically set up a big single-hop anonymizer, which at least stops the very crude attack we’re worried about right now.
There are potential engineering advantages to going through proxies as well, although it’s doubtful that Microsoft is doing any of this yet. The engineering problem which completely dominates telephony is latency. Light in a fiber optic cable takes about 200 milliseconds to travel the earth’s circumference. By coincidence the generally accepted maximum acceptable round trip time for an audio call is the not much larger value of 330 milliseconds, which makes engineering a system which can handle phone calls from Argentina to Japan inherently difficult.
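That 200ms figure falls straight out of the refractive index of glass; here’s a quick back-of-the-envelope check (assuming light in fiber travels at roughly c/1.5):

```python
# Back-of-the-envelope check: light in fiber moves at roughly c / 1.5
# because the refractive index of glass is about 1.5.
C_VACUUM = 299_792_458            # speed of light in vacuum, m/s
FIBER_SPEED = C_VACUUM / 1.5      # ~2e8 m/s in glass
EARTH_CIRCUMFERENCE = 40_075_000  # meters

one_way_ms = EARTH_CIRCUMFERENCE / FIBER_SPEED * 1000
print(f"{one_way_ms:.0f} ms")     # ~200 ms to circle the earth in fiber
```

Note that this is a hard floor from physics; real fiber paths zigzag, and routers add their own delays on top.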
Despite the behavior of the telco industry, a 330ms round trip time is not a gold standard; it’s the absolute maximum, above which people find calls completely unacceptable. The telephony industry continues to pretend that it’s a gold standard anyway, letting call latencies get higher and higher, with not a single company engineering a low-latency solution and bragging about it in its marketing materials, displaying an appalling lack of pride in its work, engineering prowess, and marketing acumen.
In any case, for the sake of argument I’m going to say that 330 ms is an ‘acceptable’ round trip time and 150 ms is a ‘good’ round trip time. Yes I know that for gaming anything above 100ms starts to suck, but people are more forgiving of audio calls.
Whenever you send data over the internet, the latencies add up as follows:
In the case of audio, data is always sent as soon as it’s generated. Reducing the bitrate is an extreme measure which is done only rarely, and then it’s to another essentially fixed bitrate, so there isn’t any of the sort of dynamic congestion control which TCP has, and no point in waiting to send anything.
Upload queueing delay is hopefully usually zero. Unfortunately many routers have queues whose length can be measured in seconds, which TCP will happily fill continuously. Obviously if there’s any serious TCP-based cross traffic your audio calls are basically hosed, so we’ll just assume that there isn’t any. What size upload queues should be is a controversial topic which I’ll have a lot to say about another time; for now I’ll just say that getting them below amounts which can potentially mess up audio calls is completely unworkable at current broadband connection rates, and anything over 500ms is completely broken.
Upload serialization delay is typically just a few milliseconds and not a major issue.
Propagation delay is dominated by the aforementioned speed of light problem. As a reasonable example, the one way latency between San Francisco and Amsterdam can’t be below 40ms just because of the laws of physics, and if you can get 80ms from one home net connection in the US to another one in Europe you’re doing very well. (If anyone can point out maps showing latencies between different parts of the internet which don’t suck I’d appreciate it. Most of the easy to find ones are focused on latencies between different ISPs in the same location, although some of the latencies they report make me want to cry.)
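As a sanity check on that 40ms floor, the great-circle distance between the two cities gives the hard physical lower bound. The coordinates and the c/1.5 fiber speed below are my assumptions; real fiber routes are longer than the great circle, so actual latencies sit well above this number:

```python
from math import radians, sin, cos, asin, sqrt

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points on earth, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))

# San Francisco to Amsterdam, hypothetical straight fiber at ~2e8 m/s.
km = great_circle_km(37.77, -122.42, 52.37, 4.90)
floor_ms = km * 1000 / 2e8 * 1000
print(f"{km:.0f} km, so at least {floor_ms:.0f} ms one way")
```

This works out to roughly 44ms, which is why anything in the neighborhood of 80ms between real home connections on either side is doing very well.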
Download serialization delay is generally just a few milliseconds and not a major issue.
Download queueing delay generally can’t be caused by the other peer’s uploads, because consumer download rates are much higher than upload rates, but if the downlink is saturated by TCP-based cross traffic you’re hosed, for the same reasons as with upload queueing delay.
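Putting the components above together, here’s a hypothetical one-way budget for a transatlantic call. The numbers are illustrative, not measurements:

```python
# Hypothetical one-way latency budget, in ms. Illustrative numbers only.
budget_ms = {
    "send delay":             0,  # audio goes out as soon as it's generated
    "upload queueing":        0,  # assuming no TCP cross traffic
    "upload serialization":   3,  # "just a few milliseconds"
    "propagation":           80,  # one US home connection to one in Europe
    "download serialization": 3,
    "download queueing":      0,  # downlink not saturated
}
one_way = sum(budget_ms.values())
print(one_way, 2 * one_way)  # 86 ms one way, 172 ms round trip
```

Notice that even this clean case already lands above the 150ms ‘good’ round trip threshold before any loss handling gets added, which is the squeeze the rest of this article is about.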
That covers the basics of how the internet works from the standpoint of a layer 5 protocol developer. My apologies to people who work at the lower layers.
Where this becomes interesting for internet telephony is when it interacts with packet loss. On well behaved net connections the only source of packet loss should be packets getting dropped because queues are full, at which point phone calls are already hosed because of latency. Unfortunately lots of people have cross traffic on their wireless, or move their laptop too far from their base station, or have a shitty ISP, and have a lot of non-congestive packet loss. For the sake of argument, I’ll assume that this is a problem worth solving.
When data is lost, the only things you can do are to either have a skip, or send a rerequest. If you have a policy of sending rerequests, then you have to delay all traffic by the worst delay you can incur with a rerequest, because dynamically changing the latency produces a sucky end user experience. Let’s say that we have a simple direct connection which looks like this:
Alice ---- 80ms ---- Bob
In this case the direct round trip time will be 160ms, which is good. The speed of sound is about one foot per millisecond, so this will be about equivalent to yelling at someone 80 feet away from you. Unfortunately if you add in rerequests, you have to wait for a packet to (not) get there, a rerequest to get sent back, and a replacement packet to arrive, for a total of 240ms each way, or 480ms round trip, which is totally unacceptable. Let’s say that we try to improve this as follows:
Alice ---- 40ms ---- proxy server ---- 40 ms ---- Bob
If we’re now willing to assume that the packet rate is low enough that a single packet might need a rerequest on one end or the other but not both, our resends will add 80ms each way, for a total round trip time of 320ms, which is barely acceptable. Unfortunately that isn’t a terribly solid improvement, and it’s hard to colocate servers in the exact middle of the Atlantic or Pacific oceans, but it’s possible to do better:
Alice ---- 10ms ---- proxy server ---- 60ms ---- proxy server ---- 10ms ---- Bob
Now things look much better. Even with resends on both ends, the one way delay has only gone up to 120ms, for a total round trip time of 240ms, or 200ms if we make the slightly aggressive assumption that only one end needs a resend, which is pretty good. And even better, the amount of latency this technique adds doesn’t increase as the distance between the peers increases, so long as there are proxy servers close to either end.
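The three topologies can be compared directly. The hop latencies are the ones from the diagrams above, and the model (mine, a simplification) is that one rerequest on a hop costs two extra traversals of that hop, one for the rerequest going back and one for the replacement packet:

```python
def one_way_ms(hops, lossy):
    """One-way delay when each hop index in `lossy` needs one rerequest.

    A rerequest on a hop costs two extra traversals of that hop: one for
    the rerequest going back, one for the replacement packet coming in.
    """
    return sum(hops) + sum(2 * hops[i] for i in lossy)

direct = [80]          # Alice ---- 80ms ---- Bob
middle = [40, 40]      # single proxy mid-path
edges  = [10, 60, 10]  # proxies near each endpoint

print(2 * one_way_ms(direct, [0]))    # 480: totally unacceptable
print(2 * one_way_ms(middle, [0]))    # 320: barely acceptable
print(2 * one_way_ms(edges, [0, 2]))  # 240: resends on both short hops
print(2 * one_way_ms(edges, [0]))     # 200: the slightly aggressive case
```

The edge-proxy numbers stay flat no matter how long the middle hop gets, which is the whole point: resends only ever traverse the short hops.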