
Acute sets

An interesting math question is: for a given number of dimensions d, how many points can be placed in d-dimensional space such that the angle formed by every triplet of them is acute? This paper shows a dramatic new result with an explicit construction of (2^(1/2))^d points, which is about 1.41^d. I've managed to improve this to (6^(1/5))^d, which is about 1.43^d.

Those of you not familiar with them already should read about dot products before proceeding. The important points for this result are: if the dot product of two vectors is positive, the angle between them is acute; the dot product across multiple dimensions is the sum of the products for each dimension individually; and if one of the vector lengths in a dot product is very small, then the dot product is very small as well.
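Those points can be captured in a few lines. Here's a minimal sketch (my own, for illustration) of testing whether a triangle's angles are all acute via dot products:

```python
def dot(u, v):
    # Sum of per-dimension products; the sign tells you the angle type.
    return sum(a * b for a, b in zip(u, v))

def all_angles_acute(p, q, r):
    # The angle at vertex v between a and b is acute iff (a-v)·(b-v) > 0;
    # check that condition at all three vertices of the triangle.
    def acute_at(v, a, b):
        u1 = [x - y for x, y in zip(a, v)]
        u2 = [x - y for x, y in zip(b, v)]
        return dot(u1, u2) > 0
    return acute_at(p, q, r) and acute_at(q, p, r) and acute_at(r, p, q)
```

The same test generalizes unchanged to any number of dimensions, which is what the case analysis below relies on.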

My result is based on a recurrence which adds five dimensions and multiplies the number of points by six. It’s done like so:

Some explanation for this diagram is in order. The points a, b, c, p, q, and r all correspond to the same old point. The leftmost picture is the first and second new dimensions, the middle one the third and fourth, and the right one the fifth. The value ε1 is very small compared to all values which previously appeared and all normal functions on them. The value ε2 is very small compared to ε1 and this continues to ε5. (Just pretend I drew those as epsilons instead of es. My diagramming skills are limited.)

A more numeric way of defining the values for the new dimensions is as follows:

point  1                        2                        3               4               5
a      cos(θ1) * ε1             sin(θ1) * ε1             cos(θ3) * ε5    sin(θ3) * ε5    ε4
b      cos(θ1) * ε1             sin(θ1) * ε1             -cos(θ3) * ε5   -sin(θ3) * ε5   -ε4
c      cos(θ1) * ε1 * (1-ε3)    sin(θ1) * ε1 * (1-ε3)    cos(θ2) * ε2    sin(θ2) * ε2    0
p      -cos(θ1) * ε1            -sin(θ1) * ε1            cos(θ5) * ε5    sin(θ5) * ε5    ε4
q      -cos(θ1) * ε1            -sin(θ1) * ε1            -cos(θ5) * ε5   -sin(θ5) * ε5   -ε4
r      -cos(θ1) * ε1 * (1-ε3)   -sin(θ1) * ε1 * (1-ε3)   cos(θ4) * ε2    sin(θ4) * ε2    0

Those thetas are all angles. Their exact values don’t matter much, but they need to not be too close to each other compared to ε. Also θ2 and θ4 need to be between 0 and π/2 and not too close to the ends of that range.

Each triplet of points either corresponds to three different old points, or two correspond to the same old point and one to a second, or all three correspond to the same old point. In the case where all three correspond to different old points their angle will be acute, because the positions have only been changed by epsilon. The other cases can be straightforwardly enumerated. Because there are only two types of points there are only two cases to consider for the singular external point, labelled x and y in the diagram above. Because the last value of x can be positive or negative, two positions are labelled for it.

To check if each angle is acute, the dot product is calculated and checked to see if it’s positive. Rather than calculate these values exactly, it’s sufficient to check what the smallest epsilon each value is multiplied by and whether it’s positive or possibly negative. If the sum of the different dimensions is a positive larger epsilon (that is, one with a smaller subscript) then the dot product is positive and the angle is acute. Because many of the cases are very similar they can be consolidated down, so multiple endpoints are listed in some of the cases of the chart of all cases below. In cases where a value in one set of dimensions overwhelms all values which come after it, those values are skipped.

vertex  end point  end point  1 & 2  3 & 4  5
a       b          x          0      ε5     0
a       b          y          0      ε5     ε4
a       c          x          ε3     ε5
a       c          y          ε3     ε2
a       pqr        xy         ε1
c       ab         x          ε3     ε2
c       ab         y          ε3     ε2
c       pqr        xy         ε1
a       b          cr         0      ε5     ε4
a       b          pq         0      ε5     0
a       c          pq         ε3     ε5
a       pqr        pqr        ε1
c       abpq       bpq        ε3     ε2
c       abpq       r          ε3     ε2

Note that the extra restrictions on θ2 and θ4 are important in the CAY case (where A is the vertex).

In case you're wondering how I came up with this: I can't visualize in five dimensions any better than you can, but I'm pretty good at visualizing in three, so I worked out a recurrence which adds three dimensions and triples the number of points, which almost works. That's dimensions three through five here. I used one of the recurrences from the previous work for dimensions one and two, and moved the c and r points 'down' in the first two dimensions to fix the case which was broken, specifically CAX.

I'm quite certain this result can be improved on, although it's hard for my poor little human brain to work out the more complicated cases. My guess is that it trends towards the absolute maximum of the nth root of n, which occurs at n = e (that's the base of the natural logarithm, not a miswritten epsilon). I conjecture that this is tight: no matter how small ε is, (e^(1/e) - ε)^d < acute(d) < (e^(1/e) + ε)^d for sufficiently large d.
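As a quick numeric sanity check on those growth rates (figures taken from the discussion above):

```python
import math

# n^(1/n) is maximized at n = e, giving e^(1/e) ~ 1.4447, so the
# conjectured ceiling sits above both known construction rates.
paper_rate = 2 ** (1 / 2)            # ~1.4142, the cited paper's construction
this_rate = 6 ** (1 / 5)             # ~1.4310, the construction in this post
ceiling = math.e ** (1 / math.e)     # ~1.4447, conjectured asymptotic rate
```

There's still a visible gap between 1.4310 and 1.4447, which is the room for improvement conjectured above.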

Gallus and Simo Debate Whether the Bitcoin Block Size Limit Should be Increased

Gallus: Bitcoin needs an increase in the maximum block size to avert disaster. If the block size limit isn't increased soon, it will be hit and disaster will ensue.

Simo: Current blocks are nowhere near the size limit.

Gallus: At the current rate of growth of transactions, we’ll get there soon.

Simo: Lightning Network can handle it!

Gallus: Lightning Network isn’t working yet, forces big transactions to happen when its timeouts sunset and requires a lot of complexity and endpoint diligence.

Simo: Just wait for the new paper! With a relative timelock opcode everything nets out until the settlement between two counterparties exceeds the amount they initially deposited no matter how long their relationship has gone on, and the diligence can be outsourced to third parties.

Besides, most of the current transactions might be garbage anyway, and the right way to handle everything is with transaction fee increases.

Gallus: You’re just speculating about how much is garbage, and transaction fees destroy zeroconf.

Simo: If you’re more conservative about what sort of malleability you accept then zeroconf works, uh, about as well as it does now. Specifically, if you disallow changes to the outputs other than decreasing payouts and thus implicitly increasing the transaction fee then there’s little opportunity to defraud via the usual channels. Zeroconf is still a profoundly bad idea though. If it ever became widespread, then that would inevitably lead to the creation of a darknet where alternative transactions could get posted which included kickbacks to the targets of previous mining rewards. There is no good counter to this. Zeroconf advocates should get over it.
Reiterating that transaction fees are the right way to handle everything: the current technique for avoiding denial of service by putting through lots of garbage transactions boils down to letting larger transactions through first, so anyone trying to make lots of transactions with a de minimis amount of cash fronted will have their amounts spread so thin that legitimate transactions take priority. A block occurs on average every 600 seconds, and blocks can handle about 4 transactions per second. Adding a factor of 10, on the assumption that the attacker wants to keep transactions from going through even when blocks happen to be bigger, an attacker who wanted to prevent any transactions of less than $10 from going through would have to front $10 * 600 * 4 * 10, or about $240,000. And that's fronted, not spent; the attacker can always sell their coins off later (although their value would likely have been badly damaged in the interim). Even adding significantly to this wouldn't make the security margin particularly good. Transaction fees really are needed.
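Multiplying out Simo's numbers directly (all of these figures are the assumptions stated above, not measurements):

```python
# Hypothetical back-of-envelope cost of the denial-of-service attack:
# a $10 floor the attacker wants to enforce, ~600 seconds per block,
# ~4 transactions per second, and a 10x factor for oversized blocks.
floor_dollars = 10
seconds_per_block = 600
tx_per_second = 4
safety_factor = 10

fronted = floor_dollars * seconds_per_block * tx_per_second * safety_factor
```

The money is fronted rather than spent, so the real cost to the attacker is the risk of the coins losing value in the interim.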

Gallus: The wallet codebases are poorly written and maintained and can’t be realistically expected to be made to handle real transaction fees.

Simo: If they really needed to then wallets would fix their shit. This is Bitcoin, where the whole point is supposed to be that all the endpoints mutually enforce security against everybody else. If you're concerned about supporting code which is so shitty it shouldn't have existed in the first place you should go work for Microsoft. Besides, even a busted wallet can have its keys extracted and put into a wallet which isn't busted.

Gallus: There aren't any good ways to handle transaction fees.

Simo: Receiver pays (where a new transaction is created spending the output of an old one so it can pay the fee for both of them to go through) works well, even when some wallets are busted, as does the previously mentioned conservative approach to allowing malleability.

Gallus: Receiver pays doubles the number of transactions, making the block size limit problem worse.

Simo: If a new opcode were added requiring that a particular thing *not* be in the current utxo set, then that would allow multiple receiver-pays transactions to be bundled together with each of them using few bits, be very robust against history reorgs, and only require a single lookup by miners in the already necessary utxo database.

Gallus: This is all very complicated to handle.

Simo: It’s just software. See my earlier comment about working for Microsoft.

We need to find out how much those transactions are really worth to be able to use good judgement, and hitting the limit is the only way to find out.

Gallus: You’re proposing invoking disaster just to gather some academic data!

Simo: If anything we should be going the other way, artificially forcing transaction fees up in advance of needing to by limiting block sizes below the requirement. Then if there were problems we could let the limit go back to normal and spend some time fixing the problems without creating a compatibility problem. Besides, we don’t even know if any real damage would be done by hitting limits, because without a demonstrated willingness to pay transaction fees, even temporarily, we have little evidence that bitcoin transactions are creating any real value. Core developers favor doing an experiment like this more than they favor increasing the block size limit.

Getting back to the main point, increasing the max block size limit would be ruinous to the bitcoin ecosystem, with vastly fewer full nodes being run.

Gallus: The rate of bandwidth increase is exponential, and there will be plenty.

Simo: As of today, the amount of bandwidth to run a full node is a significant disincentive to running one. The start-up time to get the current blockchain history when starting a new one makes the problem much worse than the ongoing rate of download. The rate of growth of bandwidth is much slower than Moore’s law is for computational power, and if you assume that everybody has mass quantities of bandwidth it would be much better to use it to have wallets run full nodes and retire SPV.

Besides, increasing the block size is a hard fork, which is unlikely to even happen. At best it would result in two different chains. The miners, who hardly even respond to developers' entreaties about urgent issues, have little reason to go for it: the whole goal is to avoid or reduce transaction fees, which cuts directly into their bottom line, and demonstrating an ability to make a backwards incompatible change undermines the claim that Bitcoin is a truly decentralized system.

Gallus: The new fork can be merge-mined along with the classic fork. Miners will do whatever is of marginal value to them, and if they can mine both at once at no extra cost they will.

Simo: With only partial miner cooperation the new fork would have substantially less security, and the two of them coexisting would be a disaster of indeterminate state of coins which were spent on one fork but not the other, causing far worse problems for wallets than transaction fees would.

Eventually the only mining incentive left will be transaction fees. If transaction fees aren’t made significant by then, disaster will ensue.

Gallus: Mining rewards could be changed as well.

Simo: Changing the mining rewards would be an even more outrageous hard fork than increasing the block size limit. It would cause extraordinary amounts of real world waste for no proven value. Our goal is for Bitcoin eventually to be more than a cryptographic curiosity and an exercise in the platonic ideal of marxist value creation; it should provide some service of value. If it can't do that, it deserves to fail and be abandoned.

A crazy idea for proofs of work

The recent trend in proofs of work has been to use password hashing schemes. I gave a construction for these in my last post which I'm fond of, but there's a problem: password hashing schemes aren't quite a perfect match for the problem. The fundamental issue is that while password hashing schemes succeed in requiring both CPU and memory to compute, they unfortunately also require both CPU and memory to verify, unlike simple partial collisions of hash functions, which require essentially no memory or CPU to verify. In the case of password hashing this cost of verification is considered a feature. In the case of proofs of work it most definitely is not. This problem compounds: because there's a limit to how long the proofs can take to verify, there's a limit to how much memory they can require as well.

The question becomes: Is there something which takes lots of memory to do, but can be quickly verified? Here's my crazy idea: use collisions. Instead of a single value which is hashed, the result of the work is two values which are each hashed, and their hashes xored together; the result should end in an appropriate number of zeros to demonstrate an appropriate amount of work. (A good argument can be made for using sum rather than xor as the combination function. I'd like to hear people's thoughts on this, but for the sake of convenience this post will assume that it's xor.)

The fastest way to find duplicates in a list is to sort it: there are n*n possible matches, but sorting takes only n*log(n) time. Using more CPU but less memory, by recalculating things in multiple places, takes a massive hit from the asymptotics. Unfortunately, while sorting requires lots of memory, it doesn't require unpredictable memory lookups, and I have no idea what could be done to require that. Also, a reasonable amount of fast access memory and a few clever tricks can reduce it to only 2 or 3 passes through main memory instead of log(n). Still, the amount of memory used is basically unavoidable, and verification is essentially instantaneous.

This approach can suffer from working too well. There's essentially no limit to how much memory can be used, which gives an advantage to anyone with the resources to set up a huge amount of storage to be sorted over. This can be corrected by putting a hard limit on the memory used via structure. For example, instead of two completely arbitrary strings, a proof of work can consist of a base string and two different extra strings of fixed length – 38 bits sounds reasonable – each of which is hashed together with the base string to produce the two values which are then xored. Adding a bit to the extra strings will double the amount of memory/storage necessary to run the operation. Limiting the bits of difference will of course also reduce the sizes of the proofs of work.

In our case with 38 variable bits, the number of possible combinations of two is 2^38 * 2^38, or 2^76, so a typical run through will find a combination which ends in 76 zeros, and each additional zero beyond that requires on average twice as many run throughs.

To do this, every possible input to the hash function has to be tried separately, which can use a fair amount of CPU. One way to reduce CPU load is to switch from sum of two to sum of four. In our previous example that means instead of 38 bits for each of two values it would be 19 bits for each of four values, and all four of those are used to hash and then the hashes are xored together. The fastest way to solve this one is to calculate the hashes of all 2^19 possible values, then make a list of all xors of any two of them and sort looking for duplicates. You might wonder why not go for xoring even more together. That’s because other algorithms start applying once the number of things gets large, and the CPU load has been reduced so much going to four already that it doesn’t matter. You might also wonder why not parsimoniously xor three values instead of four. That’s because the savings in terms of size of proofs of work would be extremely small, and the 3SUM problem has interesting properties which make me not trust it.
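Here's a minimal sketch of the sort-based search, with toy parameters (12-bit extra strings and a 16-zero target instead of the 38 bits discussed above) and SHA-256 standing in for whatever hash a real scheme would use:

```python
import hashlib

def find_collision(base, extra_bits=12, zero_bits=16):
    # Hash base||extra for every possible extra value, then sort by the
    # low bits of each hash. Two hashes sharing their low `zero_bits`
    # bits xor together to a value ending in that many zeros, and after
    # sorting any such pair sits adjacent in the list.
    hashes = []
    for i in range(1 << extra_bits):
        h = int.from_bytes(
            hashlib.sha256(base + i.to_bytes(4, 'big')).digest(), 'big')
        hashes.append((h & ((1 << zero_bits) - 1), i, h))
    hashes.sort()
    for (low1, i1, h1), (low2, i2, h2) in zip(hashes, hashes[1:]):
        if low1 == low2:
            return i1, i2, h1 ^ h2   # verification is just two hashes and an xor
    return None
```

The asymmetry is the whole point: the search needs a table of 2^extra_bits hashes and a sort, while a verifier only recomputes two hashes and checks the trailing zeros of their xor.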

How well this whole system works is of course an empirical question, and one I’m not sure of. It may be that it can be effectively accelerated in hardware, for example by using a hardware shell sort which correctly sorts most but not all of the time. That would only result in speeding up the last bunch of sorting though, and it may be that a regular CPU can do that so fast that it will be mostly waiting for memory (or disk!) anyway. It’s certainly the case that any hardware or implementation speedup of this would be much more interesting than the sorts of acceleration done for simple partial collisions of hashes, and it’s also possible that the acceleration would come from high throughput memory or disk, which is at least a useful thing to subsidize production of. It’s even possible that this approach works well enough that it’s the holy grail of proofs of work, which are downright green because the CPU is mostly sitting around bored waiting for something else to finish fetching data from systems which are already well optimized to do that, and custom hardware can’t improve performance in any meaningful way. People’s thoughts on this, and even more importantly test results, would be greatly appreciated.

A mode for password hashing

I’ve been working a bit on password hashing lately, and have come up with a mode which seems to be so simple and fundamental that I’m surprised that it isn’t a standard construct. Or at least it doesn’t appear to be. I’ve looked through some, but not all, of the password hashing competition candidates and to the best of my ability to read them they appear to take slightly different approaches to the problem. Analysis of what’s presented here and comparisons to other work which has been done would be much appreciated.

The goal of password hashing is to make something which is no easier to do on custom hardware than it is on a general purpose CPU. To that end, you’d like an operation which requires that cache memory be used extensively, and that the order in which cache memory is used is unpredictable. You’d also like to use a core operation which has the same property. Going with AES128 is a reasonable bet, because it’s already accelerated on common hardware.

The basic idea behind this construction is to have bulk memory, which will be accessed pseudorandomly, and an internal state consisting of at least a reference to some part of bulk memory. Everything is initially zeroed out. There's a reversible operation based on the data being hashed which munges the internal state and the part of bulk memory referred to, resulting in the pointer referring to a new pseudorandomly selected block of memory. This operation is repeated some number of times until enough work has been done, and then output is extracted. It's reasonable to set the number of rounds to be double the number of blocks stored in bulk memory. It might be okay to go far lower, but that feels sketchy. In particular, going lower might result in custom hardware being able to effectively pipeline the next block being uninitialized (which might be able to produce a small speedup on a general CPU anyway, further research needed).

For highly convoluted and technical reasons it appears that an internal state consisting of just a reference isn't enough. It also appears that an internal state consisting of a single block isn't enough. The construction I've come up with requires that the internal state be a single block plus another reference. It was surprisingly difficult to come up with the construction, but it's trivial to show that it's reversible and easy to intuit that it's secure if the underlying primitive is secure. It might be an interesting exercise to prove that, but I don't know how such proofs work.

Anyway, on to the actual construction.

There’s a difficulty factor, k, which is a parameter to the whole thing. There’s also an encryption function used, probably AES128, resulting in a block size of 128 bits. Bulk memory is 2^k * blocksize in size. Internal state is one block of length blocksize, plus a backpointer consisting of k bits.

Every round operates on the internal block and the external block referred to by the last k bits of the internal block. Transforms are done as follows:

The new value of the backpointer is the last k bits of the old value of the internal block.

The new value of the external block is the old value of the backpointer concatenated with everything but the last k bits of the old value of the internal block.

The new value of the internal block is the old value of the internal block xored with the old value of the external block and encrypted.

The extracted value is the final state of the internal block.

That’s the entire construction. It’s so simple and elegant that I’m surprised that it isn’t a standard construction.
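Here's a minimal sketch of the construction in Python. To keep it self-contained, AES128 is replaced by a toy reversible Feistel permutation (which is not secure, just reversible like a real block cipher would be), and K=4 keeps the bulk memory tiny; a real deployment would use AES128 and a much larger K:

```python
import hashlib

K = 4                        # difficulty parameter: bulk memory is 2^K blocks
BLOCK_BITS = 128

def toy_encrypt(block, key):
    # Stand-in for AES128: a 4-round Feistel network over two 64-bit
    # halves with SHA-256 as the round function. Reversible, NOT secure.
    left, right = block >> 64, block & ((1 << 64) - 1)
    for r in range(4):
        f = int.from_bytes(hashlib.sha256(
            key + bytes([r]) + right.to_bytes(8, 'big')).digest()[:8], 'big')
        left, right = right, left ^ f
    return (left << 64) | right

def memory_hard_hash(key, rounds):
    bulk = [0] * (1 << K)    # bulk memory, zeroed out
    internal = 0             # internal state: one block...
    back = 0                 # ...plus a K-bit backpointer
    for _ in range(rounds):
        idx = internal & ((1 << K) - 1)   # external block pointed at
        ext = bulk[idx]
        # New external block: old backpointer concatenated with
        # everything but the last K bits of the old internal block.
        bulk[idx] = (back << (BLOCK_BITS - K)) | (internal >> K)
        back = idx                        # new backpointer: last K bits
        # New internal block: xor with external block, then encrypt.
        internal = toy_encrypt(internal ^ ext, key)
    return internal
```

Running it with rounds equal to double the number of blocks (32 here) follows the recommendation above; undoing a round works by decrypting, re-splitting the external block, and restoring the backpointer, which is why every step above had to be invertible.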

One might protest that the input and output of this construction using AES128 are 128 bits, while for password hashing you want them to be 256 bits. That can be fixed straightforwardly. From the input, set the AES key to be the first 128 bits of the input, and before every encryption xor the block with the last 128 bits of the input. For the output, take the 128 bits which are the direct output of the hashing process and encrypt those using the same encryption function used everywhere else to get the next 128 bits. It feels a bit like the output should be xored with the original key to make it a true hash function and remove all trace of reversibility, but that feels a bit unnecessary after all that junk done in the middle. It would be good to have an opinion from someone who knows these things better than I.

Update: I figured it out. The first 128 bits of the output are the final state of the internal block xored with the part of the key which is used for xoring (on the general principle that the way you turn an encryption function into a hash function is by encrypting and then xoring with the original key). It's then encrypted (without a pre-xor, obviously) and finally xored one last time to produce the last 128 bits of the output.

Here's an illustration of the round operation, followed by the steps for undoing it, which I didn't describe in text but are straightforward.

An attack on the timestamp semantics of Bitcoin

(This is a semantic layer attack on Bitcoin, based on what’s given in the immediately available public docs. The standard implementations may already be resistant to it. More on this at the end. Update: They already were! Take this post as an explanation of why the thing which I say ‘may even be in the standard implementation already’ is, in fact, necessary.)

Bitcoin mostly manages to avoid any nondeterministic calculations in evaluating the validity of a blockchain, but there’s one place where it doesn’t quite manage to do that: timestamps. To keep timestamps from getting too far off, two sanity checks are put on them:

  • Mining operations with timestamps more than two hours in the future are ignored. This is the one and only place in Bitcoin where something nondeterministic is used to calculate whether a part of the blockchain is valid. Obviously its results can change over time.
  • A mining operation whose timestamp is less than the median of the previous eleven is deemed invalid. (Presumably there's some sanity check at the very beginning of the block chain as well, probably a hard coded beginning of mining.) This is a deterministic calculation, but as I'll get to later, it's based on potentially unreliable earlier timestamps.

The one and only place where timestamps are used is in resetting the difficulty of mining. If one were to use timestamps far in the future, one could make mining very easy in the next round and collect lots of coin for oneself. That doesn't work because, first of all, whoever's receiving coin in a transaction won't accept things which claim to happen in the distant future, and second of all, when alternative blockchains are compared, the one which wins is the one which has had more work done, not the one which has more mining operations completed. That second rule exists for exactly this reason.

With the current heuristics for deciding what timestamps to accept whoever sets the very last one, which directly determines what mining difficulty is set to, has discretion to set it to between about an hour ago (Mining operations take about ten minutes so the median of the last eleven was about an hour ago) and two hours in the future (directly enforced, although you probably need to set it to a few seconds shy to be on the safe side because there really is clock skew.) Setting it to later will make mining slightly easier in the next cycle, but that will be offset by mining being harder in the cycle after that. It doesn’t completely even out though. If the last timestamps are alternately set as far into the future and as far into the past as possible, the overall amount of mining will be slightly greater than it would be otherwise. This is a very slight effect though and benefits all miners equally not just the ‘attacker’, so I think it isn’t worth worrying about. Maintainers of Bitcoin implementations may wish to add this trick. Knock yourselves out.

Now on to the actual attack.

The two possible attacks are when someone who controls a significant fraction of mining either consistently sets their clocks as far into the past or as far into the future as possible. Setting into the past doesn't cause any damage, but into the future is a lot more interesting. As it happens, the two hours which timestamps can be set into the future is greater than the expected amount of time since the median of the last eleven timestamps, and eleven isn't all that large a number, so an attacker with much less than half the mining capacity can set all their timestamps two hours in the future and eventually get everyone else's mining operations rejected by the sanity check on timestamps. At first this only denies everyone else's mining without increasing the amount of mining of the attacker, which might increase the Bitcoin price in the short run by reducing supply but isn't otherwise particularly beneficial to the attacker. But if they managed to keep it up for an entire proof of work difficulty cycle (and, uh, good luck with that), then the difficulty of work would be reduced to what the attacker could produce, and they'd be free to collect all mining rewards for themselves while everyone else's attempts at mining got rejected.

If an attacker has about 15% of all mining capacity this attack would take about a week on average to pull off. It's randomized though, so if they've spent time and still haven't succeeded, the expected amount of time for a hijack to happen remains the same. If they have 10% it's about two months. Below that it basically stops working. Another 2-4 weeks is necessary for a mining difficulty reset to happen, depending on where in the cycle the hijack happens. I calculated these values using randomized simulations. An attacker with 20% of capacity would take an average of two days, 30% about half a day.
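A simplified version of such a simulation might look like this (the 6-of-11 trigger follows from the numbers above: once 6 of the last 11 blocks are the attacker's, the median timestamp is stamped two hours ahead while at most about 11 blocks, roughly 6600 seconds, have actually elapsed, so honest blocks fail the median check):

```python
import random

def blocks_until_hijack(attacker_share, seed=0, limit=200_000):
    # Simulate block production where each block belongs to the attacker
    # with probability attacker_share. The attacker stamps every block
    # two hours ahead; honest blocks start getting rejected once 6 of
    # the last 11 blocks are the attacker's.
    rng = random.Random(seed)
    window = []              # 1 = attacker block, 0 = honest block
    count = 0
    while count < limit:
        count += 1
        window.append(1 if rng.random() < attacker_share else 0)
        if len(window) > 11:
            window.pop(0)
        if sum(window) >= 6:
            return count     # number of blocks until the hijack lands
    return None
```

Multiplying the returned block count by ten minutes per block and averaging over many seeds gives wall-clock estimates of the kind quoted above.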

One could mitigate this attack by reducing the amount into the future which timestamps are allowed to be, from 2 hours to something less than 1 hour. While this sounds reasonable, it would require universal acceptance to work properly, which would be very hard to do, and I'm loath to advocate that people change a clear standard of normal expected behavior.

The much better and more immediate fix is that miners should check what the earliest acceptable time for their next mining operation is, and if that time is somewhere in the future they use that for their timestamp instead of the actual current time. This completely fixes the problem immediately and is a totally reasonable sanity check in its own right, in case the local time is skewed heavily into the past. It may even be in the standard implementation already. Anyone maintaining Bitcoin mining software should make sure that theirs behaves in this way.
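The fix amounts to a clamp; a sketch (with `statistics.median` standing in for however a real implementation computes the median of the previous eleven timestamps):

```python
import statistics
import time

def mining_timestamp(last_eleven, now=None):
    # Never stamp earlier than the earliest acceptable time, which is
    # just past the median of the previous eleven timestamps, even if
    # the local clock lags behind (or has been dragged behind by an
    # attacker stamping blocks far in the future).
    now = int(time.time()) if now is None else now
    earliest = int(statistics.median(last_eleven)) + 1
    return max(now, earliest)
```

With this clamp an attacker's future-dated blocks can no longer push honest miners' timestamps into invalidity, because honest miners simply ride the median upward.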

Snapchat hasn’t been hacked, it just sucks

Snapchat’s recent ‘hack’ is a curious creature, because it is (mostly) a direct consequence of the semantics which Snapchat reveals in its UI to every user. Less of a security violation and more like pointing out that license plate numbers are a privacy issue because everybody can read them (with one exception, which I’ll get to). But unlike license plates, Snapchat can change their semantics whenever they want, and should do so. Let me explain.

When you first sign up for snapchat it pulls in your entire phone book and lets you add them to your friends list (Note to everyone else making new messaging apps: People's list of real friends is still their phone contacts). This requires no permission from the people you're adding. Not only does it reveal to you their phone numbers, it also shows their Snapchat usernames, and lets you look at their 'best friends' list. That last one is especially curious. Best friends is the list of people you've been chatting with most recently. It defaults to having 3 spots, with the option to change it to 5 or 7, which nobody does, and there's no obvious way of removing it completely. This is the bizarre irony of Snapchat's privacy features. It meticulously destroys all images you're trading with other people immediately, but lets anyone who has your phone number see that you've been doing it with hornyslut6969.
Allowing that much information to be seen without a user's permission is especially bizarre in Snapchat because it has such a strong bidirectional concept of friendship. The default is to only allow people you've added to send you messages, and most people leave it that way. It even lets you know that a message to someone is 'pending' if you send it prior to them adding you back, meaning that they'll get it when (or more likely if) they add you. (The UI really doesn't make clear what 'pending' means. Snapchat feels like it's succeeded more by accident than design.)

One could argue that the person whose information is being viewed can at least see that someone else has added them if the standard UI is being used, but there are no proactive notifications of that, and the viewer can simply remove the person before they notice. Aside from which, I'm not a fan of privacy through user interface obfuscation.

The ‘leak’ that we have now is an association of phone numbers, snapchat logins, and approximate geographic locales. The first two parts, as I’ve said above, are already easy to look up, by design. But the locale information isn’t in any obvious place in the UI, and it being available via the API should be viewed as a real security issue.

So what should Snapchat do to fix the problem? As I said at the beginning, they need to change their actual semantics, including their basic UI. The logical way to do this is by making it so that your personal information is only sent to people you add as friends. This wouldn’t be an impediment to friend relationships getting formed, because when you add someone it tells them and encourages them to add you back. It also wouldn’t cause any UI confusion, because when you’ve added someone the main human readable name shown is the one pulled from your local phone book, which of course identifies them just fine. All sensitive info, including the user’s snapchat id and their best friends, should be hidden until and unless they reciprocate the friend add. I don’t think the fact that the person has a Snapchat account at all is a sensitive piece of information.

Undoubtedly this would be a significant amount of work for Snapchat to do, but given the recent bad PR, and how generally icky the information leaks are to begin with, it would be clearly worth it.

The Case of the Missing Remote Reflog Command

Today we have an example of a massive disaster caused by push -f. Much of the discussion is less than productive, because it violates a fundamental tenet of user interaction design: You are not allowed to call the user stupid. If the user did something wrong, the tool either steered them in the direction of doing something they shouldn’t have, or didn’t allow them to undo something they wanted to undo.

In the search for ultimate rather than proximate solutions to problems like this, it’s important to remember that the software is not immovable. All aspects of how a tool works, in this case both the underlying functionality of git and the process flow on github, are in principle subject to change. Obviously reinventing the tool from scratch is a bad idea: the solution should require as little new development as possible, and ideally shouldn’t break anything which already exists, but all approaches should be considered.

Almost all the discussion of solutions is focusing on a feature request to github to allow admins to disallow push -f. That’s a reasonable enough approach, but it’s being emphasized because it’s the approach which is made possible by the local administration tools available today, not because it’s the best approach. Git’s push -f is a very dangerous command whose use is encouraged by standard git methodologies, but it’s also a very powerful and useful tool, so blocking its use comes with significant downsides.

There are means by which push -f could be made much less dangerous, and more convenient and useful, but that’s an involved discussion I’ll save for another day. What I’d like to encourage as the most immediate solution to this sort of problem is the alternate approach, where the user is given the ability to undo a mistake.

If you’re running git locally, there already is a facility for doing this, albeit not a very good one: The reflog command. What there should be is a remote (and better) version of the reflog command, which gives access to the history of every change to head which has been made, including when it was done and which user did it. Naturally this should be paired with another command for obliterating the history of past heads, because there are times when it makes sense to remove all trace of something which happened (for example, when someone added a password file) but completely obliterating history shouldn’t be the default, and it certainly shouldn’t be conflated with the functionality which push -f is often used for. The details of how reflog works feel accidental rather than designed – it doesn’t keep everything, nor does it allow you to immediately completely obliterate old things. Such functionality is important enough that it deserves to be designed on purpose and put into the standard toolset.
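To make the idea concrete, here’s a minimal sketch of what such a remote reflog might store. The class and field names are my own invention for illustration, not an existing git or GitHub feature:

```python
import time

class RemoteReflog:
    """Hypothetical server-side log of every change to a ref's head."""

    def __init__(self):
        self.entries = []  # append-only history of head movements

    def record(self, ref, old_head, new_head, user):
        # Keep everything: who moved which ref, from what, to what, and when.
        self.entries.append({
            "ref": ref, "old": old_head, "new": new_head,
            "user": user, "time": time.time(),
        })

    def history(self, ref):
        # Every past head of a ref is recoverable, including force-pushed-over ones.
        return [e for e in self.entries if e["ref"] == ref]

    def obliterate(self, ref, head):
        # The deliberate, separate command for removing all trace of an old
        # head (e.g. a commit containing a password file).
        self.entries = [e for e in self.entries
                        if not (e["ref"] == ref and e["new"] == head)]
```

The point of the sketch is the separation: recording is automatic and total, while obliteration is an explicit second command rather than a side effect of push -f.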

(A funny aside, I’m told the command is pronounced ref-log, short for ‘reference log’. I always assumed it was ‘re-flog’ because looking at old history is beating a dead horse.)

TCP Sucks

Nick Weaver gives me a shout-out.

there have been a couple of cases where the application vendors concluded they were causing too much damage and therefore started making changes. BitTorrent is the classic example. It is shifting to delay-based congestion control specifically to: (a) be friendlier to TCP because most of the data carried by BitTorrent really is lower-priority stuff; and (b) mitigate the “you can’t run BitTorrent and Warcraft at the same time” problem. So, there’s some hope.

It’s true. We occasionally take a break from drinking moonshine and shooting beer bottles to do real engineering.

Of course, I’ve always used TCP using exactly the API it provides, and even before I understood how TCP worked under the hood I went to great pains to limit the number of TCP connections to the minimum which will reliably saturate the net connection and provide good piece diffusion. If TCP doesn’t handle that well, it isn’t my fault.

Now the intelligentsia have a plan, called RED, for how to fix the internet, because uTP (using the LEDBAT congestion control algorithm), coming from the likes of me, can’t be the real solution. (By the way, I’d like to thank Stanislav Shalunov for being the real brains behind uTP.) I don’t believe this is a good idea, for several reasons.

First is it’s just plain unproven. It’s been years since RED was proposed, and to date no one’s come up with something where they can say ‘go ahead and deploy this, it’s mature’ with a straight face. Given that very smart people have worked on this, it stands to reason that the problems are just plain hard.

Second is it ain’t gonna happen. Deploying RED involves upgrading routers. To rely on it requires upgrading the entire infrastructure of the internet. The marketing plan is that because router vendors are unwilling to say ‘has less memory!’ as a marketing tactic, maybe they’d be willing to say ‘drops more packets!’ instead. That seems implausible.

Finally, RED is in an apples-to-apples sense a much cruder technique than uTP. With classic internet routing, a router will either pass along a packet immediately if it can, or add it to a queue to send later if it can’t. If the queue has become full, it drops the packet. With RED the router will instead have some probability of dropping the packet based on the size of the queue, going up to 100% if the queue is full. (Yes I know there are other schemes where packets already on the queue are dropped, I’m going to view all those things as variants on the same basic principle.) Since TCP only uses dropped packets as a signal to back off, this uses early packet dropping as a way of giving some information to TCP stacks that they need to back off before the queue gets full. The only information in use here is the size of the queue and the size of the buffer, with the size of the buffer becoming increasingly irrelevant due to buffer bloat, making its value be essentially ‘far too big’. RED makes dropped packets convey a little more meaning by having statistical gradations instead of a full/not full signal. uTP by contrast uses one way delays to get information for when to back off, which allows it to get very precise information about the size of the queue with every packet, with no packet loss happening under normal circumstances. That’s simply more information. You could in fact implement a ‘pretend the router’s using RED’ algorithm for uTP, with no router upgrades necessary.
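The statistical-gradation idea is simple enough to sketch. The thresholds and linear ramp below are illustrative, not the parameters of any deployed RED variant:

```python
import random

def red_drop_probability(queue_len, min_th, max_th, max_p=0.1):
    """RED-style ramp: never drop below min_th packets queued, always drop
    at or above max_th, and ramp the probability linearly in between."""
    if queue_len < min_th:
        return 0.0
    if queue_len >= max_th:
        return 1.0
    return max_p * (queue_len - min_th) / (max_th - min_th)

def should_drop(queue_len, min_th=20, max_th=100):
    # A drop-tail router would only ever drop at queue_len == max capacity;
    # RED conveys queue growth earlier by dropping probabilistically.
    return random.random() < red_drop_probability(queue_len, min_th, max_th)
```

Note how little information flows to the sender: one biased coin flip per packet, derived solely from instantaneous queue length. A one-way delay measurement, by contrast, tells the sender the queue size on every single packet.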

Given that uTP can be implemented today, by any application, with no upgrades to any internet hardware being necessary, and that it solves that whole bufferbloat/latency problem, I think we should view the end to end approach as the solution to bufferbloat and just forget about changing router behavior.

We’ve already rolled out uTP as the default transfer algorithm for BitTorrent, which has changed the behavior of the internet so much that it’s changed how and whether ISPs need to upgrade their infrastructure.

‘But game theory!’ the naysayers will say ‘Game theory says that TCP will always win!’ Narrowly speaking this is true. TCP as a big giant SUV is very good at playing chicken. Whenever TCP goes up against a congestion control algorithm which makes an actual attempt to not have a completely full buffer, TCP will always fill the buffer and crowd out the other one. Of course, it will then stick the end user with all the latency of a completely full buffer, regardless of how bloated the buffer is, sometimes going into the seconds. For the end user to complain about how big the buffer is would be like them complaining to their credit card company for offering too high of a limit. ‘You should have known I’d spend too much!’ The solution is for the end user to intervene, and tell all their applications to not be such pigs, and use uTP instead of TCP. Then they’ll have the same transfer rates they started out with, plus have low latency when browsing the web and teleconferencing, and not screw up their ISP when they’re doing bulk data transfers. Even within a regime of everyone using uTP, it’s possible to have bulk transfers take lower priority than more important data by giving them a lower target delay, say 50 milliseconds instead of 100 milliseconds.
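The heart of a delay-based scheme in the LEDBAT mold fits in a few lines. This is a simplified sketch of the window update with made-up constants, not the full algorithm, which has more machinery around base-delay tracking:

```python
def ledbat_style_update(cwnd, queuing_delay_ms, target_ms=100.0, gain=1.0):
    """Back off *before* the queue fills: grow the window while measured
    queuing delay is under the target, shrink it once delay goes over."""
    off_target = (target_ms - queuing_delay_ms) / target_ms
    return max(1.0, cwnd + gain * off_target)
```

Running a bulk transfer with `target_ms=50` while more important traffic uses 100 gives the bulk transfer the lower effective priority described above: it backs off first whenever the two compete for the same queue.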

If you really want to design routers to give more information to end nodes, the big problem for them to fix is the one where everyone is attempting to do congestion control based on one way delays, but no one can get an accurate base delay because the queue is always full, so one way delays are always exactly the same and it looks like there’s no queue but high non-congestive packet loss. The best way to solve that is for a router to notice when the queue has too much data in it for too long, and respond by summarily dropping all data in the queue. That will allow the next bunch of packets let through to establish accurate minimum one way delays, and everything will get fixed. Of course, I’ve never seen that proposed anywhere…

Engineering IP Telephony

Update: Microsoft has sent out a clarification that the only thing they’ve centralized is handshake negotiation, which is a good thing, but doesn’t include creating any of the features I’ve listed below. Consider this article to be about how you could do things better.

Skype (aka Microsoft) decided to proxy all connections and to stop having their peer directory information run on untrusted peers, citing ‘security’. One might wonder how and if this improves security. The most immediate security benefit is that it keeps peers from being given the entire peer directory, and from accepting direct connections from untrusted peers, which can possibly send exploits. Applications written in real languages don’t have such problems, but anything which does audio decoding is going to have to have a significant C component. On the flip side, compressed audio isn’t sanitized or reencoded by proxies, so exploits are still possible, but then again the server can easily add checks for new exploits, sanitize them, and use them to develop an MO for spammers. Of course, the central server, by having the ability to view everything, can do a much better job of stopping non-security-related spam as well. Big brother protects you from spammers. It’s a fact of life.

The deeper advantage of going through a central proxy is that it avoids giving away IP addresses. If an attacker who runs a countrywide firewall wants to know who its dissidents are, it can place a call to a journalist known to be talking to dissidents to find out their IP address, then record all IP addresses within the country forming direct connections to the journalist’s IP address, and have a very short list of potential dissidents. While this attack is extremely crude, it’s about what most countries are able to pull off given their level of sophistication and the huge amount of traffic they have to deal with, and idiot journalists have gotten people killed this way. Thanks, assholes. By proxying everything through their servers, Microsoft has basically set up a big single-hop anonymizer, which at least stops the very crude attack we’re worried about right now.

There are potential engineering advantages to going through proxies as well, although it’s doubtful that Microsoft is doing them yet. The engineering problem which completely dominates telephony is latency. The time it takes light to travel through a fiber optic cable around the earth’s circumference is about 200 milliseconds. By coincidence the generally accepted maximum acceptable round trip time for an audio call is the not much larger value of 330 milliseconds, which makes engineering a system which can handle phone calls from Argentina to Japan inherently difficult.
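The arithmetic behind that 200 millisecond figure, using the usual approximation that light in fiber travels at about two thirds of its vacuum speed:

```python
EARTH_CIRCUMFERENCE_KM = 40_000
LIGHT_IN_FIBER_KM_PER_S = 200_000   # roughly 2/3 of c in vacuum

def propagation_delay_ms(distance_km):
    """Lower bound on one-way delay over fiber of the given length."""
    return distance_km / LIGHT_IN_FIBER_KM_PER_S * 1000

# Around the earth: 40,000 / 200,000 s = 200 ms
# San Francisco to Amsterdam, roughly 8,800 km great-circle: ~44 ms one way
```

Real paths are longer than great circles and add switching delays on top, which is why getting 80ms between home connections on either side of the Atlantic counts as doing well.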

Despite the behavior of the telco industry, a 330ms round trip time is not a gold standard, it’s the absolute maximum above which people find it completely unacceptable. The telephony industry continues to pretend that it’s a gold standard anyway, letting call latencies get higher and higher with not a single company engineering a low-latency solution and bragging about it in their marketing materials, displaying an appalling lack of pride in one’s work, engineering prowess, and marketing acumen.

In any case, for the sake of argument I’m going to say that 330 ms is an ‘acceptable’ round trip time and 150 ms is a ‘good’ round trip time. Yes I know that for gaming anything above 100ms starts to suck, but people are more forgiving of audio calls.

Whenever you send data over the internet, the latencies add up as follows:

congestion control delay -> upload queueing delay -> upload serialization delay -> propagation delay -> download serialization delay -> download queueing delay

In the case of audio, data is always sent as soon as it’s generated. Reducing the bitrate is an extreme measure which is only done rarely, and then it’s to another basically fixed bitrate, so there isn’t any of the sort of dynamic congestion control which TCP has, and no point in waiting to send anything.

Upload queueing delay is hopefully usually zero. Unfortunately many routers have queues whose length can be measured in seconds, which TCP will happily completely fill continuously. Obviously if there’s any serious TCP-based cross traffic your audio calls are basically hosed, so we’ll just assume that there isn’t any. What size upload queues should be is a controversial topic which I’ll have a lot to say about at another time, for now I’ll just say that getting them below amounts which can potentially mess up audio calls is completely unworkable at current broadband connection rates, and anything over 500ms is completely broken.

Upload serialization delay is typically just a few milliseconds and not a major issue.

Propagation delay is dominated by the aforementioned speed of light problem. As a reasonable example, the one way latency between San Francisco and Amsterdam can’t be below 40ms just because of the laws of physics, and if you can get 80ms from one home net connection in the US to another one in Europe you’re doing very well. (If anyone can point out maps showing latencies between different parts of the internet which don’t suck I’d appreciate it. Most of the easy to find ones are focused on latencies between different ISPs in the same location, although some of the latencies they report make me want to cry.)

Download serialization delay is generally just a few milliseconds and not a major issue.

Download queueing delay generally can’t be caused by uploads from a peer net connection, because consumer download rates are much higher than upload rates, but if the downlink is saturated by TCP-based cross traffic you’re hosed, for the same reasons as with upload queueing delay.

That covers the basics of how the internet works from the standpoint of a layer 5 protocol developer. My apologies to people who work at the lower layers.

Where this becomes interesting for internet telephony is when it interacts with packet loss. On well behaved net connections the only source of packet loss should be packets getting dropped because queues are full, at which point phone calls are already hosed because of latency. Unfortunately lots of people have cross traffic on their wireless, or move their laptop too far from their base station, or have a shitty ISP, and have a lot of non-congestive packet loss. For the sake of argument, I’ll assume that this is a problem worth solving.

When data is lost, the only things you can do are to either have a skip, or send a rerequest. If you have a policy of sending rerequests, then you have to delay all traffic by the worst delay you can incur with a rerequest, because dynamically changing the latency produces a sucky end user experience. Let’s say that we have a simple direct connection which looks like this:

Alice ---- 80ms ---- Bob

In this case the direct round trip time will be 160ms, which is good. The speed of sound is about one foot per millisecond, so this will be about equivalent to yelling at someone 80 feet away from you. Unfortunately if you add in rerequests, you have to wait for a packet to (not) get there, a rerequest to get sent back, and a replacement packet to arrive, for a total of 240ms each way, or 480ms round trip, which is totally unacceptable. Let’s say that we try to improve this as follows:

Alice ---- 40ms ---- proxy server ---- 40 ms ---- Bob

If we’re now willing to assume that the packet rate is low enough that a single packet might need a rerequest on one end or the other but not both, our resends will add 80ms each way, for a total round trip time of 320ms, which is barely acceptable. Unfortunately that isn’t a terribly solid improvement, and it’s hard to colocate servers in the exact middle of the Atlantic or Pacific oceans, but it’s possible to do better:

Alice ---- 10ms ---- proxy server ---- 60ms ---- proxy server ---- 10ms ---- Bob

Now things look much better. Even with resends on both ends, the one way delay has only gone up to 120ms, for a total round trip time of 240ms, or 200ms if we want to be just a little aggressive, which is pretty good. And even better, the amount of latency this technique adds doesn’t increase as the distance between the peers increases, so long as there are proxy servers close to either end.
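The latency accounting above can be checked mechanically. In this sketch a rerequest on a hop costs two extra traversals of that hop: the loss is detected when the packet fails to arrive on time, a rerequest goes back across the hop, and a replacement comes forward:

```python
def one_way_ms(hops, rerequest_hops=()):
    """One-way delay over a path given per-hop delays in ms, where each
    hop index in rerequest_hops pays an extra round trip for a resend."""
    return sum(hops) + sum(2 * hops[i] for i in rerequest_hops)

# Direct: Alice --80ms-- Bob
assert one_way_ms([80]) == 80                    # clean: 160ms round trip
assert one_way_ms([80], [0]) == 240              # with rerequest: 480ms RTT

# One mid-ocean proxy: 40 + 40, rerequest on at most one hop
assert one_way_ms([40, 40], [0]) == 160          # 320ms round trip

# Proxies near each end: 10 + 60 + 10, rerequests on both short hops
assert one_way_ms([10, 60, 10], [0, 2]) == 120   # 240ms round trip
```

The key property falls out of the last line: the rerequest penalty scales with the length of the short hops to the nearby proxies, not with the long haul between them, so it stays constant no matter how far apart Alice and Bob are.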