More on Version Control
This may have some legs
Surprisingly and happily my last post on version control got picked up by Hacker News and got a lot of views. Thanks everybody who engaged with it, and welcome new subscribers. I put a ridiculous amount of work into these things and it’s nice when there’s something to show for it.
Something I didn’t quite realize when writing that last post is that in addition to supporting ‘safe rebase’ by picking one of the parents to be the ‘primary’ one, it’s possible to support ‘safe squash’ by picking a further back ancestor to be the ‘primary’ one. The advantage of this approach is that it gives you strictly more information than the Git approach. If you make the safe versions of blame/history always follow the primary path then if you perform all the same commands in the same order with both implementations then the outputs can be made to look nearly identical, with some caveats about subtle edge cases where the safe versions are behaving more reasonably. But the safe versions will still remember the full history, which both lets you look into what actually happened and gives you a lot less footguns related to pulling in something which was already squashed or rebased. The unsafe versions of these things literally throw out history and replace it with a fiction that whoever did the final operation wrote everything, or that the original author wrote something possibly very divergent from what they actually wrote.
Git is very simple, reliable, and versatile, but it isn’t very functional. It ‘supports’ squash and rebase the way writing with a pen and paper ‘supports’ inline editing. It implicitly makes humans do a lot of the version control system’s job. It’s held on because the more sophisticated version control systems haven’t had enough new functionality to compensate for their implicit reductions in versatility and reliability. My goal is to provide the foundation of something which tells a compelling enough story to be worth switching to. I think the case is good: At the minor cost of committing to diffs at commit time you can have safer versions of squash and rebase which Just Work, plus a better version of local undo for the occasional times when you encounter that nightmare and a pretty good version of cherry-picking as a bonus.
‘Committing to diffs at commit time’ is a subtle point. From a behavioral standpoint it’s a clear win. But it does create some implementation risk. Any such implementation needs a core of functionality which is is very well tested and audited and only updated extremely conservatively. This is why I made my demo implementation as simple as possible within the constraints of eventual consistency. Other systems which claim to do this I couldn’t understand their technical docs, much less develop intuitions of how they behave. With the ‘conflicts are updates which happen too close together’ approach I can intuitively reason about the behavior.
Some people found the ‘left’ and ‘right’ monikers confusing. Those are entirely advisory and can be replaced with any info you about the branch name it came from and whether it’s local or remote, or even blame information.
The ‘anchoring’ algorithm for CRDTs has been invented multiple times by independent groups in the last few years. I find it heartening that it’s only been in the last few years because I feel I should have come up with it over twenty years ago. Oddly they don’t seem to have figured out the generation counting trick, which is something I did come up with over twenty years ago. Combining the two ideas is what allows for there to be no reference to commit ids in the history and have the entire algorithm be structural.
One implementation detail I didn’t get into is that in my demo implementation it doesn’t flag ‘conflict’ sections which only consist of deletions with no insertions on either side. I’ve discussed the semantics of this with people and it seems more often than not people view flagging those as excessively conservative and are okay with them clean merging to everything gone. This is no doubt something people can get into religious wars over and it would be very interesting for someone to collect actual data about it but in practice it will inevitably be a flag individual users can set to their liking and the argument is really about the default.
One interesting question about version control is whether it can ever merge together two branches which look the same into a version which looks like neither parent. The approach I’ve proposed avoids doing that in a lot of practical cases which blow up in your face pretty badly, but that doesn’t mean it never does it. For example if you start out with XaXbX and one branch goes XaXbX → aXbX → XX while the other branch goes XaXbX → XaXb → XX. Then when you merge them together the first branch has deleted the first X of three while the second branch has deleted the last X of three so they’ll clean merge into a single X. This is a fairly artificial example which it would be hard to have happen in practice even if X is a blank line, especially if you use a diff algorithm which tries hard to keep the place repeated lines get attached to be reliable and consistent. But if it were to happen it feels like the history has ‘earned’ this funny behavior and going down to a single X with no conflict really is the right thing to do, as counterintuitive as that may be before you’ve worked through this example.
There was some discussion about my comment at the bottom of the last post about the code being artisanal but the post not. The risk with AI is that the code is a spaghetti mess which nobody realizes because no human has ever read it. There’s clear benefit to artisanal coding, or at least human auditing. The risks of AI assisted writing are a lot less. Humans understand what the words say. The AI’s writing is a lot breezier but less interesting but for reference materials that’s probably what you want. I did repeatedly go through that entire last post and tell the AI things which were wrong and should be changed. The experience was much better than manually editing because the AI could include whatever I said more quickly, in a more appropriate location, and with better wordsmithing than if I’d done it by hand. I didn’t try to change the tone to not sound like AI because that would have been more work and because I think it’s funny to encourage AI doomsaying by making it look like AI can write posts like that. This post was written entirely artisanally. Hopefully the term ‘artisanal code’ becomes a thing.

