Detecting and Displaying Merge Conflicts
The last theory post on merge algorithms before a working prototype
As I’ve mentioned previously, if you want eventually consistent version control, meaning the order in which you merge things together has no impact on the final result, you need not only a very history-aware merging algorithm but also a canonical ordering of the lines. This cleanly dodges the biggest issue in version control: what should you do when one person merges AXB and AYB as AXYB, another person merges them as AYXB, and then you try to merge both of those together? None of the available options are good, so you have to keep it from ever happening in the first place. Both people need to be shown AXYB as the order of lines in the merge conflict (or the other order, as long as it’s consistent). That way, if either of them decides to change it to AYXB, that was a proactive change made afterwards. It doesn’t merely win the later meta-merge conflict; there isn’t a conflict at all, and it merges cleanly.
This flies in the face of how merge conflict UX normally works, which is to order the conflicting regions by whether they’re ‘local’ or ‘remote’. How to do ordering better is an involved subject which I’ve covered thoroughly in older posts and won’t rehash here, but conflict UX is something I want to talk about more. Since the order of lines, and whether each should be included if everything is smashed together blindly, is assumed to be handled, the remaining question is how to detect and present conflicts. What’s needed is a way of marking particular lines as conflicts and of figuring out which lines should be marked. There should be some format of special lines, similar to the conflict markers people are already familiar with, for presenting conflicts to users in files. That format should include a way of saying which of the two sides individual lines came from.
The general idea is to determine ‘which side each line came from’, and if two lines whose ancestry differs are ‘too close together’ then they’re both marked as being in conflict. If successive lines have the same ancestry and one of them is in conflict, it taints the others. The simplest approach is that a single line which is present on both sides ends a region of conflict. Arguably it should take more than one line to declare peace, or empty and whitespace-only lines shouldn’t count towards it. I’m going to assume the simplest approach for a proof of concept.
An important case is when Alice adds a line to a function and Bob deletes the entire function. Obviously that should be presented as a conflict somehow, but the deleted lines are crucial to it. For that reason there needs to be some way of showing deleted lines in the conflict, definitely with proper annotations around them and possibly with the individual deleted lines commented out.
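To make the Alice-and-Bob case concrete, here is a rough Python sketch of what presenting deleted lines inside a conflict might look like. Everything about the format is invented for illustration: the `<<<<<<< conflict` and `>>>>>>> end conflict` markers, the `[bob deleted]` and `[alice added]` annotations, and the `Line` record are all assumptions, not a format this post settles on. It also assumes each line has already been flagged as in conflict or not.

```python
from typing import NamedTuple

class Line(NamedTuple):
    text: str
    side: str          # hypothetical: "alice", "bob", or "" for shared lines
    deleted: bool      # True if the line is absent from the merged result
    in_conflict: bool

# Invented marker strings, loosely modelled on familiar conflict markers.
OPEN, CLOSE = "<<<<<<< conflict", ">>>>>>> end conflict"

def render(lines):
    # Deleted lines are only shown when they are part of a conflict.
    visible = [l for l in lines if l.in_conflict or not l.deleted]
    out, inside = [], False
    for line in visible:
        # Emit a marker whenever we enter or leave a conflict region.
        if line.in_conflict != inside:
            out.append(OPEN if line.in_conflict else CLOSE)
            inside = line.in_conflict
        if not line.in_conflict:
            out.append(line.text)
        elif line.deleted:
            out.append(f"[{line.side} deleted] {line.text}")
        else:
            out.append(f"[{line.side} added] {line.text}")
    if inside:
        out.append(CLOSE)
    return out

# Alice adds a line to f(); Bob deletes the whole function.
example = [
    Line("import os", "", False, False),
    Line("def f():", "bob", True, True),
    Line("    x = 1", "bob", True, True),
    Line("    log(x)", "alice", False, True),
    Line("    return x", "bob", True, True),
    Line("print('done')", "", False, False),
]
print("\n".join(render(example)))
```

The point of the sketch is only that Bob’s deleted lines stay visible next to Alice’s addition, each annotated with the side it came from; the real format could look quite different.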
To detect conflicts, each line is marked as ‘peaceful’, ‘skip’, ‘Alice won’, ‘Bob won’ or ‘both won’. Once all lines are marked, the ones marked skip are, well, skipped. Of the remaining lines, any which border a line with a different marking that is not peaceful are marked as in conflict. Finally, tainting is spread to neighboring lines which have the same state. Deleted lines are only presented to the user if they’re in conflict.
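Here is a minimal Python sketch of those three passes, taking the per-line markings as given. The state strings and the `find_conflicts` name are my own shorthand for illustration. It also bakes in one reading of the rule: a peaceful line is never itself marked as in conflict, so two lines only conflict when both are non-peaceful and their markings differ.

```python
def find_conflicts(states):
    """Return one bool per line: True if that line is in conflict.

    states holds one of "peaceful", "skip", "alice", "bob", "both" per line.
    """
    # Pass 1: skipped lines drop out, so neighbors are computed without them.
    kept = [i for i, s in enumerate(states) if s != "skip"]
    conflict = [False] * len(states)

    # Pass 2: adjacent non-peaceful lines with different markings conflict.
    for a, b in zip(kept, kept[1:]):
        sa, sb = states[a], states[b]
        if sa != "peaceful" and sb != "peaceful" and sa != sb:
            conflict[a] = conflict[b] = True

    # Pass 3: taint spreads along runs of identical state, forwards then backwards.
    for seq in (kept, kept[::-1]):
        for a, b in zip(seq, seq[1:]):
            if conflict[a] and states[a] == states[b]:
                conflict[b] = True
    return conflict

# Bob deletes a function (three "bob" lines) that Alice added a line to.
print(find_conflicts(["peaceful", "bob", "bob", "alice", "bob", "peaceful"]))
# → [False, True, True, True, True, False]
```

In the example the first ‘bob’ line doesn’t touch Alice’s line directly, but it gets tainted by the ‘bob’ line next to it, so the whole deleted function ends up inside the conflict.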
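What to do in each case is best presented as a laundry list, so here goes. Each case is listed as final-Alice-Bob: whether the line is present in the blindly merged result, in Alice’s version, and in Bob’s version.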
missing missing missing: skip
missing missing present: Alice
missing present missing: Bob
missing present present: both (this is an unusual case but it can happen)
present missing missing: both (similar to the previous case)
present missing present: Bob
present present missing: Alice
present present present: peaceful
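The list above translates directly into a lookup table. This is a small Python sketch; the `STATE` and `classify` names and the short state strings are just labels I picked for illustration.

```python
P, M = True, False  # present / missing

# Keys are (final, alice, bob), in the same order as the list above.
STATE = {
    (M, M, M): "skip",
    (M, M, P): "alice",
    (M, P, M): "bob",
    (M, P, P): "both",
    (P, M, M): "both",
    (P, M, P): "bob",
    (P, P, M): "alice",
    (P, P, P): "peaceful",
}

def classify(final, alice, bob):
    """Return the marking for one line given where it is present."""
    return STATE[(final, alice, bob)]

print(classify(final=M, alice=P, bob=M))  # → bob
```

One pattern is visible in the table: whenever the two sides disagree with each other, the winner is whichever side matches the final result.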
That seems to handle all the edge cases properly and covers the last of the theoretical details I needed to work out.
When a user resolves a conflict and commits, the commit should first throw an error if conflict markers weren’t removed. It should then assume the user edited the clean merge they would have seen if each line had been presented verbatim without checking for conflicts. When diffing the complete weave against the user’s final file version, it should probably weight lines which were present more heavily than lines which were deleted, but I’m not sure what the best way of doing that is, and I’ll probably make a prototype which doesn’t have any such heuristic.
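The first of those checks is simple enough to sketch in Python. The marker format hasn’t been pinned down, so the git-style prefixes here are stand-ins, and `UnresolvedConflictError` and `check_resolved` are hypothetical names.

```python
# Assumed marker prefixes, borrowed from git-style markers as stand-ins.
MARKER_PREFIXES = ("<<<<<<<", "=======", ">>>>>>>")

class UnresolvedConflictError(Exception):
    """Raised when a file being committed still contains conflict markers."""

def check_resolved(text):
    # Refuse the commit as soon as any line still starts with a marker.
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(MARKER_PREFIXES):
            raise UnresolvedConflictError(
                f"conflict marker left on line {number}"
            )

check_resolved("a\nb\n")  # passes silently: no markers left
```

A prefix check like this will also reject a file which legitimately contains a line starting with one of those strings, so a real marker format would probably want to be more distinctive.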