Synchronize eventually with gossip and pruning semi-lattice
This article was originally planned to be a retelling of sync9 and antimatter. It quickly got out of hand.
Quick retelling of the original algorithms
This set of algorithms is battle tested, unlike the ones I will propose below.
The synchronization algorithm: Antimatter
The author emphasizes that this is not a consensus algorithm like Raft.
The CRDT library: Sync9
Most important concept (similar to prunable semi-lattice): OT + CRDT = Time Machine
Now, onto the me going off-the-book and do hand-wavey math section. The claim is actually weaker than the implicit claim by Antimatter+Sync9, but I ended up knowing about Kind (type checker with Self Type) along the way, so the adventure is worth it to me. If reading this is worth it to you I don’t know. Maybe stick to the above tested algorithms.
Semi-lattice
If you have a “up” and partial ordering for a type (α
below), the type is semi-lattice.
Collaborative text editing is showcased in the sync9 demo. α
is a list of edits like Insert('a'@0), Delete(0:1)
. Two lists of edits can merge (the “up”).
Pruning Semi-lattice
It’s like morphing semi-lattice, yet not morphism. It’s partial morphism, which I now call “pruning”. The definition:
import Mathlib.Order.Basic
import Mathlib.Order.Lattice
def larger [PartialOrder α] (a : α) (s : Set α) := ∃ b ∈ s, a ≥ b
class PrunableSemilatticeSup (α : Type u) extends SemilatticeSup α where
PrunedType : Set α -> Type u
-- PrunedType can be pruned further
preserve_prune : (s : Set α) -> PrunableSemilatticeSup (PrunedType s)
prune : (s : Set α) -> α -> Option (PrunedType s)
-- nothing important is pruned away
correct : (s : Set α) -> ∀ a: α, larger a s → (prune s a).isSome = true
-- preserve local morphology
preserve_morphology : ∀ a b: α, (l : larger a s ∧ larger b s) → prune s (a ⊔ b) = (prune s a).get (correct s a l.left) ⊔ (prune s b).get (correct s b l.right)
(The above code doesn’t compile. mentioned later.)
I was a bit surprised to find out that the underlying data type changes at runtime. This doesn’t happen much in practice. The role of a type system is to make invalid states unrepresentable. Here, what is considered invalid changes at runtime, so the type changes at runtime as well. In this case, even Zig can’t help me.
Consensus Algorithm
Can we use raft here? Why Antimatter?
Raft:
- the total number of nodes is fixed
- has one leader
- uses log. allows arbitrary data type
sync9:
- the total number of nodes is unknown
- the author of every point is responsible collecting global :ack and sending global :commit for that point
- every node keeps track of points with global :commit
I have trouble understanding the ack and commit mechanism after reading the reference implementation over and over. So, I made up my own.
The algorithm is as follows:
First, every participant keeps track of a list of current peers and the last state received from them.
structure Frontier [PrunableSemilatticeSup α] (α : Type u) where
current_peers: Set PeerId
current_acks: PeerId -> α
Pruning can happen based on current_acks
. preserve_morphology
ensures that prune
don’t discard necessary information to use the pruned semilattice type later.
Communication happens over unreliable packet exchange network like UDP/IP.
Every one second (or some other duration), every participant broadcast its copy of Frontier
(each inside one packet) to its neighbours. This is the heartbeat that ensures liveliness.
netjoin is handled similarly to “welcome” in antimatter. Basically, a new participant Z
asks an existing participant A
for a copy of the state. A
add Z
to current_peers
and set its ack to be the ack of any existing peer. Then, A
replies with the modified copy to Z
. (potential impl footgun: this operation should be atomic and not happen interleaved with heatbeat read.)
netsplit is handled automatically due to how UDP works.
Forgetting about peers unseen after a certain duration
If timeout is implemented (ttl: PeerId -> Timestamp
), then there are edge cases where some participants forget early or late due to clock skew.
The netjoin process should be modified to deal with this case. I don’t know how to solve this case, especially when unreliable transmission and timeout are involved.
Maybe the three-wave-ack can be used here?
Forgetting saves memory. Will you remember friends you have lost contact with forever? This was a problem in Scuttlebutt, where disk space grows unbounded. PZP claims to have solved this issue.
When reconnecting after being forgotten, here are some choices:
- Forget about edits since the split.
- Use conventional merge methods on the presentation type. e.g.
git merge-file -p current /dev/null other
for text. (not relying on the properties of semi-lattice)
There is no rebase since history is pruned away.
Proof and PoC
None yet. I think I broke Lean. Here’s the WIP source code, if you are interested.
Musings
Curious, I looked up “morphism” on Wikipedia and discovered that this is partial, local morphism. I didn’t expect morphism to appear here.
Now you’re thinking with lattices.
Appendix – Proper Replace in Sync9
locations are like version numbers. There can be 3.0 and 3.1.
If you type “abcd”, then select and delete “bc”, the cursor is between “a” and “d” but at the location 1.1.
If you type “ad” and put the cursor in the middle, the cursor is at the location 1.0.
Then, when you insert some text,
If your conflict-free data type loses information when merging, you should be concerned.
Feedback from braidjs
One author of braidjs has provided detailed feedback for this article.
What’s next
My experience so far with different proof languages:
Idris2 | CC | it works! |
Lean4 | CC + Type u | can’t handle self-reference |
Kind | CC + Self Type | is WIP project |
Agda | CC | don’t know how to use |
Coq | CiC | don’t know how to use |
So, proof in Idris2 is next! OK. Trying to outside the books is harder than I thought. The plan to prove this is paused indefinitely.
Now, I’m working on the collaborative snaking solution. WIP source code here: https://codeberg.org/iacore/semilattice-snake
More Math!
As it turns out, a Time Machine history correspond to a edit path and a point on the semi-lattice. Merging is history dependent! Just knowing the current edit content will not help.
I understand the time machine to be something like this:
class TimeMachine History Delta State where
update : History -> Delta -> History
view: History -> State
-- there are many possible ways of merging without knowing the history
apply : State -> Delta -> State -> Prop
consistent : (c : History) -> (d : Delta) -> apply (view c) d (view (update c d))