Synchronize eventually with gossip and pruning semi-lattice

This article was originally planned to be a retelling of sync9 and antimatter. It quickly got out of hand.

Quick retelling of the original algorithms

This set of algorithms is battle tested, unlike the ones I will propose below.

The synchronization algorithm: Antimatter
The author emphasizes that this is not a consensus algorithm like Raft.

The CRDT library: Sync9

Most important concept (similar to prunable semi-lattice): OT + CRDT = Time Machine


Now, onto the me going off-the-book and do hand-wavey math section. The claim is actually weaker than the implicit claim by Antimatter+Sync9, but I ended up knowing about Kind (type checker with Self Type) along the way, so the adventure is worth it to me. If reading this is worth it to you I don’t know. Maybe stick to the above tested algorithms.

Semi-lattice

If you have a “up” and partial ordering for a type (α below), the type is semi-lattice.

Collaborative text editing is showcased in the sync9 demo. α is a list of edits like Insert('a'@0), Delete(0:1). Two lists of edits can merge (the “up”).

Pruning Semi-lattice

It’s like morphing semi-lattice, yet not morphism. It’s partial morphism, which I now call “pruning”. The definition:

import Mathlib.Order.Basic
import Mathlib.Order.Lattice

def larger [PartialOrder α] (a : α) (s : Set α) :=  b ∈ s, a ≥ b

class PrunableSemilatticeSup (α : Type u) extends SemilatticeSup α where
  PrunedType : Set α -> Type u
  -- PrunedType can be pruned further
  preserve_prune : (s : Set α) -> PrunableSemilatticeSup (PrunedType s)
  prune : (s : Set α) -> α -> Option (PrunedType s)
  -- nothing important is pruned away
  correct : (s : Set α) ->  a: α, larger a s → (prune s a).isSome = true
  -- preserve local morphology
  preserve_morphology :  a b: α, (l : larger a s ∧ larger b s) → prune s (a ⊔ b) = (prune s a).get (correct s a l.left) ⊔ (prune s b).get (correct s b l.right)

(The above code doesn’t compile. mentioned later.)

I was a bit surprised to find out that the underlying data type changes at runtime. This doesn’t happen much in practice. The role of a type system is to make invalid states unrepresentable. Here, what is considered invalid changes at runtime, so the type changes at runtime as well. In this case, even Zig can’t help me.

Consensus Algorithm

Can we use raft here? Why Antimatter?

Raft:

  • the total number of nodes is fixed
  • has one leader
  • uses log. allows arbitrary data type

sync9:

  • the total number of nodes is unknown
  • the author of every point is responsible collecting global :ack and sending global :commit for that point
  • every node keeps track of points with global :commit

I have trouble understanding the ack and commit mechanism after reading the reference implementation over and over. So, I made up my own.

The algorithm is as follows:

First, every participant keeps track of a list of current peers and the last state received from them.

structure Frontier [PrunableSemilatticeSup α] (α : Type u) where
  current_peers: Set PeerId
  current_acks: PeerId -> α

Pruning can happen based on current_acks. preserve_morphology ensures that prune don’t discard necessary information to use the pruned semilattice type later.

Communication happens over unreliable packet exchange network like UDP/IP.

Every one second (or some other duration), every participant broadcast its copy of Frontier (each inside one packet) to its neighbours. This is the heartbeat that ensures liveliness.

netjoin is handled similarly to “welcome” in antimatter. Basically, a new participant Z asks an existing participant A for a copy of the state. A add Z to current_peers and set its ack to be the ack of any existing peer. Then, A replies with the modified copy to Z. (potential impl footgun: this operation should be atomic and not happen interleaved with heatbeat read.)

netsplit is handled automatically due to how UDP works.

Forgetting about peers unseen after a certain duration

If timeout is implemented (ttl: PeerId -> Timestamp), then there are edge cases where some participants forget early or late due to clock skew.

The netjoin process should be modified to deal with this case. I don’t know how to solve this case, especially when unreliable transmission and timeout are involved.

Maybe the three-wave-ack can be used here?


Forgetting saves memory. Will you remember friends you have lost contact with forever? This was a problem in Scuttlebutt, where disk space grows unbounded. PZP claims to have solved this issue.

When reconnecting after being forgotten, here are some choices:

  1. Forget about edits since the split.
  2. Use conventional merge methods on the presentation type. e.g. git merge-file -p current /dev/null other for text. (not relying on the properties of semi-lattice)

There is no rebase since history is pruned away.

Proof and PoC

None yet. I think I broke Lean. Here’s the WIP source code, if you are interested.

Musings

Curious, I looked up “morphism” on Wikipedia and discovered that this is partial, local morphism. I didn’t expect morphism to appear here.

Now you’re thinking with lattices.

Appendix – Proper Replace in Sync9

locations are like version numbers. There can be 3.0 and 3.1.

If you type “abcd”, then select and delete “bc”, the cursor is between “a” and “d” but at the location 1.1.

If you type “ad” and put the cursor in the middle, the cursor is at the location 1.0.

Then, when you insert some text,

If your conflict-free data type loses information when merging, you should be concerned.

Feedback from braidjs

One author of braidjs has provided detailed feedback for this article.

What’s next

My experience so far with different proof languages:

Idris2CCit works!
Lean4CC + Type ucan’t handle self-reference
KindCC + Self Typeis WIP project
AgdaCCdon’t know how to use
CoqCiCdon’t know how to use

So, proof in Idris2 is next! OK. Trying to outside the books is harder than I thought. The plan to prove this is paused indefinitely.

Now, I’m working on the collaborative snaking solution. WIP source code here: https://codeberg.org/iacore/semilattice-snake

More Math!

As it turns out, a Time Machine history correspond to a edit path and a point on the semi-lattice. Merging is history dependent! Just knowing the current edit content will not help.

I understand the time machine to be something like this:

class TimeMachine History Delta State where
  update : History -> Delta -> History
  view: History -> State
  -- there are many possible ways of merging without knowing the history
  apply : State -> Delta -> State -> Prop
  consistent : (c : History) -> (d : Delta) -> apply (view c) d (view (update c d))