Networked tuple set with authenticated elements

Today I was thinking about how Matrix and ActivityPub are way too complex for what they do – graph data replication. So I thought: why can't I make something as simple as possible? That question is the reason for this article.

The data model introduced below should be paired with a network protocol like 9P, so that different computers can link together and write to the same graph.

Data model

Not a graph. A set of tuples.

(AAAAC3NzaC1lZDI1NT... "name" "hi")
(AAAAC3NzaC1lZDI1NT... "pub" gNaysxtgtM/cY1i4ZQ...)
(gNaysxtgtM/cY1i4ZQ... "key" ...)

Each tuple element can be either

  • authenticated: a public key
  • unauthenticated: string literal

Each tuple is signed with the private key of each authenticated element.
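
To make the signing rule concrete, here is a minimal sketch in Python. It assumes Ed25519 keys and the pyca cryptography package; the names (Element, sign_tuple, verify_tuple) and the length-prefixed encoding are my own illustrative assumptions, not part of any spec.

# A tuple is a list of elements; authenticated elements carry raw
# public key bytes, unauthenticated ones carry a string literal.
from dataclasses import dataclass
from cryptography.hazmat.primitives.asymmetric import ed25519

@dataclass
class Element:
    value: bytes            # raw public key bytes, or a literal as bytes
    authenticated: bool     # True if value is a public key

def encode(elements: list[Element]) -> bytes:
    # Length-prefix each element so the signed bytes are unambiguous.
    return b"".join(len(e.value).to_bytes(4, "big") + e.value
                    for e in elements)

def sign_tuple(elements: list[Element],
               keys: dict[bytes, ed25519.Ed25519PrivateKey]) -> list[bytes]:
    # One signature per authenticated element, made with that element's
    # private key over the whole encoded tuple.
    msg = encode(elements)
    return [keys[e.value].sign(msg) for e in elements if e.authenticated]

def verify_tuple(elements: list[Element], sigs: list[bytes]) -> bool:
    # Check one signature per authenticated element, in order.
    msg = encode(elements)
    auth = [e for e in elements if e.authenticated]
    if len(auth) != len(sigs):
        return False
    try:
        for e, sig in zip(auth, sigs):
            ed25519.Ed25519PublicKey.from_public_bytes(e.value).verify(sig, msg)
        return True
    except Exception:
        return False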

It’s kind of like in mythology, where you need to know the true name of some entity in order to call them.

Example application: federated blog

Let’s say you have a blog.

($USER "name" "...")
($USER "posted" $POST_A)
($POST_A "content" "...")
($POST_A "reply_to" $POST_A_PUB)
($POST_A_PUB "key" "...")

You, as the author, can link to any element in this graph.

Visitors to your blog, however, can only link to $POST_A_PUB, since its private key is public.
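
As a sketch of what a visitor's reply could look like, reusing Element, sign_tuple and verify_tuple from above. In a real deployment the post key's private bytes would come out of the published ($POST_A_PUB "key" ...) tuple; here both keypairs are generated locally for illustration.

from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

def raw(public_key) -> bytes:
    return public_key.public_bytes(Encoding.Raw, PublicFormat.Raw)

# The author generated this pair and published the private half.
post_priv = ed25519.Ed25519PrivateKey.generate()
post_pub = raw(post_priv.public_key())

# The visitor's own keypair stays secret.
visitor_priv = ed25519.Ed25519PrivateKey.generate()
visitor_pub = raw(visitor_priv.public_key())

# Both endpoints of the reply are authenticated elements, so the tuple
# needs a signature from each of their private keys.
reply = [Element(visitor_pub, True),
         Element(b"reply_to", False),
         Element(post_pub, True)]
sigs = sign_tuple(reply, {visitor_pub: visitor_priv, post_pub: post_priv})
assert verify_tuple(reply, sigs)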

Replication control

If not for spam and resource limitations, we could sync everything. Since we can't, there are two kinds of control:

  • access control: what can be read
  • replication control: what do I want to read

PoW

Signing with a private key can be made to cost an adjustable amount of proof of work, so that some elements are harder to link to.
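
One possible scheme, purely an assumption since the article doesn't fix one: demand a nonce such that SHA-256 of the encoded tuple plus the nonce starts with a given number of zero bits.

import hashlib

def pow_nonce(msg: bytes, bits: int) -> int:
    # Brute-force a nonce until SHA-256(msg || nonce) has `bits`
    # leading zero bits. Higher `bits` means harder links.
    target = 1 << (256 - bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(msg + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def check_pow(msg: bytes, nonce: int, bits: int) -> bool:
    # Cheap to verify, expensive to produce.
    digest = hashlib.sha256(msg + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - bits))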

whitelist + web of trust

Add a zone in front of every tuple,

# zone metadata
($ZONE "trust" "...")

# zone data
($ZONE $USER "name" "...")

Then, you can manage trust based on zones, with a model like PGP’s trust model.
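
A sketch of what that could look like, assuming trust edges come from ($ZONE "trust" $OTHER_ZONE) tuples and propagate one hop, roughly like PGP introducers. The article does not fix the exact model.

def trusted_zones(roots: set[bytes],
                  trust_edges: dict[bytes, set[bytes]]) -> set[bytes]:
    # Zones you trust directly, plus every zone a root vouches for.
    trusted = set(roots)
    for zone in roots:
        trusted |= trust_edges.get(zone, set())
    return trusted

def accept_tuple(zone: bytes, trusted: set[bytes]) -> bool:
    # Replicate a tuple only if its zone prefix is trusted.
    return zone in trusted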

Synchronization

In other protocols, RPC is usually implemented as message passing: request plus response.

If our protocol used message passing, we would need to take care of retransmission.

What if we don’t? The graph synchronization protocol already takes care of state. Once we make it efficient for RPC, it should be efficient for everything else.

Here’s an example usage of our protocol as RPC, with two network hosts A and B.

# A creates the endpoint
($ZONE_A "owner" "IP_ADDR_A")
($ZONE_A $A "pub" $A_PUB)
($ZONE_A "listen" $A_PUB)
($ZONE_A $A_PUB "key" "...")

# B creates a request
($ZONE_B $A_PUB "request" $REQ "payload")
($ZONE_B $REQ "key" "...")
($ZONE_B "listen" $REQ)

# A responds
($ZONE_A $A "reply" $REQ "payload")

The zones here are not only for whitelisting. They are used for pub/sub as well. They are kind of like MAC addresses.

  1. (compute) A creates the endpoint
  2. (network) B connects to A
  3. (network) B asks for all tuples in A
  4. (compute) B sees that A owns $ZONE_A (by IP), and is interested in new incoming links to $A_PUB
  5. (compute) B creates a request
  6. (network) B notifies A of the new link to $A_PUB ($ZONE_B $A_PUB "request" $REQ "payload") – see the sketch after this list
  7. (network) A asks for tuples containing $REQ
  8. (compute) A processes the request and creates a response to it
  9. … you get the point
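
A hypothetical sketch of the notification step, reusing Element from earlier; send stands in for whatever transport the real protocol would use, which the article leaves open.

from typing import Callable

def notify_peers(new_tuple: list[Element],
                 peer_listens: dict[str, set[bytes]],
                 send: Callable[[str, list[Element]], None]) -> None:
    # Push a new tuple to every peer that listens on one of its elements.
    mentioned = {e.value for e in new_tuple}
    for peer, listened in peer_listens.items():
        if mentioned & listened:      # the peer listens on some element here
            send(peer, new_tuple)     # push only the relevant tuple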

It is not as efficient as what you could do with capnproto, but I think it's good enough.

The meanings of keywords (“owner”, “listen”) are defined by the library user, the way 9P or HTTP define their own verbs on top of a transport.
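
For illustration only: keyword semantics live in application code, not in the protocol itself. A host might treat the first unauthenticated element of a tuple as the keyword and dispatch on it; the handler table below is my assumption about how a library user could do that.

from typing import Callable

Handler = Callable[[list[Element]], None]

def dispatch(tup: list[Element], handlers: dict[bytes, Handler]) -> None:
    # Treat the first unauthenticated element as the keyword.
    for e in tup:
        if not e.authenticated:
            handler = handlers.get(e.value)
            if handler is not None:
                handler(tup)
            return

handlers: dict[bytes, Handler] = {
    b"listen": lambda t: print("subscribe to", t[-1].value),
    b"owner": lambda t: print("zone owned by", t[-1].value),
}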

Additional words

To be done: implementing the protocol

Q: Why not RDF?
A: Inefficient.

Q: Why not capnproto?
A: It cannot be extended by multiple parties independently. We need a protocol that can evolve.