CRDT Internals & Troubleshooting
How the CRDT DAG synchronization works internally, common log messages, and how to troubleshoot them.
DAG Synchronization
OpenTela's CRDT datastore replicates state by exchanging Merkle-DAG nodes over libp2p. When a peer writes to the datastore, it broadcasts the new DAG head (a CID) via PubSub. Other peers receive the broadcast and attempt to fetch the corresponding block data through the DAG syncer (bitswap).
The fetch is subject to a configurable timeout (DAGSyncerTimeout, default 5 minutes). If the block cannot be retrieved within that window, the head is skipped. This is safe — the CRDT will converge on later syncs.
Head Processing Errors
What it looks like
At debug log level you may see:
```
DEBUG   go-ds-crdt/crdt.go  error processing new head: error getting root delta priority: context deadline exceeded
```

This means the node received a broadcast announcing a new CRDT head but could not fetch the block data from any peer before the timeout expired.
Common causes
| Cause | Description |
|---|---|
| Peer departed | The broadcasting peer went offline between sending the gossip message and the fetcher requesting the block. |
| NAT / firewall | The node can receive PubSub gossip but cannot establish a direct bitswap connection to the block holder (check relay and hole-punching). |
| Small / unstable network | In a small cluster, if the only peer holding a block disconnects, the block becomes temporarily unfetchable. |
| Overloaded peer | The peer holding the block is too slow to respond within the timeout. |
Rate-limited warning
Individual head processing errors are logged at debug level to avoid log noise. If the error rate exceeds 100 occurrences per minute, a single warning is emitted:
```
WARN   go-ds-crdt/crdt.go  high rate of head processing errors: 142 in the last minute
```

This warning indicates a sustained connectivity problem. Investigate:
- Network health — Are peers reachable? Check libp2p connection counts and relay status.
- Peer churn — Are nodes joining and leaving too frequently? Review the tombstone mechanism for how departures are handled.
- Resource pressure — Is the node under CPU or memory pressure, causing slow block processing?
Adjusting log level
To see individual head processing errors, set `loglevel: debug` in your config:

```yaml
loglevel: debug
```

Or via environment variable: `OF_LOGLEVEL=debug`.