Redesigning CircuitVerse's Circuit Data Format

Overview

Right now CircuitVerse saves circuits as JSON blobs whose structure depends entirely on the order in which you happen to place elements. The backUp() function in backupCircuit.js iterates over scope.allNodes and each entry in moduleList using Array.map(extract), which calls saveObject() on every element. Since allNodes is populated in insertion order (each new Node() call pushes via this.parent.scope.allNodes.push(this) in node.js), two logically identical circuits built in different sequences end up producing different JSON.

My plan is to replace that with a deterministic, schema-validated structure that separates logical connectivity from visual metadata. This would make diffing, Verilog generation and machine analysis actually reliable.

Deliverables

A complete JSON Schema (draft-2020-12) defining the v2 circuit format
A canonicalization algorithm producing deterministic output via content-hash IDs
A serializer/deserializer pair with full round-trip fidelity
A backward-compatible migration tool handling all legacy circuits
Backend integration with schema validation at the API boundary
A batch migration rake task for existing database records

The Save Pipeline

First, let’s walk through the save pipeline to see how circuits are actually saved and stored, and where things go wrong.

Layer 1: The Entry Point

When a user clicks “Save”, the save() function calls generateSaveData(). Here is what it does, step by step:

Creates an empty data object
Sets top-level metadata: name, timePeriod, clockEnabled, projectId, focussedCircuit, orderedTabs
Creates data.scopes = [] to hold each circuit tab
Builds a dependencyList by iterating scopeList (a global object keyed by circuit ID) and calling scope.getDependencies() on each scope
Defines a recursive saveScope(id) function that saves dependencies first, then calls backUp(scopeList[id]) and pushes the result to data.scopes
Iterates all scopes with for (let id in scopeList) saveScope(id)
Calls JSON.stringify(data) and returns the string

The scopeList object is a plain JavaScript object. The for...in loop visits its string keys in property insertion order, which is the order in which circuits were created during the session. (Keys small enough to count as array indices are visited first, in ascending numeric order, but since scope IDs are random numbers that does not give a meaningful order either.) If one user creates a set of circuits in a certain order and another user creates the same circuits in reverse order, the scopes appear reversed. The output JSON differs even though the circuits are identical.

This is the first source of non-determinism: scope ordering depends on session history, not on the circuit’s content.
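The ordering behaviour is easy to reproduce outside CircuitVerse. The sketch below is only an illustration: scopeOrder is a hypothetical helper, and the scope IDs and names are made up rather than taken from the codebase. It shows how for...in visits the keys of a plain object.

```javascript
// Hypothetical helper: records the order in which for...in visits scope IDs,
// mirroring the loop at the end of generateSaveData().
function scopeOrder(scopeList) {
  const order = [];
  for (const id in scopeList) order.push(id);
  return order;
}

// Two sessions that create the same two circuits in opposite order.
// The IDs are made up; both are too large to be treated as array-index keys.
const sessionA = {};
sessionA["50000000001"] = { name: "Main" };
sessionA["90000000002"] = { name: "HalfAdder" };

const sessionB = {};
sessionB["90000000002"] = { name: "HalfAdder" };
sessionB["50000000001"] = { name: "Main" };

console.log(scopeOrder(sessionA)); // [ '50000000001', '90000000002' ]
console.log(scopeOrder(sessionB)); // [ '90000000002', '50000000001' ]

// Keys small enough to be array indices are visited in ascending numeric order instead.
const small = {};
small["20"] = {};
small["7"] = {};
console.log(scopeOrder(small)); // [ '7', '20' ]
```

Either way, the order comes from session history or from the random ID values, never from the circuit’s content.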

Layer 2: The Per-Scope Backup

The backUp(scope) function generates the JSON for a single circuit tab (called a “scope” in CircuitVerse).

data.allNodes = scope.allNodes.map(extract);

for (let i = 0; i < moduleList.length; i++) {
  if (scope[moduleList[i]].length) {
      data[moduleList[i]] = scope[moduleList[i]].map(extract);
  }
}

Two things happen here.

First, allNodes is serialized in array order. The scope.allNodes array is a flat list of every connection point (called a “node”) in the circuit. It is populated in the Node constructor via this.parent.scope.allNodes.push(this). Every time a user places a component, that component’s nodes get pushed onto this array. The order is determined entirely by when the user happened to create each component.

Second, module elements are serialized in array order. Each scope[moduleList[i]] array (e.g. scope.AndGate, scope.Input) is populated by baseSetup():

baseSetup() {
  this.scope[this.objectType].push(this);
}

Again insertion order.

Both the node list and the component lists are ordered by creation time. Two users building the exact same circuit in a different order will get different arrays. Since connections are stored as indices into allNodes (as we will see in Layer 3), the connection data changes too.

Layer 3: Element Serialization

Every circuit element’s saveObject() method returns:

{
  x: this.x,                      // Canvas x position
  y: this.y,                      // Canvas y position
  objectType: this.objectType,    // Type name like "AndGate"
  label: this.label,
  direction: this.direction,
  labelDirection: this.labelDirection,
  propagationDelay: this.propagationDelay,
  customData: this.customSave(),  // Type-specific data
}

Figure: Anatomy of saveObject showing mixed visual and logical fields

Each element type overrides customSave() to return its specific data. For example AndGate.customSave() returns:

{
  constructorParamaters: [this.direction, this.inputSize, this.bitWidth],
  nodes: {
      inp: this.inp.map(findNode),    // Array of indices into allNodes
      output1: findNode(this.output1) // Single index into allNodes
  },
}

The findNode(x) function returns x.scope.allNodes.indexOf(x) which is the index of the node in the allNodes array.

The first problem is that visual and logical data are mixed. The x, y and direction fields (visual information about where the component is drawn) live in the same object as propagationDelay and customData.nodes (logical information about how the circuit works). There is no separation. If a user moves a component without changing any connections, the object still shows up as changed in a JSON diff.

Figure: Visual and logical data mixed vs separated

The second problem is that connections use fragile integer indices. The findNode function converts live node references into integers by calling indexOf on the allNodes array. The AND gate’s input ports get stored as something like "inp": [2, 3], meaning “the nodes at positions 2 and 3 in allNodes,” and the wires between nodes are recorded with the same kind of index. These integers are meaningless outside the context of the specific allNodes array that produced them.

Layer 4: Node Serialization

Each node’s saveObject() returns:

{
  x: this.leftx,        // Position relative to parent
  y: this.lefty,
  type: this.type,      // 0 = INPUT, 1 = OUTPUT, 2 = INTERMEDIATE
  bitWidth: this.bitWidth,
  label: this.label,
  connections: []       // Array of indices into allNodes
}

The connections array is populated by: findNode(this.connections[i]) which again stores indices into allNodes.

Figure: How the connection model works with allNodes indices

This is where the connection model becomes fully visible. Connections between nodes are stored as bidirectional integer index pairs. If allNodes[5] is connected to allNodes[12], then allNodes[5].connections = [12] and allNodes[12].connections = [5]. The integers 5 and 12 only have meaning relative to the current ordering of allNodes. If a component is added earlier in the creation sequence, every later index shifts.

Non-Determinism Problem

Consider a simple AND gate circuit consisting of two inputs (A, B), one AND gate and one output (Y).

Scenario 1: User creates Input A, then Input B, then the AND gate, then Output Y, then wires them.

The allNodes array will look like:

[0] Input A's output node
[1] Input B's output node
[2] AND gate input node 0
[3] AND gate input node 1
[4] AND gate output node
[5] Output Y's input node

The AND gate’s customData would look like:

{ nodes: { inp: [2, 3], output1: 4 } }

Scenario 2: User creates Output Y first, then the AND gate, then Input B, then Input A.

The allNodes array now looks completely different:

[0] Output Y's input node
[1] AND gate input node 0
[2] AND gate input node 1
[3] AND gate output node
[4] Input B's output node
[5] Input A's output node

The AND gate’s customData is now:

{ nodes: { inp: [1, 2], output1: 3 } }

Same circuit. Different JSON. Every single connection index is different.

Figure: Same circuit built in two different orders producing different JSON

The specific line responsible is this.parent.scope.allNodes.push(this). This single line (a push onto an array) is the root cause of the non-determinism.
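The two scenarios can be replayed with a few lines of code. This is a minimal sketch, not the real classes: placeComponent and buildAndSerialize are hypothetical stand-ins that model only the allNodes bookkeeping, and the port names for the AND gate inputs (inp0, inp1) are chosen here for illustration.

```javascript
// Minimal stand-in for placing a component: each port becomes a node
// that is pushed onto scope.allNodes, as the Node constructor does.
function placeComponent(scope, name, portNames) {
  const component = { name, nodes: {} };
  for (const port of portNames) {
    const node = { owner: name, port };
    scope.allNodes.push(node);
    component.nodes[port] = node;
  }
  return component;
}

// Same idea as findNode(): a node is identified by its position in allNodes.
const findNode = (scope, node) => scope.allNodes.indexOf(node);

// Build the AND circuit in the given creation order and return
// what the AND gate would store under customData.nodes.
function buildAndSerialize(order) {
  const scope = { allNodes: [] };
  const ports = {
    A: ["output1"],
    B: ["output1"],
    AND: ["inp0", "inp1", "output1"],
    Y: ["inp1"],
  };
  const placed = {};
  for (const name of order) placed[name] = placeComponent(scope, name, ports[name]);
  const and = placed.AND.nodes;
  return {
    inp: [findNode(scope, and.inp0), findNode(scope, and.inp1)],
    output1: findNode(scope, and.output1),
  };
}

console.log(buildAndSerialize(["A", "B", "AND", "Y"])); // { inp: [ 2, 3 ], output1: 4 }
console.log(buildAndSerialize(["Y", "AND", "B", "A"])); // { inp: [ 1, 2 ], output1: 3 }
```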

How Professional EDA Tools Handle This

CircuitVerse’s approach is fundamentally different from how professional EDA tools work. In SPICE, connections are described by named nodes: R1 node_A node_B 1k means “resistor R1 connects node_A to node_B.” The names do not change when you reorder the file. In structural Verilog, connections use named port associations: .input_a(wire_x) means “connect port input_a to wire_x.” Again, this is order-independent.

Figure: Comparison of connection models across EDA tools

CircuitVerse uses neither names nor ports. It uses raw array positions. That is the core problem.

Downstream Impact

The non-determinism does not stay contained within the save format. It ripples outward into every system that consumes the saved data.

Verilog export: The DFS traversal and wire naming also depend on the creation order, so the same circuit produces different Verilog depending on build order.

Version control: Running git diff on .cv files produces unreadable diffs because the JSON shifts around based on creation order. You cannot meaningfully track changes to a circuit over time.

ML/LLM analysis: Non-deterministic representations break training data consistency and embedding stability. If the same circuit produces different JSON each time your training data is inconsistent.

Circuit comparison: There is no reliable way to check if two files represent the same circuit. You would need to parse the full connection graph and compare topologies which is exactly what the format should make trivial.

Additional Findings

While going through the codebase, I found some additional edge cases that we have to keep in mind while designing the new format and the migration tool.

Legacy Type Names

loadModule() relies on hardcoded compatibility mappings:

function rectifyObjectType(obj) {
  const rectify = {
      FlipFlop: 'DflipFlop',
      Ram: 'Rom',
  };
  return rectify[obj] || obj;
}

Old .cv files use FlipFlop and Ram as type names. These were apparently renamed at some point, but old files were never migrated. Instead, the loader translates them at load time.

SubCircuit Version 1.0 vs 2.0

The SubCircuit constructor handles a version field:

if (this.version == "1.0") {
  this.version = "2.0";
  this.x -= subcircuitScope.layout.width / 2;
  this.y -= subcircuitScope.layout.height / 2;
  // ... Adjusts node positions
}

Version 1.0 subcircuits use center-referenced coordinates while version 2.0 uses corner-referenced.

The removeBugNodes Function

function removeBugNodes(scope = globalScope) {
  let x = scope.allNodes.length;
  for (let i = 0; i < x; i++) {
      if (scope.allNodes[i].type !== 2 &&
          scope.allNodes[i].parent.objectType === 'CircuitElement') {
          scope.allNodes[i].delete();
      }
  }
}

I think this function exists because the loader can produce orphaned nodes: nodes that end up attached to the generic CircuitElement base class instead of their actual component (such as AndGate or Input). The loader then runs this cleanup function after loading to delete them. In other words, the current save/load pipeline may have known bugs that produce malformed data, and there is no validation to prevent it.

Random Scope IDs

Scope IDs are generated as Math.floor((Math.random() * 100000000000) + 1). Every circuit tab (called a “scope”) gets one of these random numbers when it is created, so even the identity of a tab is non-deterministic.

Save your circuit, close the browser, come back later and build the exact same circuit again, and you will get a different ID. This makes it impossible to compare two versions of the “same” circuit because there is no stable identifier.

No Schema Validation Anywhere

The Rails backend stores circuit data as a plain project_data.data text column with no validation:

# simulator_controller.rb:66
def get_data
  render json: ProjectDatum.find_by(project: @project)&.data
end

The server accepts anything. Malformed JSON is accepted and stored. If a browser bug or a manual edit produces invalid data, it goes into the database and may crash the simulator when loaded.

The constructorParamaters Typo

The field is misspelled as constructorParamaters (an ‘a’ in place of the first ‘e’ of “Parameters”) throughout the codebase and in every module’s customSave().

Backward Compatibility

I think the hardest part of this project is maintaining backward compatibility with the loader.

The loadScope function makes deep assumptions:

It expects data.allNodes to be an ordered array where indices match connection references
It expects module arrays (like data.AndGate) where node references are indices into allNodes
It calls constructNodeConnections() which uses scope.allNodes[data.connections[i]] for direct index lookup
It calls replace(obj[node], n[i]) where n[i] is an allNodes index

The loader cannot be replaced in a single step. It needs to support both the old format and the new format simultaneously during a transition period. I think we can use the “expand-contract” pattern here: first expand (support both formats), then migrate the data, then contract (remove old format support).
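During the expand phase, the loader needs a single entry point that decides which path to take. A minimal sketch of that dispatch follows. detectFormat and loadProject are hypothetical names, and the stub loaders stand in for the existing loadScope path and a future v2 loader. The detection rule assumes that v2 files always carry formatVersion and that every legacy scope has an allNodes array.

```javascript
// Hypothetical format sniffing for the "expand" phase.
function detectFormat(data) {
  if (typeof data.formatVersion === "string" && data.formatVersion.startsWith("2.")) {
    return "v2";
  }
  // Legacy files have no formatVersion; backUp() always writes allNodes per scope.
  if (Array.isArray(data.scopes) && data.scopes.every((s) => Array.isArray(s.allNodes))) {
    return "v1";
  }
  throw new Error("Unrecognized circuit data format");
}

// One entry point, two loaders.
function loadProject(data, loaders) {
  return loaders[detectFormat(data)](data);
}

// Usage with stub loaders.
const loaders = {
  v1: (d) => `legacy loader: ${d.scopes.length} scope(s)`,
  v2: (d) => `v2 loader: ${d.scopes.length} scope(s)`,
};
console.log(loadProject({ scopes: [{ allNodes: [] }] }, loaders)); // legacy loader: 1 scope(s)
console.log(loadProject({ formatVersion: "2.0.0", scopes: [{}] }, loaders)); // v2 loader: 1 scope(s)
```

In the contract phase, the v1 branch and its loader would be removed once the stored data has been migrated.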

Design Principles

I built the new format around four principles. Each one targets a specific problem found in the current setup.

1. Deterministic Ordering

Components are sorted by (type, label, content-hash) and nets are sorted by their canonicalized endpoint lists. The same circuit always produces byte-identical JSON. This follows the approach of W3C RDF Dataset Canonicalization (Recommendation, May 2024) and RFC 8785 (JSON Canonicalization Scheme). LibrePCB’s file format applies the same principle to EDA specifically: “for any imaginable data set there exists only one exact representation.”

2. Logical-Visual Separation

The topology section (connectivity) is separate from the visual section (positions). Moving a component changes only its visual data. This is inspired by EDIF’s multi-view architecture, where the same circuit can have a NETLIST view (pure topology) and a SCHEMATIC view (with visual data).
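To make the effect concrete, here is a small sketch that splits a v1 element (the saveObject() shape shown earlier) into a logical part and a visual part. splitElement is a hypothetical helper, and treating x, y, direction and labelDirection as the visual fields is an assumption made for this illustration.

```javascript
// Hypothetical helper: separate the visual fields of a v1 element from everything else.
function splitElement(el) {
  const { x, y, direction, labelDirection, ...logical } = el;
  return { logical, visual: { x, y, direction, labelDirection } };
}

const before = {
  x: 580, y: 330, objectType: "AndGate", label: "",
  direction: "RIGHT", labelDirection: "LEFT", propagationDelay: 10,
  customData: { nodes: { inp: [2, 3], output1: 4 } },
};
const after = { ...before, x: 620 }; // the user drags the gate 40px to the right

const a = splitElement(before);
const b = splitElement(after);
console.log(JSON.stringify(a.logical) === JSON.stringify(b.logical)); // true
console.log(JSON.stringify(a.visual) === JSON.stringify(b.visual));   // false
```

With the mixed v1 object the whole element differs after the move; after the split only the visual half does.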

3. Named References

Content-hash IDs (c_abc123, n_def456) will replace fragile array indices. This is the same principle as Git’s content-addressable filesystem: the identifier reflects what the thing is, not when it was created.

4. Schema-Validated

JSON Schema draft-2020-12 with additionalProperties: false. Both frontend (Ajv) and backend (json_schemer) validate before storage. Invalid data is rejected at the boundary not cleaned up after the fact.

The New Format

Each scope will be divided into three sections: topology (deterministic), visual (position-dependent) and metadata (auxiliary).

Figure: New format structure with three sections per scope

{
  "formatVersion": "2.0.0",
  "meta": { "name": "...", "projectId": "..." },
  "scopes": [{
      "id": "scope_main",
      "name": "Main",
      "topology": {
          "components": [
              {
                  "cid": "c_inp_a_7f3a2b",
                  "type": "Input",
                  "label": "A",
                  "ports": { "output1": "n_a_out" },
                  "parameters": { "bitWidth": 1 }
              }
          ],
          "nets": [
              {
                  "netId": "n_a_out",
                  "bitWidth": 1,
                  "endpoints": [
                      { "componentId": "c_inp_a_7f3a2b", "portName": "output1" },
                      { "componentId": "c_and_9d5e2f", "portName": "inp" }
                  ]
              }
          ]
      },
      "visual": {
          "componentPositions": {
              "c_inp_a_7f3a2b": { "x": 460, "y": 300 }
          }
      }
  }]
}

The biggest change is that I am getting rid of the allNodes array entirely. Connections use named net IDs instead of fragile index-based links. Visual data is fully separated, so moving a component only touches the visual section and leaves topology alone. Components get sorted canonically. Annotations go into visual.annotations. Per-element subcircuitMetadata moves into the visual section. Project-level state stays at the root level: focussedCircuit moves into the globalState object and orderedTabs remains a top-level array.

Fragile Indices vs Named Nets

The single most important change in the new format is replacing integer indices with named net references.

Figure: Fragile integer indices vs named net references side by side

In the current format, a connection like “AND gate input 0 is connected to Input A’s output” starts from "inp": [2], where 2 is the index of the AND gate’s own input node in the allNodes array. The link to Input A’s output is then recorded as further indices in the connections arrays of the nodes involved. In the new format, the same connection is stored as "inp": ["n_a_out"], where n_a_out is a content-hash identifier for the net.

The index 2 is fragile. It changes whenever allNodes is reordered. The name n_a_out is stable. It is derived from the net’s content (which ports it connects) not from array position.

Full Before and After

Here is the complete v1 JSON for a simple 2-input AND gate circuit followed by the equivalent v2 JSON.

Figure: Before (v1) and After (v2) format comparison

Current Format (v1)

{
  "name": "Simple AND",
  "timePeriod": 500,
  "clockEnabled": true,
  "projectId": "abc123",
  "focussedCircuit": 11597572508,
  "orderedTabs": ["11597572508"],
  "scopes": [{
      "layout": {
          "width": 100, "height": 40,
          "title_x": 50, "title_y": 13, "titleEnabled": true
      },
      "verilogMetadata": {
          "isVerilogCircuit": false,
          "isMainCircuit": false,
          "code": "// ...",
          "subCircuitScopeIds": []
      },
      "allNodes": [
          { "x": 10, "y": 0, "type": 1, "bitWidth": 1, "label": "",
            "connections": [5] },
          { "x": 10, "y": 0, "type": 1, "bitWidth": 1, "label": "",
            "connections": [6] },
          { "x": -10, "y": -10, "type": 0, "bitWidth": 1, "label": "",
            "connections": [5] },
          { "x": -10, "y": 10, "type": 0, "bitWidth": 1, "label": "",
            "connections": [6] },
          { "x": 20, "y": 0, "type": 1, "bitWidth": 1, "label": "",
            "connections": [7] },
          { "x": 0, "y": 0, "type": 2, "bitWidth": 1, "label": "",
            "connections": [0, 2] },
          { "x": 0, "y": 0, "type": 2, "bitWidth": 1, "label": "",
            "connections": [1, 3] },
          { "x": 10, "y": 0, "type": 0, "bitWidth": 1, "label": "",
            "connections": [4] }
      ],
      "id": 11597572508,
      "name": "Main",
      "Input": [
          {
              "x": 460, "y": 300,
              "objectType": "Input", "label": "A",
              "direction": "RIGHT", "labelDirection": "LEFT",
              "propagationDelay": 0,
              "customData": {
                  "nodes": { "output1": 0 },
                  "values": { "state": 0 },
                  "constructorParamaters": ["RIGHT", 1,
                      {"x":0,"y":20,"id":"pid1"}]
              }
          },
          {
              "x": 460, "y": 360,
              "objectType": "Input", "label": "B",
              "direction": "RIGHT", "labelDirection": "LEFT",
              "propagationDelay": 0,
              "customData": {
                  "nodes": { "output1": 1 },
                  "values": { "state": 0 },
                  "constructorParamaters": ["RIGHT", 1,
                      {"x":0,"y":40,"id":"pid2"}]
              }
          }
      ],
      "Output": [
          {
              "x": 640, "y": 330,
              "objectType": "Output", "label": "Y",
              "direction": "LEFT", "labelDirection": "RIGHT",
              "propagationDelay": 0,
              "customData": {
                  "nodes": { "inp1": 7 },
                  "constructorParamaters": ["LEFT", 1,
                      {"x":100,"y":20,"id":"pid3"}]
              }
          }
      ],
      "AndGate": [
          {
              "x": 580, "y": 330,
              "objectType": "AndGate", "label": "",
              "direction": "RIGHT", "labelDirection": "LEFT",
              "propagationDelay": 10,
              "customData": {
                  "constructorParamaters": ["RIGHT", 2, 1],
                  "nodes": { "inp": [2, 3], "output1": 4 }
              }
          }
      ],
      "restrictedCircuitElementsUsed": [],
      "nodes": [5, 6]
  }]
}

What is wrong with this: allNodes[0].connections = [5] means “node 0 is connected to node 5.” But which node is node 0? You have to count: it is Input A’s output. Which node is node 5? Count again: it is an intermediate wire node. The numbers are meaningless without the array context.

New Format (v2)

{
  "formatVersion": "2.0.0",
  "meta": {
      "name": "Simple AND",
      "projectId": "abc123"
  },
  "globalState": {
      "timePeriod": 500,
      "clockEnabled": true,
      "focussedCircuit": "scope_main"
  },
  "orderedTabs": ["scope_main"],
  "scopes": [{
      "id": "scope_main",
      "name": "Main",
      "topology": {
          "components": [
              {
                  "cid": "c_and_9d5e2f",
                  "type": "AndGate",
                  "label": "",
                  "ports": {
                      "inp": ["n_a_out", "n_b_out"],
                      "output1": "n_and_out"
                  },
                  "parameters": {
                      "bitWidth": 1, "inputSize": 2, "direction": "RIGHT"
                  },
                  "propagationDelay": 10
              },
              {
                  "cid": "c_inp_a_7f3a2b",
                  "type": "Input",
                  "label": "A",
                  "ports": { "output1": "n_a_out" },
                  "parameters": {
                      "bitWidth": 1, "direction": "RIGHT", "state": 0
                  },
                  "propagationDelay": 0
              },
              {
                  "cid": "c_inp_b_8e4c1d",
                  "type": "Input",
                  "label": "B",
                  "ports": { "output1": "n_b_out" },
                  "parameters": {
                      "bitWidth": 1, "direction": "RIGHT", "state": 0
                  },
                  "propagationDelay": 0
              },
              {
                  "cid": "c_out_y_af6b3c",
                  "type": "Output",
                  "label": "Y",
                  "ports": { "inp1": "n_and_out" },
                  "parameters": {
                      "bitWidth": 1, "direction": "LEFT"
                  },
                  "propagationDelay": 0
              }
          ],
          "nets": [
              {
                  "netId": "n_a_out",
                  "bitWidth": 1,
                  "endpoints": [
                      { "componentId": "c_inp_a_7f3a2b",
                        "portName": "output1" },
                      { "componentId": "c_and_9d5e2f",
                        "portName": "inp", "portIndex": 0 }
                  ]
              },
              {
                  "netId": "n_b_out",
                  "bitWidth": 1,
                  "endpoints": [
                      { "componentId": "c_inp_b_8e4c1d",
                        "portName": "output1" },
                      { "componentId": "c_and_9d5e2f",
                        "portName": "inp", "portIndex": 1 }
                  ]
              },
              {
                  "netId": "n_and_out",
                  "bitWidth": 1,
                  "endpoints": [
                      { "componentId": "c_and_9d5e2f",
                        "portName": "output1" },
                      { "componentId": "c_out_y_af6b3c",
                        "portName": "inp1" }
                  ]
              }
          ]
      },
      "visual": {
          "layout": {
              "width": 100, "height": 40,
              "title_x": 50, "title_y": 13, "titleEnabled": true
          },
          "componentPositions": {
              "c_inp_a_7f3a2b": {
                  "x": 460, "y": 300,
                  "direction": "RIGHT", "labelDirection": "LEFT"
              },
              "c_inp_b_8e4c1d": {
                  "x": 460, "y": 360,
                  "direction": "RIGHT", "labelDirection": "LEFT"
              },
              "c_and_9d5e2f": {
                  "x": 580, "y": 330,
                  "direction": "RIGHT", "labelDirection": "LEFT"
              },
              "c_out_y_af6b3c": {
                  "x": 640, "y": 330,
                  "direction": "LEFT", "labelDirection": "RIGHT"
              }
          },
          "netRoutes": {}
      },
      "metadata": {
          "verilogMetadata": {
              "isVerilogCircuit": false,
              "isMainCircuit": false,
              "code": "// ...",
              "subCircuitScopeIds": []
          },
          "restrictedCircuitElementsUsed": []
      }
  }]
}

What changed:

No allNodes array. Connections use named net IDs instead of integer indices. "output1": "n_a_out" is self-documenting. No counting required.
Visual data lives in a separate visual section. Moving Input A from (460, 300) to (500, 300) changes only the x value of componentPositions["c_inp_a_7f3a2b"]. The topology section is unchanged.
Components sorted canonically by type then label then hash. Regardless of creation order Input A always appears before Input B (alphabetical by label within same type).
Nets sorted canonically by sorted endpoint list. The net connecting Input A to the AND gate always appears first because its endpoint strings sort first.
Annotations are separate. In the old format Text and Rectangle elements are stored in the same module arrays as circuit elements. In the new format they live in visual.annotations.

Canonical ID Generation

Every component and net gets a deterministic ID derived from its content not its position or creation time.

Why Content-Hash IDs?

I considered three ID strategies: array indices, UUIDs and content hashes.

Figure: Comparison of ID strategies: array index vs UUID vs content-hash

Content-hash IDs are derived from what the component is (type, parameters) rather than when it was created (array index) or a random assignment (UUID). This is the same principle used by Git (commits are SHA-1 hashes of their content), IPFS (content identifiers are hashes of data) and Docker (image layers are content-addressed).

Component IDs

cid = "c_" + SHA256(type + "|" + canonical_params)[:12]

Why SHA-256?

It is one of the most widely available cryptographic hashes. The Web Crypto API (crypto.subtle.digest('SHA-256', data)) provides a native browser implementation with no external library needed.

Why truncate to 12 hex characters (48 bits)?

Full SHA-256 produces 64 hex characters, which is excessive for an identifier. 12 characters provide 2^48 (approximately 281 trillion) possible values. For a circuit with 1000 components, the birthday bound n^2 / 2N gives a collision probability of about 1.8 × 10^-9, or roughly two in a billion.
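The estimate can be checked with a few lines of arithmetic. This sketch uses the standard birthday approximation p ≈ 1 - e^(-n(n-1)/2N) and assumes the truncated hash values are uniformly distributed. collisionProbability is a name introduced here for illustration.

```javascript
// Probability that at least two of n uniformly random IDs collide
// in an ID space of 2^bits values (birthday approximation).
function collisionProbability(n, bits) {
  const N = 2 ** bits;
  const exponent = (n * (n - 1)) / (2 * N);
  // -expm1(-x) equals 1 - e^(-x) but stays accurate for very small x.
  return -Math.expm1(-exponent);
}

console.log(collisionProbability(1000, 48).toExponential(2)); // 1.77e-9
```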

Why the “c_” prefix?

It makes IDs visually distinguishable from net IDs (“n_”). When reading a JSON file you can immediately tell whether an ID refers to a component or a net.

Why NOT include ports in the hash?

Including ports (net references) in the component hash creates a circular dependency: component IDs would depend on net IDs (via ports) and net IDs would depend on component IDs (via endpoints). With SHA-256 hashing this cycle cannot converge, because each pass produces entirely new IDs. Hashing only type + params makes component IDs independent of net IDs. Net IDs can then safely reference component IDs in a single pass with no convergence needed.
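A minimal sketch of the component ID formula using the Web Crypto API follows. canonicalJson, sha256Hex and componentId are names introduced here, and canonicalJson is a simplified sorted-key serializer standing in for canonical_params, not a full RFC 8785 implementation. The first line falls back to node:crypto only so that the snippet also runs on older Node.js versions.

```javascript
// Web Crypto is a global in browsers and recent Node.js; older Node exposes it via node:crypto.
const subtle = globalThis.crypto?.subtle ?? require("node:crypto").webcrypto.subtle;

// Serialize with object keys sorted recursively so key order never affects the hash.
function canonicalJson(value) {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const body = Object.keys(value).sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k])}`);
    return `{${body.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hex-encoded SHA-256 of a UTF-8 string.
async function sha256Hex(text) {
  const digest = await subtle.digest("SHA-256", new TextEncoder().encode(text));
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}

// cid = "c_" + SHA256(type + "|" + canonical_params)[:12]
// Ports are deliberately not part of the input.
async function componentId(type, parameters) {
  const hex = await sha256Hex(`${type}|${canonicalJson(parameters)}`);
  return `c_${hex.slice(0, 12)}`;
}

// Key order in the parameters object does not change the ID.
Promise.all([
  componentId("AndGate", { bitWidth: 1, inputSize: 2 }),
  componentId("AndGate", { inputSize: 2, bitWidth: 1 }),
]).then(([first, second]) => console.log(first === second)); // true
```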

Collision resolution

If two components have identical type and parameters (e.g. two 1-bit AND gates with the same inputSize) they get the same base hash. A disambiguation suffix (_1, _2) is added in canonical order. This is deterministic: the first such component in canonical order gets the base hash and the second gets _1.
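The suffix rule can be sketched as a single pass over base IDs that are already in canonical order. disambiguate is a hypothetical helper and the IDs below are made up.

```javascript
// Append _1, _2, ... to repeated base IDs; the first occurrence keeps the bare hash.
// The input must already be in canonical order for the result to be deterministic.
function disambiguate(baseIds) {
  const seen = new Map();
  return baseIds.map((id) => {
    const count = seen.get(id) ?? 0;
    seen.set(id, count + 1);
    return count === 0 ? id : `${id}_${count}`;
  });
}

console.log(disambiguate(["c_aaa111", "c_aaa111", "c_bbb222", "c_aaa111"]));
// [ 'c_aaa111', 'c_aaa111_1', 'c_bbb222', 'c_aaa111_2' ]
```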

Net IDs

netId = "n_" + SHA256(sorted_endpoint_strings)[:12]

A “net” is a set of electrically connected points. In the current CircuitVerse format this concept does not exist explicitly. Connections are stored as pairwise links between individual nodes. In the new format all nodes connected by wires are grouped into a single net. This is the standard netlist representation used by SPICE, EDIF and Verilog.
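Grouping pairwise links into nets is a connected-components problem. A minimal union-find sketch follows, applied to the connections arrays of the v1 “Simple AND” example in this document. groupIntoNets is a hypothetical helper, and a real migration would additionally drop intermediate wire nodes (type 2) from the endpoint list.

```javascript
// Group v1 nodes into nets: nodes joined by a chain of connections share one net.
function groupIntoNets(allNodes) {
  const parent = allNodes.map((_, i) => i);
  const find = (i) => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]]; // path halving keeps the trees shallow
      i = parent[i];
    }
    return i;
  };
  // Union every node with each node it is connected to.
  allNodes.forEach((node, i) => {
    for (const j of node.connections) parent[find(i)] = find(j);
  });

  // Collect node indices by their root.
  const nets = new Map();
  allNodes.forEach((_, i) => {
    const root = find(i);
    if (!nets.has(root)) nets.set(root, []);
    nets.get(root).push(i);
  });
  return [...nets.values()];
}

// connections arrays of allNodes[0..7] from the v1 example
const allNodes = [[5], [6], [5], [6], [7], [0, 2], [1, 3], [4]]
  .map((connections) => ({ connections }));
console.log(groupIntoNets(allNodes)); // [ [ 0, 2, 5 ], [ 1, 3, 6 ], [ 4, 7 ] ]
```

The three groups correspond to the nets n_a_out, n_b_out and n_and_out in the v2 example.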

Canonical Sort Order

Figure: Canonical sort with three levels of ordering

Components are sorted using a three-level key:

Primary: Component type (alphabetical). Groups related components together.
Secondary: Label (alphabetical). Puts labeled components in intuitive order (Input “A” before Input “B”).
Tertiary: Content-hash. Breaks ties for unlabeled components of the same type.

This produces a total ordering that is deterministic and semantically meaningful.
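The three-level key translates directly into a comparator. This is a sketch under one assumption of mine: strings are compared by code unit rather than with localeCompare, so the result does not depend on the runtime’s locale. canonicalSort and cmp are names introduced here.

```javascript
// Plain code-unit comparison: deterministic across browsers and locales.
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Sort by type, then label (missing labels count as ""), then cid.
function canonicalSort(components) {
  return [...components].sort(
    (a, b) => cmp(a.type, b.type) || cmp(a.label ?? "", b.label ?? "") || cmp(a.cid, b.cid)
  );
}

const sorted = canonicalSort([
  { cid: "c_out_y_af6b3c", type: "Output", label: "Y" },
  { cid: "c_inp_b_8e4c1d", type: "Input", label: "B" },
  { cid: "c_and_9d5e2f", type: "AndGate", label: "" },
  { cid: "c_inp_a_7f3a2b", type: "Input", label: "A" },
]);
console.log(sorted.map((c) => c.cid));
// [ 'c_and_9d5e2f', 'c_inp_a_7f3a2b', 'c_inp_b_8e4c1d', 'c_out_y_af6b3c' ]
```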

Async Hashing

crypto.subtle.digest() (Web Crypto API) returns a Promise, so it is asynchronous. Since the current generateSaveData() pipeline is synchronous, there are two options: (1) make the serializer async or (2) use a synchronous pure-JS SHA-256 library like js-sha256. Since the save pipeline already performs async network I/O after serialization, I think the asynchronous path is the better choice.

The Complete JSON Schema

The schema uses JSON Schema draft-2020-12 with strict validation.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://circuitverse.org/schemas/circuit-v2.0.0.json",
  "title": "CircuitVerse Circuit Format v2.0.0",
  "description": "Structured, canonical format for CircuitVerse circuit data",
  "type": "object",
  "required": ["formatVersion", "meta", "scopes"],
  "additionalProperties": false,
  "properties": {
      "formatVersion": {
          "type": "string",
          "const": "2.0.0"
      },
      "meta": { "$ref": "#/$defs/ProjectMeta" },
      "scopes": {
          "type": "array",
          "items": { "$ref": "#/$defs/Scope" },
          "minItems": 1
      },
      "orderedTabs": {
          "type": "array",
          "items": { "type": "string" }
      },
      "globalState": { "$ref": "#/$defs/GlobalState" }
  },
  "$defs": {
      "ProjectMeta": {
          "type": "object",
          "required": ["name", "projectId"],
          "additionalProperties": false,
          "properties": {
              "name": { "type": "string", "minLength": 1 },
              "projectId": { "type": "string" },
              "createdAt": { "type": "string", "format": "date-time" },
              "lastModified": { "type": "string", "format": "date-time" },
              "description": { "type": "string" },
              "tags": {
                  "type": "array",
                  "items": { "type": "string" }
              }
          }
      },
      "GlobalState": {
          "type": "object",
          "additionalProperties": false,
          "properties": {
              "timePeriod": { "type": "integer", "minimum": 1 },
              "clockEnabled": { "type": "boolean" },
              "focussedCircuit": { "type": "string" }
          }
      },
      "Scope": {
          "type": "object",
          "required": ["id", "name", "topology", "visual"],
          "additionalProperties": false,
          "properties": {
              "id": { "type": "string" },
              "name": { "type": "string", "minLength": 1 },
              "topology": { "$ref": "#/$defs/Topology" },
              "visual": { "$ref": "#/$defs/Visual" },
              "metadata": { "$ref": "#/$defs/ScopeMetadata" }
          }
      },
      "Topology": {
          "type": "object",
          "required": ["components", "nets"],
          "additionalProperties": false,
          "properties": {
              "components": {
                  "type": "array",
                  "items": { "$ref": "#/$defs/Component" }
              },
              "nets": {
                  "type": "array",
                  "items": { "$ref": "#/$defs/Net" }
              },
              "subcircuits": {
                  "type": "array",
                  "items": { "$ref": "#/$defs/SubCircuitRef" }
              }
          }
      },
      "Component": {
          "type": "object",
          "required": ["cid", "type", "ports"],
          "additionalProperties": false,
          "properties": {
              "cid": {
                  "type": "string",
                  "description": "Content-hash of (type, params)"
              },
              "type": {
                  "type": "string",
                  "description": "Component type e.g. AndGate, Input, Output"
              },
              "label": { "type": "string" },
              "ports": {
                  "type": "object",
                  "additionalProperties": {
                      "oneOf": [
                          { "$ref": "#/$defs/PortRef" },
                          {
                              "type": "array",
                              "items": { "$ref": "#/$defs/PortRef" }
                          }
                      ]
                  }
              },
              "parameters": { "$ref": "#/$defs/ComponentParameters" },
              "propagationDelay": { "type": "integer", "minimum": 0 }
          }
      },
      "PortRef": {
          "type": "string",
          "description": "Reference to a net ID. Format: n_<hash>"
      },
      "ComponentParameters": {
          "type": "object",
          "description": "Type-specific parameters",
          "properties": {
              "bitWidth": { "type": "integer", "minimum": 1, "maximum": 32 },
              "inputSize": { "type": "integer", "minimum": 2 },
              "direction": { "$ref": "#/$defs/Direction" },
              "state": { "type": "integer" },
              "constructorParameters": {
                  "type": "array"
              },
              "values": {
                  "type": "object"
              }
          },
          "additionalProperties": true
      },
      "Direction": {
          "type": "string",
          "enum": ["RIGHT", "LEFT", "UP", "DOWN"]
      },
      "Net": {
          "type": "object",
          "required": ["netId", "bitWidth", "endpoints"],
          "additionalProperties": false,
          "properties": {
              "netId": {
                  "type": "string",
                  "description": "Content-hash of sorted endpoint set"
              },
              "bitWidth": { "type": "integer", "minimum": 1, "maximum": 32 },
              "label": { "type": "string" },
              "endpoints": {
                  "type": "array",
                  "items": { "$ref": "#/$defs/Endpoint" },
                  "minItems": 1
              }
          }
      },
      "Endpoint": {
          "type": "object",
          "required": ["componentId", "portName"],
          "additionalProperties": false,
          "properties": {
              "componentId": { "type": "string" },
              "portName": { "type": "string" },
              "portIndex": { "type": "integer" }
          }
      },
      "SubCircuitRef": {
          "type": "object",
          "required": ["cid", "scopeId", "inputPorts", "outputPorts"],
          "additionalProperties": false,
          "properties": {
              "cid": { "type": "string" },
              "scopeId": { "type": "string" },
              "label": { "type": "string" },
              "version": { "type": "string" },
              "inputPorts": {
                  "type": "array",
                  "items": { "$ref": "#/$defs/PortRef" }
              },
              "outputPorts": {
                  "type": "array",
                  "items": { "$ref": "#/$defs/PortRef" }
              }
          }
      },
      "Visual": {
          "type": "object",
          "additionalProperties": false,
          "properties": {
              "layout": { "$ref": "#/$defs/Layout" },
              "componentPositions": {
                  "type": "object",
                  "additionalProperties": { "$ref": "#/$defs/Position" }
              },
              "netRoutes": {
                  "type": "object",
                  "additionalProperties": { "$ref": "#/$defs/NetRoute" }
              },
              "annotations": {
                  "type": "array",
                  "items": { "$ref": "#/$defs/Annotation" }
              }
          }
      },
      "Position": {
          "type": "object",
          "required": ["x", "y"],
          "additionalProperties": false,
          "properties": {
              "x": { "type": "number" },
              "y": { "type": "number" },
              "direction": { "$ref": "#/$defs/Direction" },
              "labelDirection": { "$ref": "#/$defs/Direction" },
              "subcircuitMetadata": {
                  "type": "object",
                  "properties": {
                      "showInSubcircuit": { "type": "boolean" },
                      "showLabelInSubcircuit": { "type": "boolean" },
                      "labelDirection": { "$ref": "#/$defs/Direction" },
                      "x": { "type": "number" },
                      "y": { "type": "number" }
                  }
              }
          }
      },
      "NetRoute": {
          "type": "object",
          "properties": {
              "waypoints": {
                  "type": "array",
                  "items": {
                      "type": "object",
                      "required": ["x", "y"],
                      "properties": {
                          "x": { "type": "number" },
                          "y": { "type": "number" }
                      }
                  }
              }
          }
      },
      "Layout": {
          "type": "object",
          "properties": {
              "width": { "type": "number" },
              "height": { "type": "number" },
              "title_x": { "type": "number" },
              "title_y": { "type": "number" },
              "titleEnabled": { "type": "boolean" }
          }
      },
      "Annotation": {
          "type": "object",
          "required": ["type", "x", "y"],
          "properties": {
              "type": {
                  "type": "string",
                  "enum": ["Text", "Rectangle", "Arrow", "ImageAnnotation"]
              },
              "x": { "type": "number" },
              "y": { "type": "number" },
              "properties": {
                  "type": "object",
                  "additionalProperties": true
              }
          }
      },
      "ScopeMetadata": {
          "type": "object",
          "properties": {
              "verilogMetadata": {
                  "type": "object",
                  "properties": {
                      "isVerilogCircuit": { "type": "boolean" },
                      "isMainCircuit": { "type": "boolean" },
                      "code": { "type": "string" },
                      "subCircuitScopeIds": {
                          "type": "array",
                          "items": { "type": "string" }
                      }
                  }
              },
              "testbenchData": {
                  "type": "object",
                  "properties": {
                      "testData": {},
                      "currentGroup": { "type": "integer" },
                      "currentCase": { "type": "integer" }
                  }
              },
              "restrictedCircuitElementsUsed": {
                  "type": "array",
                  "items": { "type": "string" }
              }
          }
      }
  }
}

Why $defs and $ref?

The schema defines types like Component, Net and Position once in $defs and then references them with $ref, which applies the DRY principle to schema design. $defs is the keyword JSON Schema 2020-12 uses for this; it replaced the older definitions keyword used in draft-07.

Serialization Pipeline

The new serializer transforms the live circuit state into v2 JSON through three phases. The key insight is that component IDs are computed from hash(type + params) only (no port references) which makes them independent of net IDs. This breaks a circular dependency and enables a clean single-pass pipeline.

Phase 1: Component IDs (Union-Find + Hash)

Union-Find groups all electrically connected nodes into nets in nearly linear time: O(n * alpha(n)) where alpha is the inverse Ackermann function (effectively constant for all practical inputs).

Union-Find visualization: nodes merging into nets

Why Union-Find?

Connections in CircuitVerse are stored as pairwise links (node A connects to node B, node B connects to node C), so we need to group all transitively connected nodes into a single “net.” Union-Find does exactly this grouping in near-linear time.

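function buildNetMap(scope) {
  // Merge every pair of directly connected nodes into the same set
  const uf = new UnionFind();
  for (const node of scope.allNodes) {
    for (const connection of node.connections) {
      uf.union(node, connection);
    }
  }

  // Group nodes by set representative; each group is one net
  const groups = new Map();
  for (const node of scope.allNodes) {
    const root = uf.find(node);
    if (!groups.has(root)) groups.set(root, []);
    groups.get(root).push(node);
  }

  const nets = [];
  for (const group of groups.values()) {
    // Component ports (type 0 or 1) become endpoints; type 2 nodes are routing only
    const endpoints = group
      .filter((node) => node.type !== 2)
      .map((node) => ({
        componentId: node.parent, // resolved to the parent's cid once cids are final
        portName: reverseMapPort(node, node.parent),
        portIndex: findPortIndex(node, node.parent),
      }));
    if (endpoints.length > 0) {
      nets.push(new Net(endpoints, group[0].bitWidth));
    }
  }
  return nets;
}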
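
The UnionFind helper that buildNetMap relies on does not exist in CircuitVerse today. A minimal sketch of what it could look like is below. It is keyed by object identity through a Map, registers elements lazily so that find() also works for isolated nodes, and uses path compression plus union by rank. The class shape and method names are my assumptions for illustration.

```javascript
// Minimal Union-Find keyed by object identity.
// Hypothetical helper for illustration, not existing CircuitVerse code.
class UnionFind {
  constructor() {
    this.parent = new Map(); // element -> parent element
    this.rank = new Map();   // root -> upper bound on tree height
  }

  // Register x lazily so find() also works for nodes that were never unioned.
  add(x) {
    if (!this.parent.has(x)) {
      this.parent.set(x, x);
      this.rank.set(x, 0);
    }
  }

  find(x) {
    this.add(x);
    let root = x;
    while (this.parent.get(root) !== root) root = this.parent.get(root);
    // Path compression: point every node on the path directly at the root.
    while (this.parent.get(x) !== root) {
      const next = this.parent.get(x);
      this.parent.set(x, root);
      x = next;
    }
    return root;
  }

  union(a, b) {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra === rb) return;
    // Union by rank: attach the shallower tree under the deeper one.
    const rankA = this.rank.get(ra);
    const rankB = this.rank.get(rb);
    if (rankA < rankB) {
      this.parent.set(ra, rb);
    } else {
      this.parent.set(rb, ra);
      if (rankA === rankB) this.rank.set(ra, rankA + 1);
    }
  }
}

// Pairwise links A-B and B-C collapse into one set; D stays on its own.
const [A, B, C, D] = ["A", "B", "C", "D"].map((id) => ({ id }));
const uf = new UnionFind();
uf.union(A, B);
uf.union(B, C);
const sameNet = uf.find(A) === uf.find(C);
const separate = uf.find(A) !== uf.find(D);
```

Keying by object identity means the node objects themselves can be passed in, without first assigning them numeric IDs.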

Why distinguish component ports from intermediate nodes?

Component port nodes (type 0 or 1) belong to a specific component and have a named port. These become net endpoints. Intermediate nodes (type 2) are junction points on wires. They have no logical meaning and exist only for wire routing. These become visual waypoints in netRoutes.

Why reverseMapPort?

Given a node and its parent component we need to determine which named port this node corresponds to. The reverse mapping works by calling customSave() on the parent and checking which port reference matches the node’s allNodes index. This reuses the existing serialization contract rather than adding a new method to every element type.
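
A rough sketch of that reverse lookup is below. It assumes customSave() returns an object with a nodes map from port name to either a single allNodes index or an array of indices, and that the parent exposes scope.allNodes. The helper name locatePort is hypothetical; it returns the port name and index together, so reverseMapPort and findPortIndex could both be thin wrappers around it.

```javascript
// Sketch of the reverse lookup (hypothetical helper).
// ASSUMPTION: customSave() returns { nodes: { portName: index | index[] } }
// where each index points into parent.scope.allNodes.
function locatePort(node, parent) {
  const nodeIndex = parent.scope.allNodes.indexOf(node);
  const saved = parent.customSave();
  for (const [portName, ref] of Object.entries(saved.nodes || {})) {
    if (Array.isArray(ref)) {
      // Multi-node port such as a gate's input array
      const portIndex = ref.indexOf(nodeIndex);
      if (portIndex !== -1) return { portName, portIndex };
    } else if (ref === nodeIndex) {
      // Single-node port such as an output
      return { portName };
    }
  }
  return null; // node is not one of this component's ports
}

// Mock two-input gate whose three port nodes occupy allNodes[0..2].
const scope = { allNodes: [] };
const gate = {
  scope,
  customSave() {
    return { nodes: { inp: [0, 1], output1: 2 } };
  },
};
for (let i = 0; i < 3; i++) scope.allNodes.push({ parent: gate });

const secondInput = locatePort(scope.allNodes[1], gate);
const output = locatePort(scope.allNodes[2], gate);
```

Here secondInput resolves to port inp at index 1 and output resolves to output1 with no index, which matches the optional portIndex field of Endpoint in the schema.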

Union-Find groups all connected nodes into nets. Then for each element customSave() extracts parameters, and hash(type + params) generates a component cid. Components are sorted by (type, label, cid) and collision resolution adds _1, _2 suffixes for duplicates. Component IDs are now final and independent of net IDs.
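
The cid step could look roughly like the sketch below. The c_ prefix and 48-bit truncation come from elsewhere in this proposal; the canonical JSON encoding, the use of Node's crypto module and the helper names canonicalJson, componentCid and assignCids are assumptions for illustration.

```javascript
const { createHash } = require("crypto");

// Serialize with object keys sorted so key order never affects the hash.
function canonicalJson(value) {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const body = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(value[key])}`);
    return `{${body.join(",")}}`;
  }
  return JSON.stringify(value);
}

// ASSUMPTION: "c_" prefix plus the first 12 hex digits (48 bits) of SHA-256.
function componentCid(type, params) {
  const digest = createHash("sha256")
    .update(canonicalJson({ type, params }))
    .digest("hex");
  return `c_${digest.slice(0, 12)}`;
}

// Sort by (type, label, cid), then suffix repeated cids with _1, _2, ...
function assignCids(components) {
  const withCid = components.map((c) => ({ ...c, cid: componentCid(c.type, c.params) }));
  const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  withCid.sort(
    (a, b) => cmp(a.type, b.type) || cmp(a.label || "", b.label || "") || cmp(a.cid, b.cid)
  );
  const seen = new Map(); // base cid -> how many times it has appeared so far
  for (const c of withCid) {
    const count = seen.get(c.cid) || 0;
    seen.set(c.cid, count + 1);
    if (count > 0) c.cid = `${c.cid}_${count}`;
  }
  return withCid;
}

// Two gates with identical params (written in different key order).
const gates = [
  { type: "AndGate", label: "b", params: { bitWidth: 1, inputSize: 2 } },
  { type: "AndGate", label: "a", params: { inputSize: 2, bitWidth: 1 } },
];
const forward = assignCids(gates).map((c) => c.cid);
const reversed = assignCids([...gates].reverse()).map((c) => c.cid);
```

Both input orders yield the same cid list, with the second gate receiving the _1 suffix. Components that tie on type, label and cid all at once are indistinguishable to this sort, so a further tie-breaker would be needed before relying on which of them gets which suffix.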

Phase 2: Net IDs

Each net’s endpoint list is built using the final component cids from Phase 1, sorted lexicographically and content-hashed to produce a stable net identifier. Since component IDs are already final this is a single pass with no circular dependency. Nets are sorted by netId and collision resolution handles duplicates.
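
As a sketch of the net ID step, each endpoint can be rendered as a string key, the keys sorted, and the joined result hashed. The key format, the separator and the helper names endpointKey and netId are assumptions; only the n_ prefix and the "hash of sorted endpoints" idea come from the schema above.

```javascript
const { createHash } = require("crypto");

// Render an endpoint as a stable string key, e.g. "c_aaa:inp[0]".
function endpointKey({ componentId, portName, portIndex }) {
  const index = portIndex === undefined ? "" : `[${portIndex}]`;
  return `${componentId}:${portName}${index}`;
}

// ASSUMPTION: "n_" prefix plus 12 hex digits of SHA-256 over the sorted keys.
function netId(endpoints) {
  const keys = endpoints.map(endpointKey).sort(); // lexicographic order
  const digest = createHash("sha256").update(keys.join("|")).digest("hex");
  return `n_${digest.slice(0, 12)}`;
}

const endpoints = [
  { componentId: "c_bbb", portName: "inp", portIndex: 0 },
  { componentId: "c_aaa", portName: "output1" },
];
const idForward = netId(endpoints);
const idReversed = netId([...endpoints].reverse());
```

Because the keys are sorted before hashing, the order in which endpoints were discovered has no effect on the resulting ID.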

Phase 3: Assemble

Component ports are filled with the final net IDs from Phase 2 by mapping each port’s node through the nodeToNetId lookup. Position fields (x, y, direction) are moved into the separate visual section. Subcircuits, annotations and net route waypoints are extracted. The assembled JSON is checked against the v2 schema using Ajv before it is sent to the server.
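
For a single element, the assemble step could be sketched as below. The function name assembleComponent, the shape of the ports argument and the use of element.objectType as the type string are assumptions for illustration; nodeToNetId is the lookup named above, modelled here as a Map from node object to net ID.

```javascript
// Split one live element into its topology entry and its visual entry.
// ASSUMPTIONS: `ports` maps port name -> node or node[], `nodeToNetId` is a
// Map from node object to final net ID, and element.objectType holds the type.
function assembleComponent(element, cid, ports, nodeToNetId) {
  const portRefs = {};
  for (const [name, nodes] of Object.entries(ports)) {
    portRefs[name] = Array.isArray(nodes)
      ? nodes.map((n) => nodeToNetId.get(n))
      : nodeToNetId.get(nodes);
  }
  return {
    // Goes into topology.components
    component: { cid, type: element.objectType, ports: portRefs },
    // Goes into visual.componentPositions, keyed by cid
    position: { x: element.x, y: element.y, direction: element.direction },
  };
}

const a = {}, b = {}, out = {};
const nodeToNetId = new Map([[a, "n_111"], [b, "n_222"], [out, "n_333"]]);
const gate = { objectType: "AndGate", x: 40, y: 60, direction: "RIGHT" };
const { component, position } = assembleComponent(
  gate, "c_abc", { inp: [a, b], output1: out }, nodeToNetId
);
```

The topology entry carries no coordinates and the visual entry carries no connectivity, which is the separation the v2 schema enforces.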

The new save pipeline with trust boundary

The schema validation step creates a trust boundary. No invalid data passes through.

The Dual Load Pipeline

The loader needs to handle both v1 and v2 formats during the transition period. The format router checks for the formatVersion field.

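function loadCircuit(data) {
  // Only string versions in the 2.x range take the v2 path; everything else is legacy
  if (typeof data.formatVersion === "string" && data.formatVersion.startsWith("2.")) {
    return loadV2(data);
  }
  return loadLegacy(data);
}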

The loadLegacy() function is the current load() function, unchanged. The loadV2() function reads from the new structure. Any data whose formatVersion is missing or does not start with "2." goes down the legacy path.

The Migration Pipeline

The migration tool converts legacy circuit data through a multi-stage pipeline.

8-stage migration pipeline from v1 to v2

Batch Migration Rake Task

For the server database I will build a rake task:

namespace :circuit_data do
  desc "Migrate all circuit data to v2 format"
  task migrate: :environment do
    failed = []
    total = ProjectDatum.count
    migrated = 0

    # find_each loads records in batches, which keeps memory use flat
    ProjectDatum.find_each do |datum|
      begin
        data = JSON.parse(datum.data)
        next if data["formatVersion"] # already migrated

        migrated_data = CircuitDataMigrator.migrate(data)
        datum.update!(data: migrated_data.to_json, format_version: "2.0.0")
        migrated += 1
        puts "Migrated #{migrated}/#{total}" if (migrated % 100).zero?
      rescue StandardError => e
        # Record the failure and continue with the next record
        failed << { id: datum.id, error: e.message }
      end
    end

    puts "Migration complete. #{migrated} migrated, #{failed.length} failed."
    failed.each { |f| puts "  ProjectDatum##{f[:id]}: #{f[:error]}" }
  end
end

Three Migration Paths

Server database: The batch rake task iterates all ProjectDatum records and converts them in place
Browser localStorage: Upgraded on the client. Old data is loaded by loadLegacy() on the next load, and the next save writes it back in v2 format
Exported .cv files: An offline tool for files that have been exported and live outside the system

Architecture Comparison

Aspect	Current (v1)	New (v2)
Connection model	Integer indices into allNodes	Named net IDs (n_abc123)
Component identity	Array position	Content-hash (c_abc123)
Ordering	Creation order (depends on build sequence)	Canonical sort (deterministic)
Visual/Logical	Mixed in same object	Separated into topology + visual
Schema	None	JSON Schema draft-2020-12
Validation	None (server accepts anything)	Both client (Ajv) and server (json_schemer)
Migration support	Ad-hoc loader shims	Versioned format with router
Diff readability	Unusable	Meaningful structural diffs

Testing Strategy

Unit Tests

Schema validation tests: Valid v2 JSON passes. Missing required fields are caught. Invalid types are caught. Extra unknown fields are rejected wherever the schema sets additionalProperties: false (ComponentParameters deliberately allows extra fields).

Net builder tests: Simple two-node connection produces one net with two endpoints. Star topology (one output driving three inputs) produces one net with four endpoints. Disconnected nodes produce separate nets.

Canonicalization tests: Same circuit with different creation order produces identical IDs. Different circuits produce different IDs. Collision resolution adds correct suffixes.

Property-Based Tests (Determinism)

Shuffle test: For 1000 iterations, take a reference circuit fixture, shuffle the allNodes array randomly, shuffle each module array randomly, serialize using the new serializer and assert that the output is byte-identical to the reference output.

Idempotency test: For 100 iterations, serialize a circuit, deserialize the result, serialize again and assert that the second serialization is byte-identical to the first.
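
The shuffle test could be structured roughly as below. The serialize function here is only a stand-in that sorts by a content key, not the real v2 serializer, and the fixture is made up. The seeded random generator is an assumption I would add so that a failing shuffle can be replayed.

```javascript
// Small seeded LCG so a failing shuffle can be reproduced from its seed.
function makeRandom(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Fisher-Yates shuffle on a copy of the input.
function shuffled(items, random) {
  const copy = [...items];
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Stand-in for the real v2 serializer: sorts components by a content key.
function serialize(components) {
  const key = (c) => `${c.type}|${c.label}`;
  const sorted = [...components].sort((a, b) =>
    key(a) < key(b) ? -1 : key(a) > key(b) ? 1 : 0
  );
  return JSON.stringify(sorted);
}

// Made-up fixture with unique (type, label) pairs.
const fixture = [
  { type: "Input", label: "a" },
  { type: "Input", label: "b" },
  { type: "AndGate", label: "g1" },
  { type: "Output", label: "y" },
];

const reference = serialize(fixture);
const random = makeRandom(42);
let mismatches = 0;
for (let i = 0; i < 1000; i++) {
  if (serialize(shuffled(fixture, random)) !== reference) mismatches++;
}
```

In the real test the stand-in serialize would be replaced by the new serializer and the fixture by one of the existing circuit data files, with mismatches expected to stay at zero.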

Round-Trip Tests

For each test fixture (gates-circuitdata.json, Decoders-plexers-circuitdata.json, subCircuit-testdata.json):

Load legacy v1 data using loadLegacy()
Serialize to v2 using new serializer
Deserialize from v2 using new deserializer
For each Input element toggle all states
Run simulation for 100 cycles
Compare all Output values with reference

Backend Validation Tests

SimulatorController#create succeeds with valid v2 data, returns 422 Unprocessable Entity for invalid v2 data, and still accepts v1 data (no formatVersion) for backward compatibility.

Regression Tests

All existing tests must continue to pass: data.spec.js, gates.spec.js, sequential.spec.js, subCircuit.spec.js, complexCircuit.spec.js and the Rails controller/API specs.

Risk Assessment

Risk	Likelihood	Impact	Mitigation
allNodes index dependency runs deeper than save/load	Medium	High	v2 deserializer reconstructs valid allNodes for simulation engine. Round-trip tests with simulation verification catch behavioral regressions.
63 element types have inconsistent customSave() contracts	High	Medium	Catalog every element’s customSave() during Week 3. Splitter, ROM, RAM, EEPROM and SubCircuit get dedicated mapping logic.
Backward compatibility breaks for users with localStorage data	Medium	Medium	Format router detects version on every load including localStorage. Old data loads via loadLegacy() and next save writes v2.
Content-hash ID collisions in large circuits	Low	Low	Collision resolution (suffix _1, _2) handles it. By the birthday bound, a 48-bit hash gives a collision probability of roughly 2 in a billion for 1000 distinct components.
Performance regression for large circuits	Medium	Medium	A SHA-256 hash of a small component record takes microseconds. Union-Find is near-linear. Schema validation can be optional during autosave.
Verilog export depends on undocumented ordering assumptions	Medium	Low	Update Verilog export to use canonical component IDs. Add logical equivalence test comparing old vs new output.
SubCircuit circular dependencies during migration	Low	High	Preserve original scope IDs. Process scopes in dependency order (leaves first).
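
The collision figure in the table can be checked with the standard birthday-bound approximation, P ≈ n(n-1)/2 divided by 2^bits, which is accurate when P is small. The function name below is mine; the estimate applies to components with distinct (type, params) content, since identical content shares a hash by construction and is handled by the suffix rule.

```javascript
// Birthday-bound estimate of at least one collision among n random
// values drawn from a space of 2^bits. Valid while the result is small.
function collisionProbability(n, bits) {
  return (n * (n - 1)) / 2 / Math.pow(2, bits);
}

const p = collisionProbability(1000, 48); // on the order of 1.8e-9
const oneIn = Math.round(1 / p);          // roughly one in several hundred million
```

Doubling the component count roughly quadruples the probability, so even a circuit with 10,000 distinct components stays below about 2 in ten million.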

Closing Thoughts

The current format is simple, but it does not scale to the kind of features CircuitVerse needs now.

This redesign fixes those limitations by making the format deterministic, validated and easier to reason about. It removes fragile assumptions and makes the data model more structured and reliable.

Because of that, anyone working on the project can build features like diffing, deterministic Verilog export or LLM analysis without first working around insertion-order artifacts. It also makes debugging, testing and extending the system much easier.

For users this means more reliable circuits, fewer bugs, consistent saves and better support for upcoming features without breaking existing work.

That is why this is my GSoC 2026 proposal for CircuitVerse.