AI Scene Composition Pipeline

The AI pipeline is Yugma's core differentiator. It transforms natural language into structured 3D scenes through a multi-stage architecture.

Data Flow

User: "Build a sci-fi room with neon lighting"
  │
  ▼
[1] Reference Resolver (client)
    "that" → selected obj, "the red sphere" → id
  │
  ▼
[2] Spatial Preprocessor (client)
    "6 cubes in a circle" → precomputed [x,y,z] positions
  │
  ▼
[3] Style Fingerprint (client)
    Analyzes scene palette → "[STYLE] industrial metallic..."
  │
  ▼
[4] YSL Serializer (client)
    Scene → compact text at ~45 tokens/object
  │
  ▼
[5] Cloud Function: aiCompose (server)
    ├── Planning pass (complex requests only)
    │   └── Claude outputs ordered plan (text only)
    ├── Executor pass (agentic loop, up to 8 iterations)
    │   ├── Claude calls tools → SimScene executes them
    │   ├── Tool results (incl. created IDs) sent back
    │   └── Claude sees results → calls more tools or stops
    └── Firestore session persistence (non-blocking)
  │
  ▼
[6] Tool Dispatch (client)
    Each tool call → Zustand store action → Three.js re-renders

Stage 1: Reference Resolution

File: src/utils/referenceResolver.ts

Resolves natural language references to scene object IDs before sending to the AI:

"that" / "it" → currently selected object
"the red sphere" → search by type + color
"everything on the left" → filter by position

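The three rules above can be sketched as a small lookup function. This is an illustrative sketch only: the `SceneObject` shape, the pre-classified `colorName` field, and the convention that "left" means negative X are assumptions, not the actual contents of `referenceResolver.ts`.

```typescript
// Hypothetical shapes; the real types in referenceResolver.ts may differ.
interface SceneObject {
  id: string;
  type: string; // e.g. "sphere", "box"
  colorName: string; // assumed pre-classified color label, e.g. "red"
  position: [number, number, number];
}

interface ResolveContext {
  objects: SceneObject[];
  selectedId: string | null;
}

// Returns the IDs a phrase refers to, or an empty array when nothing matches.
function resolveReference(phrase: string, ctx: ResolveContext): string[] {
  const text = phrase.trim().toLowerCase();

  // Pronouns map to the current selection.
  if (text === "that" || text === "it") {
    return ctx.selectedId ? [ctx.selectedId] : [];
  }

  // "everything on the left" -> objects with negative X (assumed convention).
  if (/\bon the left\b/.test(text)) {
    return ctx.objects.filter((o) => o.position[0] < 0).map((o) => o.id);
  }

  // "the red sphere" -> match on color label + type.
  const match = text.match(/^the (\w+) (\w+)$/);
  if (match) {
    const color = match[1];
    const type = match[2];
    return ctx.objects
      .filter((o) => o.colorName === color && o.type === type)
      .map((o) => o.id);
  }
  return [];
}
```

Resolving references before the request leaves the client means the model receives concrete IDs instead of having to guess which object "that" points to.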
Stage 2: Spatial Preprocessor

File: src/utils/spatialPreprocessor.ts

LLMs are unreliable at trigonometry and tend to hallucinate coordinates. The preprocessor detects arrangement patterns and computes exact positions client-side:

Pattern	Trigger words	Computation
Circle	"in a circle", "ring", "circular"	`x = r·cos(θ), z = r·sin(θ)`
Grid	"in a grid", "3x3", "matrix"	Row/col with centered offset
Stack	"stacked", "tower", "pile"	Y increments
Spiral	"spiral", "helix"	Circular path + Y ramp
Line	"in a row", "line of"	X spacing, centered
Scatter	"scattered", "randomly placed"	Deterministic pseudo-random (LCG)
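A minimal sketch of three of the computations in the table (circle, grid, scatter), assuming objects are laid out in the XZ plane. The function names, parameters, and LCG constants are illustrative choices, not taken from `spatialPreprocessor.ts`.

```typescript
type Vec3 = [number, number, number];

// Evenly spaced points on a circle in the XZ plane: x = r*cos(theta), z = r*sin(theta).
function circlePositions(count: number, radius: number, y = 0): Vec3[] {
  const out: Vec3[] = [];
  for (let i = 0; i < count; i++) {
    const theta = (2 * Math.PI * i) / count;
    out.push([radius * Math.cos(theta), y, radius * Math.sin(theta)]);
  }
  return out;
}

// Row/col grid with a centered offset, so the middle of the grid sits at the origin.
function gridPositions(rows: number, cols: number, spacing: number): Vec3[] {
  const out: Vec3[] = [];
  const x0 = -((cols - 1) * spacing) / 2;
  const z0 = -((rows - 1) * spacing) / 2;
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      out.push([x0 + c * spacing, 0, z0 + r * spacing]);
    }
  }
  return out;
}

// Deterministic scatter driven by a linear congruential generator (LCG).
// The multiplier and increment are commonly used 32-bit LCG parameters, picked for illustration.
function scatterPositions(count: number, extent: number, seed = 1): Vec3[] {
  let state = seed >>> 0;
  const next = (): number => {
    // The product stays below 2^53, so the arithmetic is exact in a double.
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // value in [0, 1)
  };
  const out: Vec3[] = [];
  for (let i = 0; i < count; i++) {
    const x = (next() - 0.5) * extent;
    const z = (next() - 0.5) * extent;
    out.push([x, 0, z]);
  }
  return out;
}
```

Because the scatter is seeded, the same request yields the same layout on every run, which keeps undo/redo and retries predictable.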

Positions are injected as a [SPATIAL_PREPROCESSOR] hint in the message. The hint is visible to the AI but is not shown in the user's chat history.

The preprocessor also parses NxM grids (e.g., "3x3 grid" → 9 positions). Before detecting the object count, it strips "radius N" and "spacing N" phrases so that those numbers are not misparsed as the count.
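The count-detection logic might look roughly like the following. The regular expressions and the `ParsedArrangement` shape are assumptions made for illustration; the source does not show the actual parsing rules, including which of N and M is treated as rows.

```typescript
interface ParsedArrangement {
  count: number | null;
  grid: { rows: number; cols: number } | null;
}

function parseArrangement(message: string): ParsedArrangement {
  const text = message.toLowerCase();

  // "3x3 grid" or "2 x 4" -> rows x cols; the count is their product.
  const gridMatch = text.match(/(\d+)\s*x\s*(\d+)/);
  if (gridMatch) {
    const rows = Number(gridMatch[1]);
    const cols = Number(gridMatch[2]);
    return { count: rows * cols, grid: { rows, cols } };
  }

  // Remove "radius N" / "spacing N" so N is not mistaken for the object count.
  const stripped = text.replace(/\b(radius|spacing)\s+(of\s+)?\d+(\.\d+)?/g, " ");
  const countMatch = stripped.match(/\b(\d+)\b/);
  return { count: countMatch ? Number(countMatch[1]) : null, grid: null };
}
```

Without the stripping step, a request such as "circle with radius 5 of 8 spheres" would report a count of 5 rather than 8.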

Stage 3: Style Fingerprint

File: src/utils/styleFingerprint.ts

Analyzes existing scene materials and produces a compact summary:

Top 3 dominant colors (quantized bucketing)
Average roughness, metalness, emissive intensity
Heuristic style tag: neon, glass-heavy, metallic, industrial, natural wood, matte, minimalist

Injected as [STYLE] header in the system prompt so the AI matches existing aesthetics.
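As a rough illustration of the analysis described above, the sketch below buckets colors by snapping each RGB channel to one of four levels, averages the scalar material properties, and picks a tag from simple thresholds. The bucket size, thresholds, reduced tag set, and output layout are all assumptions; the real heuristics in `styleFingerprint.ts` are not given in this document.

```typescript
interface Material {
  color: string; // "#rrggbb"
  roughness: number;
  metalness: number;
  emissiveIntensity: number;
}

// Snap each channel to the center of a 64-wide bucket (4 levels per channel).
function quantize(hex: string): string {
  const n = parseInt(hex.slice(1), 16);
  const snap = (v: number): number => Math.floor(v / 64) * 64 + 32;
  const channels = [snap((n >> 16) & 0xff), snap((n >> 8) & 0xff), snap(n & 0xff)];
  return "#" + channels.map((v) => ("0" + v.toString(16)).slice(-2)).join("");
}

function fingerprint(materials: Material[]): string {
  if (materials.length === 0) return "[STYLE] empty";

  // Count how many materials fall into each color bucket.
  const counts: Record<string, number> = {};
  for (const m of materials) {
    const key = quantize(m.color);
    counts[key] = (counts[key] ?? 0) + 1;
  }
  const top = Object.keys(counts)
    .sort((a, b) => counts[b] - counts[a])
    .slice(0, 3);

  const avg = (pick: (m: Material) => number): number =>
    materials.reduce((sum, m) => sum + pick(m), 0) / materials.length;
  const rough = avg((m) => m.roughness);
  const metal = avg((m) => m.metalness);
  const emis = avg((m) => m.emissiveIntensity);

  // Illustrative thresholds only; the document lists more tags than are handled here.
  let tag = "matte";
  if (emis > 0.5) tag = "neon";
  else if (metal > 0.6) tag = "metallic";

  return `[STYLE] ${tag} colors=${top.join(",")} r=${rough.toFixed(2)} m=${metal.toFixed(2)} ei=${emis.toFixed(2)}`;
}
```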

Stage 4: YSL Serializer

File: src/utils/aiSerializer.ts

The Yugma Scene Language (YSL) is a custom compact format designed for LLM context windows:

[★ hero_sphere] sphere id=abc pos=[2,0.5,0] rot=[0,0,0] s=[1,1,1]
  mat=#ff0000 r=0.3 m=0.8 e=#000000 ei=0 op=1 tags=[hero,accent]
  role=hero nextTo=[def] supports=[]

YSL uses ~45 tokens per object, compared with ~400 tokens per object for USD. A 50-object scene fits in ~2,400 tokens.

Tier 3 output includes: id, name, type, position, rotation, scale, full material, tags, semantic role, and relationships.
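A serializer that emits the three-line layout shown above could be sketched as follows. The `YslObject` interface and its field names are hypothetical, and the condition that produces the ★ marker is not stated in this document, so it is modeled here as a plain `starred` flag.

```typescript
type Vec3 = [number, number, number];

// Hypothetical input shape; the real scene object type is not shown in the source.
interface YslObject {
  id: string;
  name: string;
  type: string;
  position: Vec3;
  rotation: Vec3;
  scale: Vec3;
  material: {
    color: string;
    roughness: number;
    metalness: number;
    emissive: string;
    emissiveIntensity: number;
    opacity: number;
  };
  tags: string[];
  role: string;
  nextTo: string[];
  supports: string[];
  starred: boolean; // assumed flag controlling the leading star marker
}

const bracket = (items: Array<string | number>): string => `[${items.join(",")}]`;

function toYsl(o: YslObject): string {
  const m = o.material;
  const label = o.starred ? `★ ${o.name}` : o.name;
  return [
    `[${label}] ${o.type} id=${o.id} pos=${bracket(o.position)} rot=${bracket(o.rotation)} s=${bracket(o.scale)}`,
    `  mat=${m.color} r=${m.roughness} m=${m.metalness} e=${m.emissive} ei=${m.emissiveIntensity} op=${m.opacity} tags=${bracket(o.tags)}`,
    `  role=${o.role} nextTo=${bracket(o.nextTo)} supports=${bracket(o.supports)}`,
  ].join("\n");
}
```

Short keys (`r`, `m`, `ei`, `op`) and bracketed vectors without spaces are what keep the per-object token count low.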

Stage 5: The Agentic Loop (Server)

File: packages/yugma-functions/src/ai/aiCompose.ts

Planning Pass (3D-GPT Pattern)

For complex requests (length > 80 chars or contains markers like "build", "design", "create a scene"):

A text-only planning call produces a numbered plan (3-8 steps)
The plan is prepended as [PLAN]...[/PLAN] to the executor's system prompt
Simple requests skip this entirely to save cost/latency
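The gating rule and plan injection described above can be sketched as two small helpers. The helper names are hypothetical, and the marker list contains only the three examples given in the text; the real list may be longer.

```typescript
// Markers quoted in the text; the actual list in aiCompose.ts may contain more.
const PLANNING_MARKERS = ["build", "design", "create a scene"];
const LENGTH_THRESHOLD = 80;

// A request is "complex" if it is long or contains one of the markers.
function needsPlanning(request: string): boolean {
  const text = request.toLowerCase();
  return (
    text.length > LENGTH_THRESHOLD ||
    PLANNING_MARKERS.some((marker) => text.indexOf(marker) !== -1)
  );
}

// Prepend the plan to the executor's system prompt; pass the prompt through unchanged otherwise.
function withPlan(systemPrompt: string, plan: string | null): string {
  return plan ? `[PLAN]\n${plan}\n[/PLAN]\n${systemPrompt}` : systemPrompt;
}
```

For example, "Build a sci-fi room with neon lighting" triggers planning through the "build" marker even though it is shorter than 80 characters, while "make it red" goes straight to the executor.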

Executor (Tool Loop)

Claude → tool_use blocks → executeToolSim(SimScene) → tool_result → Claude → ...

SimScene: lightweight in-memory object tracking IDs and positions (NOT Firestore)
MAX_ITERATIONS: 8
15 tools: add_object, update_object, remove_object, set_environment, clear_scene, animate_object, duplicate_object, align_objects, distribute_objects, focus_camera, apply_material_preset, search_select, set_tags, create_group, set_animation
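The loop structure can be sketched without any SDK by injecting the model as a function. Everything below is a simplified stand-in: the `ModelTurn` type, the two-tool `SimScene`, and the synchronous model call are assumptions made to keep the example self-contained (the real model call is a network request and would be asynchronous).

```typescript
interface ToolCall {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

interface ToolResult {
  toolCallId: string;
  output: string;
}

// Stand-in for one model turn: receives the previous tool results, returns new tool calls.
// An empty array means the model has stopped calling tools.
type ModelTurn = (previousResults: ToolResult[]) => ToolCall[];

const MAX_ITERATIONS = 8;

// Minimal stand-in for SimScene: tracks IDs and positions in memory only.
class SimScene {
  private nextId = 1;
  readonly objects: Record<string, { type: string; position: number[] }> = {};

  execute(call: ToolCall): string {
    if (call.name === "add_object") {
      const id = `obj_${this.nextId++}`;
      const position = (call.input.position as number[] | undefined) ?? [0, 0, 0];
      this.objects[id] = { type: String(call.input.type), position };
      return `created id=${id}`; // created IDs are fed back to the model
    }
    if (call.name === "remove_object") {
      delete this.objects[String(call.input.id)];
      return "removed";
    }
    return `unsupported tool: ${call.name}`;
  }
}

// Runs the tool loop and returns every tool call made, in order.
function runExecutor(model: ModelTurn, scene: SimScene): ToolCall[] {
  const executed: ToolCall[] = [];
  let results: ToolResult[] = [];
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const calls = model(results);
    if (calls.length === 0) break; // model is done
    results = calls.map((c) => ({ toolCallId: c.id, output: scene.execute(c) }));
    executed.push(...calls);
  }
  return executed;
}
```

Returning created IDs in the tool result is what lets a later iteration update or remove an object that an earlier iteration added.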

Dual-Write

After each successful turn, the client persists the session to Firestore via saveAISession (non-blocking, fire-and-forget). On the next mount, the most recent session is loaded via loadLastSession.

Stage 6: Client Tool Dispatch

File: src/panels/AIPanel/index.tsx → TOOL_DISPATCH

Each tool call returned by aiCompose is dispatched to the appropriate Zustand store action:

Tool	Store	Action
`add_object`	useSceneStore	`addObject()` + `updateObject()`
`update_object`	useSceneStore	`updateObject()`
`remove_object`	useSceneStore	`removeObject()`
`set_environment`	useSceneStore	`setEnvironment()`
`focus_camera`	useSceneStore	`setCameraTarget()` (smooth tween)
`animate_object`	useAnimationStore	`addKeyframe()`
`apply_material_preset`	useSceneStore	lookup MATERIAL_PRESETS → `updateObject()`
`create_group`	useSceneStore	`addObject('box', invisible)` + `reparentObject()`
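A dispatch table of this kind can be sketched as a map from tool name to handler. The `SceneActions` interface below is a stand-in for the Zustand store actions, with assumed signatures; only three of the tools are shown.

```typescript
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
}

// Stand-in for the store actions; the real signatures are not shown in the source.
interface SceneActions {
  addObject(type: string): string; // assumed to return the new object's ID
  updateObject(id: string, patch: Record<string, unknown>): void;
  removeObject(id: string): void;
}

type Handler = (input: Record<string, unknown>, scene: SceneActions) => void;

const TOOL_DISPATCH: Record<string, Handler> = {
  // add_object creates the object, then applies the remaining properties.
  add_object: (input, scene) => {
    const id = scene.addObject(String(input.type));
    scene.updateObject(id, input);
  },
  update_object: (input, scene) => scene.updateObject(String(input.id), input),
  remove_object: (input, scene) => scene.removeObject(String(input.id)),
};

// Applies each call in order and returns the names of tools that had no handler.
function dispatch(calls: ToolCall[], scene: SceneActions): string[] {
  const unknown: string[] = [];
  for (const call of calls) {
    const known = Object.prototype.hasOwnProperty.call(TOOL_DISPATCH, call.name);
    if (known) TOOL_DISPATCH[call.name](call.input, scene);
    else unknown.push(call.name);
  }
  return unknown;
}
```

Keeping the mapping in one table means adding a tool on the server only requires one new entry on the client, and unrecognized tool names can be reported rather than silently dropped.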

Configuration

Setting	Value	Notes
Model	`claude-sonnet-4-20250514`	Configurable via admin panel
Max tokens	4,096	Per iteration
Temperature	0.7 (creative) / 0.2 (precise)	User toggle
Rate limit	30 req/hr per user	Firestore counter
Function timeout	120s	Accommodates 8-iteration loops
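The per-user rate limit could be enforced with a fixed-window counter, sketched below with an in-memory object standing in for the Firestore counter document. Whether the real implementation uses a fixed or sliding window is not stated, so the windowing scheme here is an assumption.

```typescript
const RATE_LIMIT = 30; // requests per window, from the table above
const WINDOW_MS = 60 * 60 * 1000; // one hour

interface Counter {
  windowStart: number; // epoch milliseconds when the current window began
  count: number;
}

// In-memory stand-in for the per-user counter stored in Firestore.
const counters: Record<string, Counter> = {};

// Returns true and records the request if the user is under the limit.
function allowRequest(userId: string, now: number): boolean {
  const current = counters[userId];
  if (!current || now - current.windowStart >= WINDOW_MS) {
    counters[userId] = { windowStart: now, count: 1 }; // start a fresh window
    return true;
  }
  if (current.count >= RATE_LIMIT) return false;
  current.count++;
  return true;
}
```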
