AI Scene Composition Pipeline
The AI pipeline is Yugma's core differentiator. It transforms natural language into structured 3D scenes through a multi-stage architecture.
Data Flow
User: "Build a sci-fi room with neon lighting"
│
▼
[1] Reference Resolver (client)
"that" → selected obj, "the red sphere" → id
│
▼
[2] Spatial Preprocessor (client)
"6 cubes in a circle" → precomputed [x,y,z] positions
│
▼
[3] Style Fingerprint (client)
Analyzes scene palette → "[STYLE] industrial metallic..."
│
▼
[4] YSL Serializer (client)
Scene → compact text at ~45 tokens/object
│
▼
[5] Cloud Function: aiCompose (server)
├── Planning pass (complex requests only)
│ └── Claude outputs ordered plan (text only)
├── Executor pass (agentic loop, up to 8 iterations)
│ ├── Claude calls tools → SimScene executes them
│ ├── Tool results (incl. created IDs) sent back
│ └── Claude sees results → calls more tools or stops
└── Firestore session persistence (non-blocking)
│
▼
[6] Tool Dispatch (client)
Each tool call → Zustand store action → Three.js re-renders
Stage 1: Reference Resolution
File: src/utils/referenceResolver.ts
Resolves natural language references to scene object IDs before sending to the AI:
"that"/"it"→ currently selected object"the red sphere"→ search by type + color"everything on the left"→ filter by position
Stage 2: Spatial Preprocessor
File: src/utils/spatialPreprocessor.ts
LLMs tend to hallucinate trigonometry, so the preprocessor detects arrangement patterns and computes exact positions client-side:
| Pattern | Trigger words | Computation |
|---|---|---|
| Circle | "in a circle", "ring", "circular" | x = r*cos(θ), z = r*sin(θ) |
| Grid | "in a grid", "3x3", "matrix" | Row/col with centered offset |
| Stack | "stacked", "tower", "pile" | Y increments |
| Spiral | "spiral", "helix" | Circular path + Y ramp |
| Line | "in a row", "line of" | X spacing, centered |
| Scatter | "scattered", "randomly placed" | Deterministic pseudo-random (LCG) |
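A minimal sketch of three rows from the table (circle, grid, scatter). The function names, the XZ ground plane for grids, and the generator constants are assumptions for illustration. The table only says "LCG", so the multiplicative generator used here (multiplier 48271, modulus 2^31 − 1) is one possible choice, not necessarily the one in spatialPreprocessor.ts.

```typescript
type Vec3 = [number, number, number];

// Circle: x = r*cos(θ), z = r*sin(θ), with evenly spaced angles.
function circlePositions(count: number, radius: number, y = 0): Vec3[] {
  const out: Vec3[] = [];
  for (let i = 0; i < count; i++) {
    const theta = (2 * Math.PI * i) / count;
    out.push([radius * Math.cos(theta), y, radius * Math.sin(theta)]);
  }
  return out;
}

// Grid: rows x cols on the XZ plane, offset so the grid is centered on the origin.
function gridPositions(rows: number, cols: number, spacing: number, y = 0): Vec3[] {
  const out: Vec3[] = [];
  const x0 = -((cols - 1) * spacing) / 2;
  const z0 = -((rows - 1) * spacing) / 2;
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      out.push([x0 + c * spacing, y, z0 + r * spacing]);
    }
  }
  return out;
}

// Scatter: a seeded generator makes the "random" layout reproducible.
// Assumed constants: multiplicative LCG with multiplier 48271 and modulus 2^31 - 1.
function scatterPositions(count: number, extent: number, seed: number, y = 0): Vec3[] {
  const M = 2147483647;
  let state = (Math.abs(Math.floor(seed)) % (M - 1)) + 1; // keep state in [1, M-1]
  const next = (): number => {
    state = (state * 48271) % M; // product stays below 2^53, so it is exact
    return state / M; // value in (0, 1)
  };
  const out: Vec3[] = [];
  for (let i = 0; i < count; i++) {
    const x = (next() - 0.5) * extent;
    const z = (next() - 0.5) * extent;
    out.push([x, y, z]);
  }
  return out;
}
```

Because the scatter generator is seeded, the same request produces the same layout on every run, which keeps results reproducible across retries.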
Positions are injected into the message as a [SPATIAL_PREPROCESSOR] hint, which the AI sees but which is not stored in the user's chat history.
The preprocessor also parses NxM grids (e.g., "3x3 grid" → 9 positions) and strips "radius N" and "spacing N" from the text before detecting the object count, so those numbers are not misread as the count.
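A sketch of the count detection and hint injection described above. The regular expressions, the function names, and the exact layout of the hint text are assumptions; only the [SPATIAL_PREPROCESSOR] tag and the strip-before-count behavior come from this document.

```typescript
type Vec3 = [number, number, number];

// Returns the requested object count, or null if none is found.
function parseCount(message: string): number | null {
  const text = message.toLowerCase();

  // "3x3" style grids: count is rows * cols.
  const grid = text.match(/(\d+)\s*x\s*(\d+)/);
  if (grid) return Number(grid[1]) * Number(grid[2]);

  // Remove "radius N" / "spacing N" so N is not mistaken for the count.
  const stripped = text.replace(/\b(radius|spacing)\s+\d+(\.\d+)?/g, "");
  const n = stripped.match(/\b(\d+)\b/);
  return n ? Number(n[1]) : null;
}

// Hypothetical hint layout: the original message followed by a tagged position list.
function withSpatialHint(message: string, positions: Vec3[]): string {
  const body = positions
    .map((p) => "[" + p.map((v) => v.toFixed(2)).join(",") + "]")
    .join(" ");
  return message + "\n[SPATIAL_PREPROCESSOR] positions=" + body;
}
```

Only the string returned by `withSpatialHint` would be sent to the model; the chat history keeps the original message.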
Stage 3: Style Fingerprint
File: src/utils/styleFingerprint.ts
Analyzes existing scene materials and produces a compact summary:
- Top 3 dominant colors (quantized bucketing)
- Average roughness, metalness, emissive intensity
- Heuristic style tag: neon, glass-heavy, metallic, industrial, natural wood, matte, minimalist
Injected as [STYLE] header in the system prompt so the AI matches existing aesthetics.
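The summary could be computed roughly as below. This is a sketch under stated assumptions: the `Material` shape, the 4-level-per-channel quantization, the tag thresholds, and the output layout are all illustrative, and only three of the seven tags are covered.

```typescript
// Hypothetical material shape for the sketch.
interface Material {
  color: string; // "#rrggbb"
  roughness: number;
  metalness: number;
  emissiveIntensity: number;
}

// Quantize each RGB channel to 4 levels (0, 85, 170, 255) so similar colors share a bucket.
function bucketOf(hex: string): string {
  const n = parseInt(hex.slice(1), 16);
  const q = (v: number): number => Math.round(v / 85) * 85;
  const channels = [q((n >> 16) & 255), q((n >> 8) & 255), q(n & 255)];
  return "#" + channels.map((v) => ("0" + v.toString(16)).slice(-2)).join("");
}

function fingerprint(materials: Material[]): string {
  if (materials.length === 0) return "[STYLE] empty scene";

  const counts: Record<string, number> = {};
  let rough = 0;
  let metal = 0;
  let emissive = 0;
  for (const m of materials) {
    const b = bucketOf(m.color);
    counts[b] = (counts[b] || 0) + 1;
    rough += m.roughness;
    metal += m.metalness;
    emissive += m.emissiveIntensity;
  }

  const n = materials.length;
  // Top 3 buckets by frequency.
  const top = Object.keys(counts)
    .sort((a, b) => counts[b] - counts[a])
    .slice(0, 3);

  // Assumed thresholds; the real heuristic is not described in this document.
  let tag = "minimalist";
  if (emissive / n > 0.5) tag = "neon";
  else if (metal / n > 0.6) tag = "metallic";
  else if (rough / n > 0.7) tag = "matte";

  return (
    "[STYLE] " + tag +
    " colors=" + top.join(",") +
    " r=" + (rough / n).toFixed(2) +
    " m=" + (metal / n).toFixed(2) +
    " ei=" + (emissive / n).toFixed(2)
  );
}
```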
Stage 4: YSL Serializer
File: src/utils/aiSerializer.ts
The Yugma Scene Language (YSL) is a custom compact format designed for LLM context windows:
[★ hero_sphere] sphere id=abc pos=[2,0.5,0] rot=[0,0,0] s=[1,1,1]
mat=#ff0000 r=0.3 m=0.8 e=#000000 ei=0 op=1 tags=[hero,accent]
role=hero nextTo=[def] supports=[]
YSL uses ~45 tokens/object, versus ~400 tokens/object for USD. A 50-object scene fits in ~2,400 tokens.
Tier 3 output includes: id, name, type, position, rotation, scale, full material, tags, semantic role, and relationships.
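A sketch of a serializer that emits the three-line layout shown in the example above. The `YslObject` field names and the `toYsl` function are assumptions. The document does not say what the ★ marker means, so the sketch simply exposes it as a `starred` flag.

```typescript
type Vec3 = [number, number, number];

// Hypothetical input shape covering the Tier 3 fields listed above.
interface YslObject {
  id: string;
  name: string;
  type: string;
  position: Vec3;
  rotation: Vec3;
  scale: Vec3;
  color: string;
  roughness: number;
  metalness: number;
  emissive: string;
  emissiveIntensity: number;
  opacity: number;
  tags: string[];
  role: string;
  nextTo: string[];
  supports: string[];
  starred: boolean; // meaning of the star is not specified in this document
}

const bracket = (items: Array<string | number>): string => "[" + items.join(",") + "]";

function toYsl(o: YslObject): string {
  const label = o.starred ? "[★ " + o.name + "]" : "[" + o.name + "]";
  return [
    // Line 1: identity and transform.
    `${label} ${o.type} id=${o.id} pos=${bracket(o.position)} rot=${bracket(o.rotation)} s=${bracket(o.scale)}`,
    // Line 2: material and tags.
    `mat=${o.color} r=${o.roughness} m=${o.metalness} e=${o.emissive} ei=${o.emissiveIntensity} op=${o.opacity} tags=${bracket(o.tags)}`,
    // Line 3: semantic role and relationships.
    `role=${o.role} nextTo=${bracket(o.nextTo)} supports=${bracket(o.supports)}`,
  ].join("\n");
}
```

Short keys (`r`, `m`, `ei`, `op`) and bracketed lists without spaces are what keep the per-object token count low.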
Stage 5: The Agentic Loop (Server)
File: packages/yugma-functions/src/ai/aiCompose.ts
Planning Pass (3D-GPT Pattern)
For complex requests (length > 80 chars or contains markers like "build", "design", "create a scene"):
- A text-only planning call produces a numbered plan (3-8 steps)
- The plan is prepended as [PLAN]...[/PLAN] to the executor's system prompt
- Simple requests skip this entirely to save cost/latency
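The gating rule and the prompt assembly might look like this. The function names are hypothetical, and the marker list contains only the three examples named above; the real list may be longer.

```typescript
// Assumed values taken from the description above; the marker list is likely incomplete.
const PLAN_LENGTH_THRESHOLD = 80;
const PLAN_MARKERS = ["build", "design", "create a scene"];

// A request is "complex" if it is long or contains a planning marker.
function needsPlanning(request: string): boolean {
  const text = request.toLowerCase();
  return (
    request.length > PLAN_LENGTH_THRESHOLD ||
    PLAN_MARKERS.some((marker) => text.indexOf(marker) !== -1)
  );
}

// Prepend the plan to the executor's system prompt, or leave the prompt unchanged.
function withPlan(systemPrompt: string, plan: string | null): string {
  if (plan === null) return systemPrompt;
  return "[PLAN]\n" + plan + "\n[/PLAN]\n" + systemPrompt;
}
```

With these rules, "Build a sci-fi room with neon lighting" triggers a planning pass because of the "build" marker, while "make it red" goes straight to the executor.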
Executor (Tool Loop)
Claude → tool_use blocks → executeToolSim(SimScene) → tool_result → Claude → ...
- SimScene: lightweight in-memory object tracking IDs and positions (NOT Firestore)
- MAX_ITERATIONS: 8
- 15 tools: add_object, update_object, remove_object, set_environment, clear_scene, animate_object, duplicate_object, align_objects, distribute_objects, focus_camera, apply_material_preset, search_select, set_tags, create_group, set_animation
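The control flow of the tool loop can be sketched with the model abstracted behind a function. Everything here is illustrative: the `Model` type stands in for the Claude API call, this `SimScene` handles only two of the 15 tools, and the loop is synchronous purely to keep the example short (a real model call would be asynchronous).

```typescript
interface ToolCall {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

interface ToolResult {
  toolUseId: string;
  content: string;
}

// Stand-in for the model: given the results so far, return the next tool calls.
// An empty list means the model has finished.
type Model = (history: ToolResult[]) => { toolCalls: ToolCall[] };

const MAX_ITERATIONS = 8;

// Lightweight in-memory scene that tracks IDs so the model can refer to created objects.
class SimScene {
  private nextId = 1;
  readonly objects: Record<string, { type: string }> = {};

  execute(call: ToolCall): string {
    if (call.name === "add_object") {
      const id = "obj_" + this.nextId++;
      this.objects[id] = { type: String(call.input.type) };
      return "created " + id; // the created ID is reported back to the model
    }
    if (call.name === "remove_object") {
      delete this.objects[String(call.input.id)];
      return "removed";
    }
    return "unsupported tool: " + call.name;
  }
}

function runExecutor(model: Model, scene: SimScene): { calls: ToolCall[]; iterations: number } {
  const calls: ToolCall[] = [];
  const history: ToolResult[] = [];
  let iterations = 0;
  while (iterations < MAX_ITERATIONS) {
    iterations++;
    const turn = model(history);
    if (turn.toolCalls.length === 0) break; // model stopped calling tools
    for (const call of turn.toolCalls) {
      history.push({ toolUseId: call.id, content: scene.execute(call) });
      calls.push(call);
    }
  }
  return { calls, iterations };
}
```

The accumulated `calls` list is what the client would later replay against its own stores, and the iteration cap bounds cost and latency when the model keeps requesting tools.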
Dual-Write
After each successful turn, the client persists the session to Firestore via saveAISession (non-blocking, fire-and-forget). On next mount, the most recent session is loaded via loadLastSession.
Stage 6: Client Tool Dispatch
File: src/panels/AIPanel/index.tsx → TOOL_DISPATCH
Each tool call returned by aiCompose is dispatched to the appropriate Zustand store action:
| Tool | Store | Action |
|---|---|---|
| add_object | useSceneStore | addObject() + updateObject() |
| update_object | useSceneStore | updateObject() |
| remove_object | useSceneStore | removeObject() |
| set_environment | useSceneStore | setEnvironment() |
| focus_camera | useSceneStore | setCameraTarget() (smooth tween) |
| animate_object | useAnimationStore | addKeyframe() |
| apply_material_preset | useSceneStore | lookup MATERIAL_PRESETS → updateObject() |
| create_group | useSceneStore | addObject('box', invisible) + reparentObject() |
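A dispatch table of this kind can be sketched as a map from tool name to handler. The `SceneStore` interface below is a plain-object stand-in for useSceneStore rather than the real Zustand store, the action signatures are assumed, and only three tools are wired up.

```typescript
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
}

// Plain-object stand-in for the scene store; signatures are assumptions.
interface SceneStore {
  objects: Record<string, Record<string, unknown>>;
  addObject(type: string): string;
  updateObject(id: string, patch: Record<string, unknown>): void;
  removeObject(id: string): void;
}

function createStore(): SceneStore {
  let counter = 0;
  const store: SceneStore = {
    objects: {},
    addObject(type) {
      const id = "obj_" + ++counter;
      store.objects[id] = { type };
      return id;
    },
    updateObject(id, patch) {
      store.objects[id] = { ...store.objects[id], ...patch };
    },
    removeObject(id) {
      delete store.objects[id];
    },
  };
  return store;
}

type Handler = (store: SceneStore, input: Record<string, unknown>) => void;

const TOOL_DISPATCH: Record<string, Handler> = {
  // add_object creates the object, then applies the remaining fields as an update.
  add_object: (store, input) => {
    const { type, ...rest } = input;
    const id = store.addObject(String(type));
    store.updateObject(id, rest);
  },
  update_object: (store, input) => {
    const { id, ...patch } = input;
    store.updateObject(String(id), patch);
  },
  remove_object: (store, input) => store.removeObject(String(input.id)),
};

// Returns false for unknown tools so the caller can log or ignore them.
function dispatch(store: SceneStore, call: ToolCall): boolean {
  if (!Object.prototype.hasOwnProperty.call(TOOL_DISPATCH, call.name)) return false;
  TOOL_DISPATCH[call.name](store, call.input);
  return true;
}
```

Keeping the mapping in one table means adding a tool is a single new entry, and unknown tool names fail in one predictable place.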
Configuration
| Setting | Value | Notes |
|---|---|---|
| Model | claude-sonnet-4-20250514 | Configurable via admin panel |
| Max tokens | 4,096 | Per iteration |
| Temperature | 0.7 (creative) / 0.2 (precise) | User toggle |
| Rate limit | 30 req/hr per user | Firestore counter |
| Function timeout | 120s | Accommodates 8-iteration loops |
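For illustration, the settings above can be expressed as a typed constant alongside a per-user rate limiter. The object layout, the `RateLimiter` class, and the fixed one-hour window are assumptions: the table names a Firestore counter but does not say whether the window is fixed or sliding, and this sketch keeps the counters in memory instead.

```typescript
// Values copied from the table; the object layout itself is hypothetical.
const AI_CONFIG = {
  model: "claude-sonnet-4-20250514",
  maxTokens: 4096,
  temperature: { creative: 0.7, precise: 0.2 },
  rateLimitPerHour: 30,
  functionTimeoutSeconds: 120,
};

const HOUR_MS = 60 * 60 * 1000;

interface Counter {
  windowStart: number; // timestamp (ms) when the current window opened
  count: number; // requests seen in the current window
}

// In-memory stand-in for the Firestore counter, using an assumed fixed window.
class RateLimiter {
  private counters: Record<string, Counter> = {};

  constructor(private limit: number) {}

  allow(userId: string, now: number): boolean {
    const c = this.counters[userId];
    // First request, or the previous window has expired: start a new window.
    if (!c || now - c.windowStart >= HOUR_MS) {
      this.counters[userId] = { windowStart: now, count: 1 };
      return true;
    }
    if (c.count >= this.limit) return false;
    c.count++;
    return true;
  }
}
```

A caller would construct `new RateLimiter(AI_CONFIG.rateLimitPerHour)` and check `allow(userId, Date.now())` before starting the loop. A shared store such as Firestore would need the read and increment to happen atomically, which this in-memory version does not address.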