AI Scene Composition Pipeline

The AI pipeline is Yugma's core differentiator. It transforms natural language into structured 3D scenes through a multi-stage architecture.

Data Flow

User: "Build a sci-fi room with neon lighting"
  │
  ▼
[1] Reference Resolver (client)
    "that" → selected obj, "the red sphere" → id
  │
  ▼
[2] Spatial Preprocessor (client)
    "6 cubes in a circle" → precomputed [x,y,z] positions
  │
  ▼
[3] Style Fingerprint (client)
    Analyzes scene palette → "[STYLE] industrial metallic..."
  │
  ▼
[4] YSL Serializer (client)
    Scene → compact text at ~45 tokens/object
  │
  ▼
[5] Cloud Function: aiCompose (server)
    ├── Planning pass (complex requests only)
    │   └── Claude outputs ordered plan (text only)
    ├── Executor pass (agentic loop, up to 8 iterations)
    │   ├── Claude calls tools → SimScene executes them
    │   ├── Tool results (incl. created IDs) sent back
    │   └── Claude sees results → calls more tools or stops
    └── Firestore session persistence (non-blocking)
  │
  ▼
[6] Tool Dispatch (client)
    Each tool call → Zustand store action → Three.js re-renders

Stage 1: Reference Resolution

File: src/utils/referenceResolver.ts

Resolves natural language references to scene object IDs before sending to the AI:

  • "that" / "it" → currently selected object
  • "the red sphere" → search by type + color
  • "everything on the left" → filter by position
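
The three rules above can be sketched as a small lookup function. This is an illustration only, not the contents of referenceResolver.ts: the SceneObject shape, the resolveReference name, the COLOR_NAMES table, and the convention that "left" means negative X are all assumptions.

```typescript
// Hypothetical object shape; the real scene store type may differ.
interface SceneObject {
  id: string;
  type: string; // e.g. "sphere", "box"
  color: string; // hex string, e.g. "#ff0000"
  position: [number, number, number];
}

// Assumed color vocabulary, for illustration only.
const COLOR_NAMES: Record<string, string> = {
  red: "#ff0000",
  green: "#00ff00",
  blue: "#0000ff",
};

function resolveReference(
  phrase: string,
  objects: SceneObject[],
  selectedId: string | null,
): string[] {
  const text = phrase.trim().toLowerCase();

  // Pronouns resolve to the current selection (empty if nothing is selected).
  if (text === "that" || text === "it") {
    return selectedId !== null ? [selectedId] : [];
  }

  // Positional filter; assumes "left" means a negative X coordinate.
  if (text === "everything on the left") {
    return objects.filter((o) => o.position[0] < 0).map((o) => o.id);
  }

  // "the <color> <type>": match on both type and color.
  const match = /^the (\w+) (\w+)$/.exec(text);
  if (match) {
    const [, colorName = "", type = ""] = match;
    const hex = COLOR_NAMES[colorName];
    if (hex === undefined) return [];
    return objects
      .filter((o) => o.type === type && o.color.toLowerCase() === hex)
      .map((o) => o.id);
  }

  return [];
}
```

Returning an array of IDs in every case keeps the single-object and multi-object rules behind one signature, so the caller can substitute IDs into the outgoing message without branching on the rule that matched.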

Stage 2: Spatial Preprocessor

File: src/utils/spatialPreprocessor.ts

LLMs hallucinate trigonometry. The preprocessor detects arrangement patterns and computes exact positions client-side:

| Pattern | Trigger words | Computation |
| --- | --- | --- |
| Circle | "in a circle", "ring", "circular" | x = r*cos(θ), z = r*sin(θ) |
| Grid | "in a grid", "3x3", "matrix" | Row/col with centered offset |
| Stack | "stacked", "tower", "pile" | Y increments |
| Spiral | "spiral", "helix" | Circular path + Y ramp |
| Line | "in a row", "line of" | X spacing, centered |
| Scatter | "scattered", "randomly placed" | Deterministic pseudo-random (LCG) |

Positions are injected into the message as a [SPATIAL_PREPROCESSOR] hint, which is visible to the AI but is not shown in the user's chat history.

The preprocessor also parses NxM grids (e.g., "3x3 grid" → 9 positions) and strips "radius N" and "spacing N" phrases before detecting the object count, so those numbers are not misread as counts.
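
As a rough sketch of the circle computation and the count detection described above (not the actual spatialPreprocessor.ts): the function names, the regular expressions, and the exact text of the hint are assumptions made for illustration.

```typescript
type Vec3 = [number, number, number];

// Evenly spaced points on a circle in the XZ plane:
// x = r*cos(θ), z = r*sin(θ), with θ stepping by 2π/count.
function circlePositions(count: number, radius: number, y = 0): Vec3[] {
  const positions: Vec3[] = [];
  for (let i = 0; i < count; i++) {
    const theta = (2 * Math.PI * i) / count;
    positions.push([radius * Math.cos(theta), y, radius * Math.sin(theta)]);
  }
  return positions;
}

// Count detection. NxM grids multiply out; "radius N" and "spacing N"
// are removed first so N is not mistaken for the object count.
function detectCount(message: string): number | null {
  const grid = /(\d+)\s*x\s*(\d+)/i.exec(message);
  if (grid) return Number(grid[1]) * Number(grid[2]);

  const cleaned = message.replace(/\b(radius|spacing)\s+\d+(\.\d+)?/gi, "");
  const first = /\d+/.exec(cleaned);
  return first ? Number(first[0]) : null;
}

// Hypothetical hint format; the real hint layout is not shown in this page.
function buildSpatialHint(positions: Vec3[]): string {
  const rounded = positions.map((p) => p.map((v) => Number(v.toFixed(2))));
  return `[SPATIAL_PREPROCESSOR] positions=${JSON.stringify(rounded)}`;
}
```

For "6 cubes in a circle radius 3", detectCount returns 6 rather than 3 because the "radius 3" phrase is stripped before the first number is read.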

Stage 3: Style Fingerprint

File: src/utils/styleFingerprint.ts

Analyzes existing scene materials and produces a compact summary:

  • Top 3 dominant colors (quantized bucketing)
  • Average roughness, metalness, emissive intensity
  • Heuristic style tag: neon, glass-heavy, metallic, industrial, natural wood, matte, minimalist

The summary is injected as a [STYLE] header in the system prompt so the AI matches the existing aesthetics.
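
One way such a summary could be computed is sketched below. The bucket width, the tag thresholds, the reduced tag set, and the output layout are all assumptions; styleFingerprint.ts may use different heuristics.

```typescript
// Hypothetical material shape, loosely modeled on standard PBR parameters.
interface Material {
  color: string; // "#rrggbb"
  roughness: number;
  metalness: number;
  emissiveIntensity: number;
}

// Quantized bucketing: snap each RGB channel down to a multiple of `step`
// so that near-identical colors fall into the same bucket.
function quantize(hex: string, step = 64): string {
  const n = parseInt(hex.slice(1), 16);
  const channels = [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
  return (
    "#" +
    channels
      .map((c) => Math.floor(c / step) * step)
      .map((c) => ("0" + c.toString(16)).slice(-2))
      .join("")
  );
}

function fingerprint(materials: Material[]): string {
  if (materials.length === 0) return "[STYLE] empty scene";

  // Count objects per color bucket and keep the three most common buckets.
  const counts = new Map<string, number>();
  for (const m of materials) {
    const key = quantize(m.color);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const top = Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, 3)
    .map(([color]) => color);

  const avg = (pick: (m: Material) => number): number =>
    materials.reduce((sum, m) => sum + pick(m), 0) / materials.length;
  const roughness = avg((m) => m.roughness);
  const metalness = avg((m) => m.metalness);
  const emissive = avg((m) => m.emissiveIntensity);

  // Illustrative thresholds only; a subset of the tags listed above.
  const tag =
    emissive > 0.5 ? "neon"
    : metalness > 0.6 ? "metallic"
    : roughness > 0.7 ? "matte"
    : "minimalist";

  return `[STYLE] ${tag} colors=${top.join(",")} r=${roughness.toFixed(2)} m=${metalness.toFixed(2)} ei=${emissive.toFixed(2)}`;
}
```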

Stage 4: YSL Serializer

File: src/utils/aiSerializer.ts

The Yugma Scene Language (YSL) is a custom compact format designed for LLM context windows:

[★ hero_sphere] sphere id=abc pos=[2,0.5,0] rot=[0,0,0] s=[1,1,1]
mat=#ff0000 r=0.3 m=0.8 e=#000000 ei=0 op=1 tags=[hero,accent]
role=hero nextTo=[def] supports=[]

YSL averages ~45 tokens/object, versus ~400 tokens/object for USD. A 50-object scene fits in ~2,400 tokens.

Tier 3 output includes: id, name, type, position, rotation, scale, full material, tags, semantic role, and relationships.
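
A serializer producing the three-line layout shown above could look like the following sketch. The YslObject interface and the toYsl name are hypothetical, and because this page does not define what ★ marks, the sketch takes it as a caller-supplied flag.

```typescript
type Vec3 = [number, number, number];

// Hypothetical input shape covering the Tier 3 fields listed above.
interface YslObject {
  id: string;
  name: string;
  type: string;
  position: Vec3;
  rotation: Vec3;
  scale: Vec3;
  material: {
    color: string;
    roughness: number;
    metalness: number;
    emissive: string;
    emissiveIntensity: number;
    opacity: number;
  };
  tags: string[];
  role: string;
  nextTo: string[];
  supports: string[];
  starred: boolean; // assumption: controls the ★ prefix
}

// Compact bracketed list with no spaces, e.g. [2,0.5,0].
const list = (items: Array<string | number>): string => `[${items.join(",")}]`;

function toYsl(o: YslObject): string {
  const m = o.material;
  const label = o.starred ? `★ ${o.name}` : o.name;
  return [
    `[${label}] ${o.type} id=${o.id} pos=${list(o.position)} rot=${list(o.rotation)} s=${list(o.scale)}`,
    `mat=${m.color} r=${m.roughness} m=${m.metalness} e=${m.emissive} ei=${m.emissiveIntensity} op=${m.opacity} tags=${list(o.tags)}`,
    `role=${o.role} nextTo=${list(o.nextTo)} supports=${list(o.supports)}`,
  ].join("\n");
}
```

Short keys (s, r, m, e, ei, op) and space-free lists are what keep the per-object cost low; the field order is fixed so the model can rely on position as well as on key names.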

Stage 5: The Agentic Loop (Server)

File: packages/yugma-functions/src/ai/aiCompose.ts

Planning Pass (3D-GPT Pattern)

For complex requests (length > 80 chars or contains markers like "build", "design", "create a scene"):

  1. A text-only planning call produces a numbered plan (3-8 steps)
  2. The plan is prepended as [PLAN]...[/PLAN] to the executor's system prompt
  3. Simple requests skip this entirely to save cost/latency
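
The gating rule and the plan injection can be sketched as two small helpers. The names needsPlanning and withPlan are hypothetical, and the marker list contains only the three examples given above; the real list may be longer.

```typescript
// Markers quoted in this page; the production list may contain more.
const PLAN_MARKERS = ["build", "design", "create a scene"];
const PLAN_LENGTH_THRESHOLD = 80;

// A request is "complex" if it is long or contains a planning marker.
function needsPlanning(request: string): boolean {
  const text = request.toLowerCase();
  return (
    request.length > PLAN_LENGTH_THRESHOLD ||
    PLAN_MARKERS.some((marker) => text.includes(marker))
  );
}

// Prepend the plan to the executor's system prompt; simple requests
// pass `null` and get the prompt back unchanged.
function withPlan(systemPrompt: string, plan: string | null): string {
  return plan !== null
    ? `[PLAN]\n${plan}\n[/PLAN]\n\n${systemPrompt}`
    : systemPrompt;
}
```

Under this sketch, "Build a sci-fi room with neon lighting" triggers planning through the "build" marker even though it is under 80 characters, while "make it red" goes straight to the executor.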

Executor (Tool Loop)

Claude → tool_use blocks → executeToolSim(SimScene) → tool_result → Claude → ...
  • SimScene: lightweight in-memory object tracking IDs and positions (NOT Firestore)
  • MAX_ITERATIONS: 8
  • 15 tools: add_object, update_object, remove_object, set_environment, clear_scene, animate_object, duplicate_object, align_objects, distribute_objects, focus_camera, apply_material_preset, search_select, set_tags, create_group, set_animation
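
The loop can be illustrated with a simplified, synchronous sketch. The real function calls the model asynchronously over the network; here the model is replaced by a caller-supplied stub (ModelFn) so the control flow can be followed in isolation. Apart from executeToolSim, SimScene, and MAX_ITERATIONS, every name and type below is an assumption, and only two of the 15 tools are simulated.

```typescript
interface ToolCall {
  id: string;
  name: string;
  input: Record<string, unknown>;
}
interface ToolResult {
  toolUseId: string;
  content: string; // JSON payload sent back to the model
}

// Stand-in for one model turn: given all tool results so far, return
// the next batch of tool calls (an empty batch means "done").
type ModelFn = (history: ToolResult[][]) => { toolCalls: ToolCall[] };

// Lightweight in-memory scene: tracks IDs only, no Firestore access.
class SimScene {
  private nextId = 1;
  readonly objects = new Map<string, Record<string, unknown>>();

  add(props: Record<string, unknown>): string {
    const id = `obj_${this.nextId++}`;
    this.objects.set(id, props);
    return id;
  }
}

function executeToolSim(scene: SimScene, call: ToolCall): ToolResult {
  let payload: Record<string, unknown>;
  if (call.name === "add_object") {
    // The created ID is returned so later calls can reference it.
    payload = { ok: true, id: scene.add(call.input) };
  } else if (call.name === "remove_object") {
    payload = { ok: scene.objects.delete(String(call.input["id"])) };
  } else {
    payload = { ok: false, error: `not simulated in this sketch: ${call.name}` };
  }
  return { toolUseId: call.id, content: JSON.stringify(payload) };
}

const MAX_ITERATIONS = 8;

function runExecutor(model: ModelFn, scene: SimScene): ToolCall[] {
  const history: ToolResult[][] = [];
  const executed: ToolCall[] = [];

  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const turn = model(history);
    if (turn.toolCalls.length === 0) break; // model chose to stop

    const results = turn.toolCalls.map((call) => executeToolSim(scene, call));
    executed.push(...turn.toolCalls);
    history.push(results); // results become visible on the next turn
  }
  return executed;
}
```

The iteration cap bounds cost and latency even if the model never stops calling tools, and the accumulated list of tool calls is what the client later replays against its stores.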

Dual-Write

After each successful turn, the client persists the session to Firestore via saveAISession (non-blocking, fire-and-forget). On next mount, the most recent session is loaded via loadLastSession.

Stage 6: Client Tool Dispatch

File: src/panels/AIPanel/index.tsx (TOOL_DISPATCH)

Each tool call returned by aiCompose is dispatched to the appropriate Zustand store action:

| Tool | Store | Action |
| --- | --- | --- |
| add_object | useSceneStore | addObject() + updateObject() |
| update_object | useSceneStore | updateObject() |
| remove_object | useSceneStore | removeObject() |
| set_environment | useSceneStore | setEnvironment() |
| focus_camera | useSceneStore | setCameraTarget() (smooth tween) |
| animate_object | useAnimationStore | addKeyframe() |
| apply_material_preset | useSceneStore | lookup MATERIAL_PRESETS → updateObject() |
| create_group | useSceneStore | addObject('box', invisible) + reparentObject() |
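
A dispatch table of this kind might be structured as in the sketch below. To stay self-contained it uses a plain SceneStore interface in place of the Zustand stores (in the app, handlers would presumably reach the actions through something like useSceneStore.getState()); the interface, the handler signatures, and dispatchToolCall are assumptions, and only three tools are shown.

```typescript
// Hypothetical slice of the scene store's actions.
interface SceneStore {
  addObject(type: string): string; // returns the new object's ID
  updateObject(id: string, patch: Record<string, unknown>): void;
  removeObject(id: string): void;
}

type ToolInput = Record<string, unknown>;
type Handler = (store: SceneStore, input: ToolInput) => void;

const entries: Array<[string, Handler]> = [
  [
    "add_object",
    (store: SceneStore, input: ToolInput) => {
      // Create first, then apply the remaining properties as an update.
      const id = store.addObject(String(input["type"]));
      store.updateObject(id, input);
    },
  ],
  [
    "update_object",
    (store: SceneStore, input: ToolInput) =>
      store.updateObject(String(input["id"]), input),
  ],
  [
    "remove_object",
    (store: SceneStore, input: ToolInput) =>
      store.removeObject(String(input["id"])),
  ],
];

// A Map avoids accidental hits on inherited keys such as "toString".
const TOOL_DISPATCH = new Map<string, Handler>(entries);

// Returns false for unknown tools so the caller can log or ignore them.
function dispatchToolCall(store: SceneStore, name: string, input: ToolInput): boolean {
  const handler = TOOL_DISPATCH.get(name);
  if (handler === undefined) return false;
  handler(store, input);
  return true;
}
```

Keeping the mapping in data rather than in a switch statement means a new tool needs one table entry, and unknown tool names degrade to a no-op instead of throwing mid-replay.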

Configuration

| Setting | Value | Notes |
| --- | --- | --- |
| Model | claude-sonnet-4-20250514 | Configurable via admin panel |
| Max tokens | 4,096 | Per iteration |
| Temperature | 0.7 (creative) / 0.2 (precise) | User toggle |
| Rate limit | 30 req/hr per user | Firestore counter |
| Function timeout | 120s | Accommodates 8-iteration loops |
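
Expressed as code, these settings might look like the sketch below. The AI_CONFIG object, its field names, and the fixed one-hour window are assumptions: the page states only that the limit is 30 requests per hour backed by a Firestore counter, so the in-memory window here stands in for that counter document.

```typescript
// Values from the table above; field names are hypothetical.
const AI_CONFIG = {
  model: "claude-sonnet-4-20250514",
  maxTokens: 4096, // per iteration
  temperature: { creative: 0.7, precise: 0.2 },
  rateLimitPerHour: 30,
  functionTimeoutSeconds: 120,
  maxIterations: 8,
} as const;

type Mode = keyof typeof AI_CONFIG.temperature;

const temperatureFor = (mode: Mode): number => AI_CONFIG.temperature[mode];

// Average time budget per loop iteration: 120 s / 8 = 15 s.
const secondsPerIteration =
  AI_CONFIG.functionTimeoutSeconds / AI_CONFIG.maxIterations;

// Per-user counter state; in production this would live in Firestore.
interface RateWindow {
  windowStart: number; // epoch milliseconds
  count: number;
}

const HOUR_MS = 60 * 60 * 1000;

// Fixed-window limiter (an assumed strategy): the counter resets once
// an hour has passed since the window opened.
function allowRequest(
  state: RateWindow | undefined,
  nowMs: number,
): { allowed: boolean; next: RateWindow } {
  const current: RateWindow =
    state !== undefined && nowMs - state.windowStart < HOUR_MS
      ? state
      : { windowStart: nowMs, count: 0 };

  if (current.count >= AI_CONFIG.rateLimitPerHour) {
    return { allowed: false, next: current };
  }
  return {
    allowed: true,
    next: { windowStart: current.windowStart, count: current.count + 1 },
  };
}
```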