Agentic Embodied Vision System

Bringing
Intelligence
to Reality

The orchestration layer for next-generation embodied intelligence: one goal, many agents, verified execution.

Goal → Plan → Act → Verify · Multi-agent runtime · Digital twin · Step-level tracing
Digital Twin · Multi-agent Runtime · AI bridging digital and physical worlds
Demonstration

Real-world grounding, not just a simulated workflow.

The video shows a single natural-language task decomposed into coordinated planning, perception updates, manipulation, and explicit verification in the real environment.

Agents

Five roles, one runtime.

Planner, perception, navigation, manipulation, and supervisor coordinate through structured tool calls instead of a fixed macro.

Execution

Iterative step execution.

10+ planning and action steps run with closed-loop verification and automatic recovery rather than fire-and-forget playback.
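The closed-loop pattern can be sketched in a few lines. This is a minimal illustration, not the runtime's actual API: `execute` and `verify` stand in for the system's tool calls and postcondition checks.

```python
# Sketch of closed-loop step execution: run, verify, retry.
# `execute` and `verify` are hypothetical stand-ins for the
# runtime's tool calls and postcondition checks.

def run_step(step, execute, verify, max_retries=2):
    """Run one step; retry on a failed postcondition instead of
    fire-and-forget playback."""
    for attempt in range(max_retries + 1):
        result = execute(step)
        if verify(step, result):
            return {"status": "success", "attempts": attempt + 1}
    return {"status": "failed", "attempts": max_retries + 1}
```

The key contrast with playback: a failed check triggers a retry instead of silently continuing.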

Precision

Correction before commitment.

Vision-guided updates, force-limited grasping, and digital twin validation keep the system within tight tolerances.

What To Notice

Multi-agent collaboration is visible in the log.

The runtime surfaces which agent is planning, perceiving, acting, and checking at each phase.

Proof line · planner + perception + supervisor in one trace
Alignment

Closed-loop visual correction.

The system re-reads the scene, aligns again, and updates the next move before the grasp or handoff is accepted.

Proof line · detect → align → verify before commit
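The detect → align → verify loop above can be sketched as follows. `detect_offset` and `nudge` are hypothetical perception and motion helpers, and the tolerance value is illustrative.

```python
# Sketch of visual correction before commitment: re-read the scene
# and shrink the pose error until it is within tolerance.
# `detect_offset` and `nudge` are hypothetical helpers.

def align_before_commit(detect_offset, nudge, tol_mm=1.0, max_iters=5):
    """Return True once the measured offset is within tolerance."""
    for _ in range(max_iters):
        offset = detect_offset()   # re-read the scene
        if abs(offset) <= tol_mm:
            return True            # safe to commit the grasp
        nudge(offset)              # correct before committing
    return False
```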
Verification

Success is checked, not assumed.

The supervisor only marks completion after the postcondition passes, which makes the demo legible and reproducible.

Proof line · handoff success checked by supervisor
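A supervisor gate of this kind reduces to a predicate over world state. The world-state keys and the `on_conveyor` postcondition below are illustrative assumptions, not the system's actual schema.

```python
# Sketch of the supervisor's completion gate: success is checked
# against the live world state, never assumed. Keys and the example
# postcondition are illustrative assumptions.

def on_conveyor(world: dict) -> bool:
    # Postcondition: cassette is on Conveyor 3 and released.
    return (world.get("cassette_at") == "conveyor_3"
            and not world.get("gripper_holding"))

def supervisor_check(world: dict, postcondition=on_conveyor) -> str:
    """Mark completion only when the postcondition holds."""
    return "success" if postcondition(world) else "failed"
```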
Traditional vs agentic
The Problem

Traditional automation is brittle. Adaptive systems survive change.

Hard-coded sequences fracture when the scene shifts, tolerances drift, or handoff conditions change. The agentic approach breaks work into interpretable steps, keeps the scene in context, and recovers instead of failing silently.

01 · brittle scripts

Top-down macros break the moment the world stops matching the script.

They assume the same geometry, same object placement, and same timing on every run.

02 · adaptive runtime

Language, tools, and verification create a system that can actually recover.

Plans stay modular, perception re-checks the scene, and the supervisor can stop or retry when postconditions fail.

System Overview

The complete orchestration dashboard

Agents, workflow graphs, digital twins, process monitoring, and live video — unified in a single interface.

Dashboard
01 · Agent Team

Robot arms, AGVs, and shared context stay in one runtime.

AUBO i20, UR10E, and support services expose scoped capabilities without fragmenting the orchestration layer.

02 · Task DAG

Dependency-aware execution remains visible while the job runs.

The graph exposes task ordering, live state, and recovery points instead of hiding execution inside opaque controllers.

03 · Human In The Loop

Operators can pause, modify, and override from the same interface.

WebSocket chat and supervisor controls keep intervention close to the running workflow instead of off in a separate panel.

04 · Twin + Stream

3D state and live video stay synchronized for planning and verification.

The interface grounds each step in both the digital twin and the camera feed so placement and completion stay legible.

Architecture

The system stays on the left rail. The evidence lives on the right.

A shared interface, a live world model, specialized agents, and safety controls turn mixed hardware into a single artifact-driven runtime.

01 · Unified tool registry
02 · Digital twin at 10 Hz
03 · Perception to action loop
04 · Human override + audit trail
Tool Registry
01 · Abstraction
Unified interface

One tool registry abstracts robot arms, AGVs, sensors, and services.

Swap a hardware backend without changing agent logic. The orchestration layer sees capabilities, schemas, and limits instead of vendor-specific APIs.

Digital twin
02 · Twin
World state

Live scene graph

Agents query poses, plan collision-free motion, and verify placements against a shared digital twin.
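A minimal scene-graph sketch shows the shape of that interface: agents query poses and verify placements against one shared model. The `name → pose` schema and the tolerance are assumptions for illustration.

```python
# Minimal digital-twin sketch: a shared scene graph that agents
# query for poses and use to verify placements. Schema is assumed.

class DigitalTwin:
    def __init__(self):
        self.objects = {}  # name -> {"xyz": (x, y, z)} in metres

    def update(self, name, xyz):
        self.objects[name] = {"xyz": xyz}

    def pose(self, name):
        return self.objects[name]["xyz"]

    def placed_within(self, name, target_xyz, tol=0.01):
        # Placement verification: object within `tol` metres of target.
        ax, ay, az = self.pose(name)
        tx, ty, tz = target_xyz
        return ((ax - tx) ** 2 + (ay - ty) ** 2 + (az - tz) ** 2) ** 0.5 <= tol
```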

Lab capabilities
03 · Perception
Execution loop

See, move, grasp, verify

Detection, planning, and manipulation agents keep re-reading the task state instead of blindly replaying a motion file.

Safety
04 · Safety
Control plane

Human override remains available at every step.

Force and speed limits, postcondition checks, retries, and audit trails keep the runtime supervised and reproducible instead of opaque.
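The limit-plus-audit pattern can be sketched as a small dataclass. Field names and default values here are illustrative assumptions, not the runtime's actual configuration.

```python
# Sketch of the safety control plane: clamp commanded force/speed to
# configured limits and record an audit entry for every command.
# Field names and defaults are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class SafetyLimits:
    max_force_n: float = 18.0
    max_speed_mps: float = 0.25
    audit: list = field(default_factory=list)

    def clamp(self, force_n: float, speed_mps: float):
        """Return safe values and log the request for the audit trail."""
        safe = (min(force_n, self.max_force_n),
                min(speed_mps, self.max_speed_mps))
        self.audit.append({"requested": (force_n, speed_mps),
                           "applied": safe})
        return safe
```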

Real-world deployment
05 · Deployment
Embodied runtime

The same orchestration layer spans the robot team and the operator.

Digital coordination, physical execution, and supervision sit in one system instead of being split across disconnected dashboards.

Skills + Trace

Complex robotic skills become auditable execution.

Skill specs, failure handling, and the runtime trace now live in one bento surface instead of separate sections.

Task Query

input
>"Move cassette from Shelf A → Inspect seal quality → Regrasp if needed → Deliver to Conveyor 3."

Execution Trace

trace · 9 steps · 58.4s
STEP 1  Parse task             planner             0.0s
STEP 2  Query world state      supervisor          1.3s
STEP 3  Navigate to shelf      nav                 9.8s
STEP 4  Detect cassette        perception         13.7s
STEP 5  Pick + force check     manipulation       21.4s
STEP 6  Inspect seal region    perception         29.6s
STEP 7  Regrasp for alignment  manip + planner    38.9s
STEP 8  Deliver to conveyor    nav + manipulation 51.0s
STEP 9  Verify postcondition   supervisor         58.4s

bin-pick-with-regrasp

Recover grasp quality when the initial pickup is unstable or offset.

name: bin-pick-with-regrasp
agents: [perception, manipulation, supervisor]
stages:
  - detect_candidates(top_k=5)
  - score_grasps(force_limit_n=18)
  - pick(best)
  - regrasp(if tilt_deg > 7)
fallback: "request_new_view"

seal-inspection-gate

Route the object based on visual QA and tolerance-aware checks.

name: seal-inspection-gate
tools: [rgbd_detector, visual_qa, reject_gate]
checks:
  seal_gap_mm: 0.8
  label_present: true
  surface_glare: "auto_compensate"
on_fail: "quarantine_bin"

conveyor-intercept

Synchronize mobile base timing with a moving downstream handoff target.

name: conveyor-intercept
agents: [nav, supervisor]
params:
  target_lane: 3
  eta_slack_s: 2.5
  geofence: "conveyor_zone"
retry: "replan_if_lane_blocked"

multi-robot-handoff

Coordinate arm, AGV, and supervisor checkpoints for a reliable transfer.

name: multi-robot-handoff
sequence:
  - approach(sync_pose=true)
  - micro_align(vision=true)
  - transfer(force_limit_n=12)
  - verify_release()
abort_if: "postcondition_timeout"
Interfaces

Typed abstractions for heterogeneous hardware

A single interface turns any hardware into a callable tool. Swap a UR10E for an AUBO i20 without touching agent code.

trace_event.py
python
from dataclasses import dataclass
from typing import Any, Literal

@dataclass
class TraceEvent:
    step_id:   str
    agent:     str
    tool:      str
    params:    dict
    result:    Any
    timestamp: float
    status:    Literal[
      "pending",
      "running",
      "success",
      "failed"
    ]
tool_registry.py
python
# Any hardware = a tool.
# Same interface for all.

class ToolRegistry:

  def register(
    self,
    name: str,
    schema: ToolSchema,
    limits: SafetyLimits,
    handler: Callable
  ) -> None: ...

  def invoke(
    self,
    name: str,
    params: dict,
    world: WorldState
  ) -> ToolResult: ...
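A minimal concrete sketch of that registry pattern: handlers are plain callables, so different hardware backends can register under the same tool name with declared limits enforced at invocation. Names and the limit schema are assumptions for illustration.

```python
# Minimal concrete registry sketch: any handler callable becomes a
# tool, and declared limits are enforced before invocation.
# Names and the limits schema are illustrative assumptions.

class MiniRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, name, handler, limits=None):
        self._tools[name] = (handler, limits or {})

    def invoke(self, name, params, world=None):
        handler, limits = self._tools[name]
        # Enforce declared force limits before touching hardware.
        force = min(params.get("force_n", 0.0),
                    limits.get("max_force_n", float("inf")))
        return handler({**params, "force_n": force}, world)
```

Swapping backends then means registering a different handler under the same name; agent code keeps calling `invoke` unchanged.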
orchestrator.py
python
# Registry is passed in explicitly so the
# orchestrator has no hidden globals.
async def run_workflow(
  wf: Workflow,
  agents: AgentTeam,
  registry: ToolRegistry,
  world: WorldState,
  trace: TraceLog
):
  for step in wf.topo():
    agent = agents.get(
      step.agent
    )
    r = await agent.execute(
      step,
      registry.list(agent),
      world
    )
    trace.emit(
      step, agent, r
    )
    if not step.check(r):
      await wf.recover(step)

Explore this work. Reach out

For technical discussions, collaboration, or live walkthroughs.
Get in touch Watch overview