Five roles, one runtime.
Planner, perception, navigation, manipulation, and supervisor coordinate through structured tool calls instead of a fixed macro.
The orchestration layer for next-generation embodied intelligence: one goal, many agents, verified execution.
The video shows a single natural-language task decomposed into collaborative planning, perception updates, manipulation, and explicit verification in the real environment.
10+ planning and action steps run with closed-loop verification and automatic recovery rather than fire-and-forget playback.
Vision-guided updates, force-limited grasping, and digital twin validation keep the system within tight tolerances.
The runtime surfaces which agent is planning, perceiving, acting, and checking at each phase.
Proof line · planner + perception + supervisor in one trace
The system re-reads the scene, aligns again, and updates the next move before the grasp or handoff is accepted.
Proof line · detect → align → verify before commit
The supervisor only marks completion after the postcondition passes, which makes the demo legible and reproducible.
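The detect → align → verify loop can be sketched as a small closed-loop controller: perception re-reads the scene on every pass, and nothing is committed until the check passes. All names here (`Pose`, `detect`, `align`, the tolerances) are hypothetical stand-ins for illustration, not the product's API:

```python
from dataclasses import dataclass

@dataclass
class Pose:
    x: float
    y: float

def detect(scene: dict) -> Pose:
    # Hypothetical detector: re-reads the target pose from the scene.
    return Pose(*scene["target"])

def align(gripper: Pose, target: Pose, step: float = 0.5) -> Pose:
    # Move the gripper a fraction of the remaining offset toward the target.
    return Pose(
        gripper.x + step * (target.x - gripper.x),
        gripper.y + step * (target.y - gripper.y),
    )

def within_tolerance(a: Pose, b: Pose, tol: float = 0.05) -> bool:
    return abs(a.x - b.x) <= tol and abs(a.y - b.y) <= tol

def detect_align_verify(scene: dict, gripper: Pose, max_iters: int = 20) -> bool:
    """Re-read the scene, re-align, and only commit once the check passes."""
    for _ in range(max_iters):
        target = detect(scene)           # perception re-reads the scene each pass
        if within_tolerance(gripper, target):
            return True                  # verified: the commit is accepted
        gripper = align(gripper, target)
    return False                         # never commit on an unverified pose
```

The key property is that verification gates the commit rather than following it: an unverified pose can only ever produce another alignment pass, never an accepted grasp.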
Proof line · handoff success checked by supervisor
Hard-coded sequences fracture when the scene shifts, tolerances drift, or handoff conditions change. The agentic approach breaks work into interpretable steps, keeps the scene in context, and recovers instead of failing silently.
Hard-coded sequences assume the same geometry, the same object placement, and the same timing on every run.
Plans stay modular, perception re-checks the scene, and the supervisor can stop or retry when postconditions fail.
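The stop-or-retry behavior can be sketched as a supervisor wrapper around any action: run it, check the postcondition, retry a bounded number of times, then stop loudly. The callables and field names below are illustrative assumptions, not the runtime's actual interface:

```python
from typing import Callable

def supervised_step(
    action: Callable[[], dict],
    postcondition: Callable[[dict], bool],
    max_retries: int = 2,
) -> dict:
    """Run an action, verify its postcondition, and retry or stop explicitly."""
    for attempt in range(max_retries + 1):
        result = action()
        if postcondition(result):
            result["status"] = "success"
            result["attempts"] = attempt + 1
            return result
    # Out of retries: stop with an explicit failure instead of failing silently.
    return {"status": "failed", "attempts": max_retries + 1}
```

A flaky grasp that succeeds on the second attempt surfaces as `status: success, attempts: 2`; a grasp that never satisfies its postcondition surfaces as an explicit failure rather than a silently completed plan.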
Agents, workflow graphs, digital twins, process monitoring, and live video — unified in a single interface.
AUBO i20, UR10e, and support services expose scoped capabilities without fragmenting the orchestration layer.
The graph exposes task ordering, live state, and recovery points instead of hiding execution inside opaque controllers.
WebSocket chat and supervisor controls keep intervention close to the running workflow instead of off in a separate panel.
The interface grounds each step in both the digital twin and the camera feed so placement and completion stay legible.
A shared interface, a live world model, specialized agents, and safety controls turn mixed hardware into a single artifact-driven runtime.
Swap a hardware backend without changing agent logic. The orchestration layer sees capabilities, schemas, and limits instead of vendor-specific APIs.
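One way to picture the capability-first contract: backends register under a capability name with declared limits, and the agent-side call site never changes when the vendor does. Everything below (`register_backend`, `ur10e_move`, `aubo_i20_move`) is a hypothetical sketch, not the actual registry API:

```python
from typing import Callable

# Hypothetical capability-first registry: agents see a capability name,
# a parameter set, and limits -- never a vendor-specific API.
REGISTRY: dict[str, dict] = {}

def register_backend(capability: str, limits: dict, handler: Callable[..., str]) -> None:
    REGISTRY[capability] = {"limits": limits, "handler": handler}

def invoke(capability: str, **params) -> str:
    entry = REGISTRY[capability]
    # Enforce declared limits before the vendor handler ever runs.
    for key, bound in entry["limits"].items():
        if params.get(key, 0) > bound:
            raise ValueError(f"{key} exceeds limit {bound}")
    return entry["handler"](**params)

# Two vendor backends behind one capability: swapping them changes
# only the registration line, never the agent's call site.
def ur10e_move(x: float, speed: float) -> str:
    return f"ur10e:move({x},{speed})"

def aubo_i20_move(x: float, speed: float) -> str:
    return f"aubo_i20:move({x},{speed})"

register_backend("move_linear", {"speed": 1.0}, ur10e_move)
# register_backend("move_linear", {"speed": 1.0}, aubo_i20_move)  # the swap
```

Agents always call `invoke("move_linear", ...)`; which arm answers is a registration detail.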
Agents query poses, plan collision-free motion, and verify placements against a shared digital twin.
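A minimal sketch of that shared-twin contract, under assumed names: agents query poses before acting and verify placements against the twin afterward:

```python
import math

class DigitalTwin:
    """Hypothetical pose store shared by all agents (illustrative, not the real API)."""

    def __init__(self) -> None:
        self._poses: dict[str, tuple[float, float, float]] = {}

    def update_pose(self, obj: str, pose: tuple[float, float, float]) -> None:
        # Perception writes here as the scene changes.
        self._poses[obj] = pose

    def query_pose(self, obj: str) -> tuple[float, float, float]:
        # Planning and manipulation read from the same store.
        return self._poses[obj]

    def verify_placement(self, obj: str, expected: tuple[float, float, float],
                         tol_m: float = 0.01) -> bool:
        # Postcondition: the object ended up within tolerance of the planned pose.
        actual = self._poses.get(obj)
        if actual is None:
            return False
        return math.dist(actual, expected) <= tol_m
```

Because every agent reads and writes the same store, a placement check made by the supervisor agrees with the pose the manipulation agent planned against.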
Detection, planning, and manipulation agents keep re-reading the task state instead of blindly replaying a motion file.
Force and speed limits, postcondition checks, retries, and audit trails keep the runtime supervised and reproducible instead of opaque.
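Limit enforcement plus an audit trail might look like the following sketch; the limit keys and record fields are assumptions for illustration, not the runtime's schema:

```python
import time

# Every command is clamped to declared limits and logged for later audit.
AUDIT: list[dict] = []

LIMITS = {"force_n": 18.0, "speed_mps": 0.25}

def issue_command(tool: str, params: dict) -> dict:
    # Clamp every limited parameter instead of trusting the caller.
    clamped = {
        k: min(v, LIMITS[k]) if k in LIMITS else v
        for k, v in params.items()
    }
    record = {
        "tool": tool,
        "requested": params,
        "executed": clamped,
        "clamped": clamped != params,
        "t": time.time(),
    }
    AUDIT.append(record)   # the full command history stays reconstructable
    return clamped
```

An over-limit grasp request executes at the limit and leaves a `clamped: True` record behind, so the trace shows both what was asked for and what actually ran.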
Digital coordination, physical execution, and supervision sit in one system instead of being split across disconnected dashboards.
Skill specs, failure handling, and the runtime trace now live in one bento surface instead of separate sections.
STEP 1 · Parse task · planner · 0.0s
STEP 2 · Query world state · supervisor · 1.3s
STEP 3 · Navigate to shelf · nav · 9.8s
STEP 4 · Detect cassette · perception · 13.7s
STEP 5 · Pick + force check · manipulation · 21.4s
STEP 6 · Inspect seal region · perception · 29.6s
STEP 7 · Regrasp for alignment · manip + planner · 38.9s
STEP 8 · Deliver to conveyor · nav + manipulation · 51.0s
STEP 9 · Verify postcondition · supervisor · 58.4s
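The timeline above records cumulative timestamps, so per-step durations fall out by differencing adjacent entries. A small sketch (the final step's duration is not recoverable from the trace alone, so it reports 0.0):

```python
# Cumulative timestamps from the trace above (seconds since task start).
TRACE = [
    ("Parse task", 0.0),
    ("Query world state", 1.3),
    ("Navigate to shelf", 9.8),
    ("Detect cassette", 13.7),
    ("Pick + force check", 21.4),
    ("Inspect seal region", 29.6),
    ("Regrasp for alignment", 38.9),
    ("Deliver to conveyor", 51.0),
    ("Verify postcondition", 58.4),
]

def step_durations(trace: list[tuple[str, float]]) -> dict[str, float]:
    # Each step's duration is the gap to the next timestamp.
    out = {}
    for (name, t0), (_, t1) in zip(trace, trace[1:]):
        out[name] = round(t1 - t0, 1)
    out[trace[-1][0]] = 0.0  # end time of the last step is not in the trace
    return out
```

Differencing shows, for example, that navigation to the shelf dominates the early phase (8.5s) and the conveyor delivery the late phase (7.4s).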
Recover grasp quality when the initial pickup is unstable or offset.
name: bin-pick-with-regrasp
agents: [perception, manipulation, supervisor]
stages:
  - detect_candidates(top_k=5)
  - score_grasps(force_limit_n=18)
  - pick(best)
  - regrasp(if tilt_deg > 7)
fallback: "request_new_view"
Route the object based on visual QA and tolerance-aware checks.
name: seal-inspection-gate
tools: [rgbd_detector, visual_qa, reject_gate]
checks:
  seal_gap_mm: 0.8
  label_present: true
  surface_glare: "auto_compensate"
on_fail: "quarantine_bin"
Synchronize mobile base timing with a moving downstream handoff target.
name: conveyor-intercept
agents: [nav, supervisor]
params:
  target_lane: 3
  eta_slack_s: 2.5
geofence: "conveyor_zone"
retry: "replan_if_lane_blocked"
Coordinate arm, AGV, and supervisor checkpoints for a reliable transfer.
name: multi-robot-handoff
sequence:
  - approach(sync_pose=true)
  - micro_align(vision=true)
  - transfer(force_limit_n=12)
  - verify_release()
abort_if: "postcondition_timeout"
A single interface turns any hardware into a callable tool. Swap a UR10 for an AUBO i20 without touching agent code.
from dataclasses import dataclass
from typing import Any, Literal

@dataclass
class TraceEvent:
    step_id: str
    agent: str
    tool: str
    params: dict
    result: Any
    timestamp: float
    status: Literal["pending", "running", "success", "failed"]
# Any hardware = a tool.
# Same interface for all.
class ToolRegistry:
    def register(
        self,
        name: str,
        schema: ToolSchema,
        limits: SafetyLimits,
        handler: Callable,
    ) -> None: ...

    def invoke(
        self,
        name: str,
        params: dict,
        world: WorldState,
    ) -> ToolResult: ...
async def run_workflow(
    wf: Workflow,
    agents: AgentTeam,
    world: WorldState,
    trace: TraceLog,
    registry: ToolRegistry,
):
    for step in wf.topo():
        agent = agents.get(step.agent)
        # Each agent sees only the tools scoped to it.
        r = await agent.execute(step, registry.list(agent), world)
        trace.emit(step, agent, r)
        if not step.check(r):
            await wf.recover(step)
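One possible shape for the recovery step in that loop: bounded retries, then a named fallback stage (like the `request_new_view` fallback in the bin-pick spec), then an explicit abort. This is a sketch under assumed names, not the runtime's actual policy:

```python
class RecoveryError(RuntimeError):
    """Raised when a step is out of retries and has no fallback."""

def recover(step_name: str, attempts: dict, fallbacks: dict,
            max_retries: int = 2) -> str:
    """Decide what to do after a failed postcondition check."""
    n = attempts.get(step_name, 0)
    attempts[step_name] = n + 1
    if n < max_retries:
        return "retry"                  # re-run the same step
    if step_name in fallbacks:
        return fallbacks[step_name]     # e.g. "request_new_view"
    # Abort loudly: an audit trail with an explicit failure beats
    # a workflow that silently keeps going.
    raise RecoveryError(f"{step_name}: out of retries and no fallback")
```

Keeping the policy as a pure decision function makes it easy to audit: given the same attempt counts and fallback table, it always returns the same verdict.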