Observe running applications via DOM, accessibility, or vision
Your agents can see your running application — not through expensive screenshots, but through structured DOM extraction that uses a fraction of the tokens. Witness gives agents eyes on your app so they can verify, debug, and record proof of what's actually happening.
Witness tools are available directly in the agent chat: observe, capture, and verify without leaving the conversation.
For web applications, Witness extracts the DOM tree: elements, attributes, text content, and computed styles. This is Tier 1, the most token-efficient way to observe an app. Agents get structured data, not pixels.
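To make "structured data, not pixels" concrete, here is a minimal sketch of turning markup into compact element records. It is not Witness's implementation: `DomOutline` and `extract_outline` are hypothetical names, the parser is Python's standard `html.parser`, and it works on static HTML only, so computed styles (which need a live browser) are left out.

```python
from html.parser import HTMLParser


class DomOutline(HTMLParser):
    """Hypothetical helper: flattens HTML into one small record per element."""

    # Elements that never have a closing tag, so they must not stay "open".
    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.nodes = []    # one dict per element, in document order
        self._stack = []   # indices into self.nodes for currently open elements

    def handle_starttag(self, tag, attrs):
        self.nodes.append(
            {"tag": tag, "attrs": dict(attrs), "text": "", "depth": len(self._stack)}
        )
        if tag not in self.VOID:
            self._stack.append(len(self.nodes) - 1)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open element.
        if self._stack and self.nodes[self._stack[-1]]["tag"] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            node = self.nodes[self._stack[-1]]
            node["text"] = (node["text"] + " " + text).strip()


def extract_outline(html):
    """Return the list of element records for an HTML string."""
    parser = DomOutline()
    parser.feed(html)
    parser.close()
    return parser.nodes


page = (
    '<form id="login"><label>Email</label>'
    '<input type="email" name="email">'
    "<button disabled>Sign in</button></form>"
)
for node in extract_outline(page):
    print("  " * node["depth"], node["tag"], node["attrs"], node["text"])
```

Each record carries the tag, its attributes, its direct text, and its nesting depth, which is the kind of structure an agent can reason over without decoding an image.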
For native applications, Witness reads the OS accessibility tree: the AX API on macOS, UI Automation on Windows, and AT-SPI on Linux. It works with any application that exposes accessibility nodes.
When structure isn't available, Witness captures a screenshot and sends it to a vision-capable LLM. This is the most expensive tier, used as a fallback rather than the default. Quality is configurable.
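The fallback behaviour can be pictured as trying the cheapest tier first and moving on only when it returns nothing. The sketch below assumes each tier is a plain callable that returns a dict or `None`; `observe_with_fallback` and the tier names are illustrative, not Witness's API.

```python
from typing import Callable, Optional, Sequence, Tuple

# A tier is any zero-argument callable that returns data, or None if it cannot observe.
Observer = Callable[[], Optional[dict]]


def observe_with_fallback(tiers: Sequence[Tuple[str, Observer]]) -> Tuple[str, dict]:
    """Try tiers in the given order and return (tier_name, data) from the first that works.

    Later (more expensive) tiers are never called once an earlier one succeeds.
    """
    for name, observer in tiers:
        result = observer()
        if result is not None:
            return name, result
    raise RuntimeError("no observation tier produced a result")


# Stub observers standing in for real DOM, accessibility, and vision backends.
def dom_tier() -> Optional[dict]:
    return None  # e.g. the target is not a web application


def accessibility_tier() -> Optional[dict]:
    return {"role": "window", "title": "Settings"}


def vision_tier() -> Optional[dict]:
    return {"summary": "a settings window"}


tier_used, data = observe_with_fallback(
    [("dom", dom_tier), ("accessibility", accessibility_tier), ("vision", vision_tier)]
)
print(tier_used, data)
```

Ordering the list from cheapest to most expensive keeps the screenshot path as a last resort.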
Capture application state as evidence for QA, compliance, or debugging. Each observation step is recorded with its timestamp, the tier used, and the data captured.
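As an illustration of what such a record might hold, the sketch below stores the three fields named above (timestamp, tier, data) and serialises a session to JSON. The class names and the JSON layout are assumptions made for this example; Witness's actual storage format is not described here.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, List


def _utc_now() -> str:
    """ISO 8601 timestamp in UTC, e.g. 2024-01-01T12:00:00+00:00."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class ObservationStep:
    """One recorded observation (hypothetical shape)."""
    tier: str    # which tier produced the data, e.g. "dom"
    data: Any    # whatever the tier captured (must be JSON-serialisable)
    timestamp: str = field(default_factory=_utc_now)


@dataclass
class ProofSession:
    """An append-only list of observation steps."""
    steps: List[ObservationStep] = field(default_factory=list)

    def record(self, tier: str, data: Any) -> ObservationStep:
        step = ObservationStep(tier=tier, data=data)
        self.steps.append(step)
        return step

    def to_json(self) -> str:
        return json.dumps([asdict(step) for step in self.steps], indent=2)


session = ProofSession()
session.record("dom", {"title": "Login", "button_disabled": True})
session.record("vision", {"summary": "login form with a disabled button"})
print(session.to_json())
```

Keeping the tier next to each capture lets a reviewer see later whether a conclusion rested on structured data or on a screenshot.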
Choose the tier, quality level, and session step limit. Balance token cost against observation depth. Default to Tier 1 for efficiency and escalate when needed.
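A configuration along these lines could be modelled as a small immutable object. Everything in this sketch is hypothetical: the field names, the quality values (`low`, `medium`, `high`), and the default step limit of 20 are placeholders chosen for illustration, not Witness's real settings.

```python
from dataclasses import dataclass, replace

# Tiers ordered from cheapest to most expensive (names are illustrative).
TIER_ORDER = ("dom", "accessibility", "vision")
QUALITY_LEVELS = ("low", "medium", "high")  # assumed values


@dataclass(frozen=True)
class WitnessConfig:
    tier: str = "dom"          # default to the cheapest tier
    quality: str = "medium"    # assumed default
    max_steps: int = 20        # assumed default session step limit

    def __post_init__(self):
        if self.tier not in TIER_ORDER:
            raise ValueError(f"unknown tier: {self.tier!r}")
        if self.quality not in QUALITY_LEVELS:
            raise ValueError(f"unknown quality: {self.quality!r}")
        if self.max_steps < 1:
            raise ValueError("max_steps must be at least 1")

    def escalate(self) -> "WitnessConfig":
        """Return a copy set to the next tier, staying put at the last one."""
        index = TIER_ORDER.index(self.tier)
        next_tier = TIER_ORDER[min(index + 1, len(TIER_ORDER) - 1)]
        return replace(self, tier=next_tier)


config = WitnessConfig()
print(config.tier)             # dom
print(config.escalate().tier)  # accessibility
```

Making the object frozen means escalation produces a new configuration, so the original default stays available for the next observation.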
Tier 1 DOM extraction costs a fraction of what screenshot-based vision costs. Your agents see more, understand better, and spend less — all running locally.
Witness ships with the Studio. No extra install, no extra cost.