# Computer Use Agent

Computer Use is a Qoder capability extension that lets the agent perceive your screen the way a person does and click, type, and scroll on your computer. When a task involves a graphical interface and can't be completed through the command line or an API, the agent can drive desktop apps and browsers directly. On macOS it does this in the background, so you can keep working in the foreground on your own things; on Windows its actions run in the foreground.

<Note>
  Computer Use is currently in Beta and is available on macOS and Windows — the experience and the underlying capabilities are still being improved.
</Note>

<div id="core-capabilities">
  ## **Core Capabilities**
</div>

<CardGroup cols={2}>
  <Card title="Screen perception" icon="eye">
    * Reads the visible content of the target app window and understands the layout, button text, form state, and other visual cues.
    * Takes screenshots throughout the run to confirm the page has loaded and the previous action took effect before deciding the next step.
  </Card>

  <Card title="Mouse and keyboard control" icon="keyboard">
    * Supports the full range of human input: clicks, double-clicks, drags, text entry, and keyboard shortcuts.
    * Operates at pixel-level precision so it can target small UI elements accurately.
  </Card>

  <Card title="Autonomous execution" icon="play">
    * The agent independently drives the mouse, keyboard, and screenshots, deciding each step based on the interface state.
    * On macOS, operations run in the background without stealing your foreground focus; on Windows, operations run in the foreground, so you can see the cursor move and each action take place (see the platform differences below).
  </Card>

  <Card title="Cross-app workflows" icon="layer-group">
    * Switches between desktop apps and chains multi-step operations into a complete flow.
    * Adjusts the next step based on what just happened, instead of replaying a fixed script.
  </Card>
</CardGroup>
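
The perceive, act, and verify cycle described in the cards above can be pictured as a simple loop. The sketch below is illustrative only and is not Qoder's implementation: `FakeScreen`, `choose_action`, and `run_agent` are hypothetical names, and the "screenshot" is a plain state snapshot rather than real pixels. It shows the next step being chosen from the current observation instead of from a fixed script.

```python
from dataclasses import dataclass, field


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a desktop app: a form with fields and a submit button."""
    fields: dict = field(default_factory=dict)
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would capture pixels; here the "screenshot" is a state snapshot.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: tuple) -> None:
        kind, *args = action
        if kind == "type":
            name, text = args
            self.fields[name] = text
        elif kind == "click" and args == ["submit"]:
            self.submitted = True


def choose_action(observation: dict, goal: dict):
    """Pick the next step from what is currently on screen."""
    for name, text in goal.items():
        if observation["fields"].get(name) != text:
            return ("type", name, text)
    if not observation["submitted"]:
        return ("click", "submit")
    return None  # nothing left to do


def run_agent(screen: FakeScreen, goal: dict, max_steps: int = 10) -> list:
    """Observe, decide, act, and repeat until the goal is met or steps run out."""
    history = []
    for _ in range(max_steps):
        observation = screen.screenshot()  # perceive, and confirm the last action landed
        action = choose_action(observation, goal)
        if action is None:
            break
        screen.apply(action)
        history.append(action)
    return history
```

Because the loop re-reads the screen before every step, a field that is already filled in is skipped rather than typed again.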

<div id="usage-scenarios">
  ## **Usage Scenarios**
</div>

* **Drive desktop apps that lack an API**: when the target app has no CLI or plugin, the agent works the GUI directly — adjusting parameters in a design tool, batch-updating settings in an admin console, and so on.
* **Automate cross-app flows**: when a task spans several apps, the agent switches windows, copies data, and fills forms to complete the workflow end to end.
* **GUI verification and testing**: confirm that a UI change behaves as intended, reproduce a bug that only surfaces in the GUI, or check how the app responds to a specific sequence of actions.
* **Collect and organize information**: pull data out of apps with no export feature, or consolidate information that's scattered across several screens.

> For web apps, try the [Browser Agent](/user-guide/chat/browser-agent) first.

<div id="system-requirements">
  ## **System Requirements**
</div>

* macOS 14 (Sonoma) or later.
* Windows 10 or later.

<div id="platform-differences">
  ## **Differences Between Windows and macOS**
</div>

Windows handles input and window management quite differently from macOS, so we reimplemented the entire desktop-control capability independently on Windows. There are two differences in how it feels to use:

* **Operations happen in the foreground**: Windows' input mechanism requires the target window to be in the foreground to receive actions, so you'll see the cursor move and each action actually take place. Press `Esc` at any time to interrupt.
* **Dialog boxes are recognized**: on Windows, apps like Office often pop up confirmation and warning dialogs. These are separate windows that don't appear in the main window's screenshot. Qoder automatically detects them and composites them into the screenshot, so the agent can recognize and handle these dialogs and won't get stuck on prompts like "Save?".
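
To make the idea of compositing a dialog into the main window's screenshot concrete, here is a minimal sketch. It is an assumption about the general technique, not a description of how Qoder does it: images are modeled as 2D lists of characters, and `composite` is a hypothetical helper that pastes the dialog onto a copy of the main capture at a given offset.

```python
def composite(main, dialog, offset):
    """Paste `dialog` onto a copy of `main` at the (row, col) `offset`, clipping at the edges."""
    result = [row[:] for row in main]  # copy so the original capture is left untouched
    top, left = offset
    for r, dialog_row in enumerate(dialog):
        for c, pixel in enumerate(dialog_row):
            y, x = top + r, left + c
            # Skip any dialog pixels that fall outside the main capture.
            if 0 <= y < len(result) and 0 <= x < len(result[y]):
                result[y][x] = pixel
    return result


# Hypothetical example: a 3x4 main window and a 2x2 "Save?" dialog.
main_window = [["."] * 4 for _ in range(3)]
save_dialog = [["D", "D"], ["D", "D"]]
combined = composite(main_window, save_dialog, (1, 2))
```

With the dialog merged into a single image, whatever inspects the screenshot sees the prompt and the window behind it together.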

<div id="how-to-use">
  ## **How to Use**
</div>

In the input box, use the `/computer-use` slash command to invoke the capability and describe the task in natural language. The session shows the agent's screenshots and progress in real time — interrupt the task or steer it with follow-up messages at any time.

<Note>
  Every mode of the Editor Window supports Computer Use; the Quest Window supports Computer Use only in Experts mode.
</Note>

<div id="app-window-snapshot">
  ## **App Window Snapshot**
</div>

When you want to send the frontmost app window into the conversation as context, **double-tap the `Command` key** to capture a snapshot of the current active app window. The screenshot is auto-attached to Qoder's input box as an image, ready to serve as context for your next instruction — no need to switch windows, take a screenshot manually, and upload it.
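
A double-tap trigger like this is usually detected by comparing the time between two consecutive taps of the same key. The sketch below shows that general idea only. `DoubleTapDetector` is a hypothetical class, and the 0.4-second window is an assumed value, not Qoder's actual threshold.

```python
class DoubleTapDetector:
    """Reports a double-tap when two taps arrive within `max_interval` seconds."""

    def __init__(self, max_interval: float = 0.4):  # assumed threshold, for illustration
        self.max_interval = max_interval
        self._last_tap = None

    def tap(self, timestamp: float) -> bool:
        """Record a tap at `timestamp` (seconds); return True if it completes a double-tap."""
        is_double = (
            self._last_tap is not None
            and 0 <= timestamp - self._last_tap <= self.max_interval
        )
        # After a double-tap, reset so a third quick tap does not trigger again.
        self._last_tap = None if is_double else timestamp
        return is_double
```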

Useful scenarios:

* Pull a design mockup, prototype, or reference asset from a design tool straight into the conversation as the basis for generating or modifying code.
* When you hit an error or unusual screen in a browser, database client, terminal, or other app, send the snapshot to the agent for triage and analysis.
* While reading API documentation, a technical blog, or a tutorial, snap the key page so the agent can implement the feature or fix the code against the latest reference shown on screen.

To turn the capability off, open Settings, go to **Integrations**, find **App Window Snapshot**, and pick **Disabled** from the dropdown on the right.

<div id="permissions">
  ## **Permissions and Approvals**
</div>

The first time you enable Computer Use, Qoder shows a permission walkthrough that requests two system permissions:

* **Accessibility**: lets Qoder read the UI element tree and perform clicks, typing, and other accessibility actions.
* **Screen Recording**: lets Qoder capture screenshots of the active window so the agent can perceive the interface state.

Click **Open Settings** and the system jumps to the matching settings pane — drag Qoder Computer Use into the application list there to complete the authorization.

When the agent tries to operate a specific app, Qoder asks for your approval. The default is **Ask every time**, and you can change it in settings: open Settings, go to the **Integrations** page, find **Computer Use Agent** under **Built-in Agent**, and click the dropdown on the right to choose an execution policy.

* **Ask every time**: the agent asks for your confirmation each time it needs to drive the desktop.
* **Auto-run**: the agent runs desktop actions on its own, without per-action confirmation.
* **Disabled**: Computer Use is turned off entirely, and the agent cannot drive the desktop.

The same setting controls both the Editor Window and the Quest Window.
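
The three execution policies amount to a small decision rule. The following sketch models that rule under stated assumptions: `ExecutionPolicy` and `may_run` are hypothetical names used for illustration, not part of any Qoder API, and `ask_user` stands in for the approval prompt.

```python
from enum import Enum


class ExecutionPolicy(Enum):
    ASK_EVERY_TIME = "Ask every time"
    AUTO_RUN = "Auto-run"
    DISABLED = "Disabled"


def may_run(policy: ExecutionPolicy, ask_user) -> bool:
    """Return True if a desktop action may proceed under `policy`.

    `ask_user` is a zero-argument callable that returns the user's answer;
    it is only called when the policy requires confirmation.
    """
    if policy is ExecutionPolicy.DISABLED:
        return False
    if policy is ExecutionPolicy.AUTO_RUN:
        return True
    return bool(ask_user())  # ASK_EVERY_TIME: defer to the user
```

For example, `may_run(ExecutionPolicy.ASK_EVERY_TIME, lambda: False)` models the user declining a request, while `ExecutionPolicy.AUTO_RUN` never consults the callback.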

<div id="cautions">
  ## **Cautions**
</div>

* **Granting access means granting control**: once enabled, the agent can drive other apps on your computer with the same effect as if you took the action yourself. Disable it in settings when you don't need it.
* **Some actions can't be undone**: the agent's actions inside desktop apps, such as sending messages or deleting files, may be irreversible. For high-risk scenarios, prefer the **Ask every time** policy.
* **Screen contents are screenshotted**: the agent perceives the interface through screenshots, so anything visible on screen — including sensitive information — may be captured. Close windows that contain passwords or private data before running automation.
