Computer Use Agent

Computer Use is a Qoder capability extension that lets the agent perceive your screen the way a person does, and click, type, and scroll on your computer. When a task involves a graphical interface and can’t be completed through the command line or an API, the agent can drive desktop apps and browsers directly. On macOS, this happens in the background, so you can keep working on your own tasks in the foreground.

Computer Use is currently in Beta and is available on macOS and Windows — the experience and the underlying capabilities are still being improved.

Core Capabilities

Screen perception

Reads the visible content of the target app window and understands the layout, button text, form state, and other visual cues.
Takes screenshots throughout the run to confirm the page has loaded and the previous action took effect before deciding the next step.

Mouse and keyboard control

Supports the full range of human input: clicks, double-clicks, drags, text entry, and keyboard shortcuts.
Operates at pixel-level precision so it can target small UI elements accurately.

Autonomous execution

The agent independently drives the mouse, keyboard, and screenshots, deciding each step based on the interface state.
On macOS, operations run in the background without stealing your foreground focus; on Windows, operations run in the foreground, so you can see the cursor move and each action take place (see the platform differences below).

Cross-app workflows

Switches between desktop apps and chains multi-step operations into a complete flow.
Adjusts the next step based on what just happened, instead of replaying a fixed script.
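The screenshot-driven loop described above (perceive, decide, act, then verify before the next step) can be sketched in a few lines. This is a minimal illustration, not Qoder’s implementation: `FakeScreen`, `decide`, and `run` are hypothetical names, and the “screen” is a simulated form whose visible state stands in for a real screenshot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeScreen:
    """Hypothetical stand-in for an app window: one text field and a Save button."""
    field_text: str = ""
    saved: bool = False

    def screenshot(self) -> dict:
        # A real agent captures pixels; here the visible state is returned as a dict.
        return {"field_text": self.field_text, "saved": self.saved}

    def set_text(self, text: str) -> None:
        self.field_text = text

    def click_save(self) -> None:
        # Saving only has an effect once the field contains something.
        if self.field_text:
            self.saved = True


def decide(state: dict, goal: str) -> Optional[str]:
    """Choose the next action from what is currently on screen, or None when done."""
    if state["field_text"] != goal:
        return "set_text"
    if not state["saved"]:
        return "click_save"
    return None


def run(screen: FakeScreen, goal: str, max_steps: int = 10) -> list:
    """Perceive, decide, act, and verify until the goal is reached or steps run out."""
    log = []
    for _ in range(max_steps):
        before = screen.screenshot()      # perceive
        action = decide(before, goal)     # decide from the current state
        if action is None:
            break
        if action == "set_text":
            screen.set_text(goal)         # act
        else:
            screen.click_save()
        after = screen.screenshot()       # verify the action took effect
        log.append((action, after != before))
    return log
```

Because `decide` reads the current state on every iteration, the same loop copes with a form that is already partly filled in or already saved. That is the difference between adapting to the interface and replaying a fixed script.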

Usage Scenarios

Drive desktop apps that lack an API: when the target app has no CLI or plugin, the agent operates the GUI directly, for example adjusting parameters in a design tool or batch-updating settings in an admin console.
Automate cross-app flows: when a task spans several apps, the agent switches windows, copies data, and fills forms to complete the workflow end to end.
GUI verification and testing: confirm that a UI change behaves as intended, reproduce a bug that only surfaces in the GUI, or check how the app responds to a specific sequence of actions.
Collect and organize information: pull data out of apps with no export feature, or consolidate information that’s scattered across several screens.

For web apps, try the Browser Agent first.

System Requirements

macOS 14 (Sonoma) or later.
Windows 10 or later.

Differences between Windows and macOS

Windows handles input and window management quite differently from macOS, so we reimplemented the entire desktop-control capability independently on Windows. In practice, you will notice two differences:

Operations happen in the foreground: Windows’ input mechanism requires the target window to be in the foreground to receive actions, so you’ll see the cursor move and each action actually take place. Press Esc at any time to interrupt.
Dialog boxes are recognized: on Windows, apps like Office often pop up confirmation and warning dialogs. These are separate windows that don’t appear in the main window’s screenshot. Qoder automatically detects these dialogs and composites them into the screenshot, so the agent can recognize and handle them instead of getting stuck on prompts like “Save?”.
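Qoder does not document how it composites dialogs, but the general idea can be illustrated: translate each dialog’s screen position into the main window’s coordinates and paint it over the main capture. In this hypothetical sketch, `WindowCapture` and `composite` are illustrative names, and each capture is modeled as a grid of characters rather than real pixel data.

```python
from dataclasses import dataclass

Image = list[list[str]]  # rows of "pixels"; each character stands in for a color


@dataclass
class WindowCapture:
    x: int         # left edge in screen coordinates
    y: int         # top edge in screen coordinates
    pixels: Image


def composite(main: WindowCapture, dialogs: list[WindowCapture]) -> Image:
    """Paint each dialog onto a copy of the main window's capture, clipped to its bounds."""
    out = [row[:] for row in main.pixels]  # copy so the original capture is untouched
    height = len(out)
    width = len(out[0]) if out else 0
    for dlg in dialogs:  # later dialogs are painted on top of earlier ones
        for dy, row in enumerate(dlg.pixels):
            for dx, px in enumerate(row):
                # Translate the dialog pixel from screen coordinates to main-window coordinates.
                tx = dlg.x - main.x + dx
                ty = dlg.y - main.y + dy
                if 0 <= ty < height and 0 <= tx < width:
                    out[ty][tx] = px
    return out
```

Any part of a dialog that falls outside the main window is simply clipped in this sketch; a real implementation might instead enlarge the canvas so the whole dialog stays visible.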

How to Use

In the input box, use the /computer-use slash command to invoke the capability, then describe the task in natural language. The session shows the agent’s screenshots and progress in real time, and you can interrupt the task or steer it with follow-up messages at any time.

Every mode of the Editor Window supports Computer Use; the Quest Window supports Computer Use only in Experts mode.

App Window Snapshot

To send the frontmost app window into the conversation as context, double-tap the Command key to capture a snapshot of the active app window. The screenshot is attached automatically to Qoder’s input box as an image, ready to serve as context for your next instruction, so there is no need to switch windows, take a screenshot manually, and upload it. Useful scenarios:

Pull a design mockup, prototype, or reference asset from a design tool straight into the conversation as the basis for generating or modifying code.
When you hit an error or unusual screen in a browser, database client, terminal, or other app, send the snapshot to the agent for triage and analysis.
While reading API documentation, a technical blog, or a tutorial, snap the key page so the agent can implement the feature or fix the code against the latest reference shown on screen.

To turn the capability off, open Settings, go to Integrations, find App Window Snapshot, and pick Disabled from the dropdown on the right.

Permissions and Approvals

The first time you enable Computer Use, Qoder shows a permission walkthrough that requests two system permissions:

Accessibility: lets Qoder read the UI element tree and perform clicks, typing, and other accessibility actions.
Screen Recording: lets Qoder capture screenshots of the active window so the agent can perceive the interface state.

Click Open Settings to jump to the matching system settings pane, then drag Qoder Computer Use into the application list there to complete the authorization.

When the agent tries to operate a specific app, Qoder asks for your approval. The default policy is Ask every time. To change it, open Settings, go to the Integrations page, find Computer Use Agent under Built-in Agent, and choose an execution policy from the dropdown on the right:

Ask every time: the agent asks for your confirmation each time it needs to drive the desktop.
Auto-run: the agent runs desktop actions on its own, without per-action confirmation.
Disabled: Computer Use is turned off entirely.

The same setting controls both the Editor Window and the Quest Window.
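The three execution policies amount to a simple gate in front of every desktop action. The sketch below models that behavior, assuming a confirmation prompt that returns yes or no; `ExecutionPolicy`, `may_drive_desktop`, and the `confirm` callback are hypothetical names, not part of any Qoder API.

```python
from enum import Enum
from typing import Callable


class ExecutionPolicy(Enum):
    """Hypothetical model of the three options in the Computer Use Agent dropdown."""
    ASK_EVERY_TIME = "Ask every time"
    AUTO_RUN = "Auto-run"
    DISABLED = "Disabled"


# Matches the documented default.
DEFAULT_POLICY = ExecutionPolicy.ASK_EVERY_TIME


def may_drive_desktop(policy: ExecutionPolicy, confirm: Callable[[str], bool], app: str) -> bool:
    """Return True if the agent may operate `app` under `policy`."""
    if policy is ExecutionPolicy.DISABLED:
        return False  # Computer Use is off entirely
    if policy is ExecutionPolicy.AUTO_RUN:
        return True   # no per-action confirmation
    # ASK_EVERY_TIME: defer to the user for each desktop action
    return confirm(f"Allow the agent to operate {app}?")
```

Under Auto-run the `confirm` callback is never invoked, which is why irreversible actions are safer under Ask every time.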

Cautions

Granting access means granting control: once enabled, the agent can drive other apps on your computer with the same effect as if you took the action yourself. Disable it in settings when you don’t need it.
Some actions can’t be undone: the agent’s actions inside desktop apps (sending messages, deleting files) may be irreversible. For high-risk scenarios, prefer the Ask every time policy.
Screen contents are screenshotted: the agent perceives the interface through screenshots, so anything visible on screen — including sensitive information — may be captured. Close windows that contain passwords or private data before running automation.
