Browser Control

JARVIS controls a real browser instance using the Chrome DevTools Protocol (CDP). It auto-detects Chrome or Chromium on your system, launches it with a dedicated profile, attaches remotely, and exposes five browser tools to the agent.

How It Works

On startup, JARVIS:

Scans your system for Chrome and Chromium binaries (see Detection Order)
Launches the browser with remote debugging enabled on port 9222
Opens a CDP connection to the debugging endpoint
Injects the browser tools into the agent’s tool registry
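
As a rough illustration of the connection step, the sketch below asks a browser started with remote debugging for its WebSocket address through Chrome's `/json/version` HTTP endpoint. The helper names are hypothetical and this is not taken from the JARVIS source; it only shows one plausible way to locate the debugging endpoint.

```python
import json
from urllib.request import urlopen


def version_url(host: str = "localhost", port: int = 9222) -> str:
    """Return the HTTP endpoint Chrome serves when remote debugging is on."""
    return f"http://{host}:{port}/json/version"


def websocket_url_from(payload: str) -> str:
    """Pull the browser-level WebSocket URL out of the /json/version reply."""
    return json.loads(payload)["webSocketDebuggerUrl"]


def discover_websocket_url(host: str = "localhost", port: int = 9222) -> str:
    """Ask a running browser for its CDP WebSocket URL (needs a live browser)."""
    with urlopen(version_url(host, port), timeout=5) as response:
        return websocket_url_from(response.read().decode("utf-8"))
```

A CDP client would then open a WebSocket to the returned URL and exchange JSON messages over it.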

From that point on, the agent can navigate pages, interact with elements, extract content, and take screenshots. Element-level actions use numbered element IDs from a live page snapshot rather than fragile CSS selectors.

Detection Order

JARVIS tries browser binaries in this order:

google-chrome (Linux system PATH)
chromium-browser (Debian/Ubuntu)
chromium (Arch, Fedora, Alpine)
/usr/bin/chromium (absolute path)
/snap/bin/chromium (Snap package)
google-chrome-stable (some distros)
Windows Chrome via WSL interop path (/mnt/c/Program Files/Google/Chrome/...)

The first binary found is used. To override, set the JARVIS_BROWSER_PATH environment variable:

JARVIS_BROWSER_PATH=/usr/local/bin/my-chrome jarvis start
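
The lookup described above could be sketched roughly as follows. The function name and the injectable `which` parameter are illustrative assumptions, not the JARVIS implementation, and the WSL interop entry is left out because its full path is elided in this document.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

# Candidate names and paths in the documented order (WSL interop path omitted).
CANDIDATES = [
    "google-chrome",
    "chromium-browser",
    "chromium",
    "/usr/bin/chromium",
    "/snap/bin/chromium",
    "google-chrome-stable",
]


def find_browser(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the override if set, else the first candidate that resolves."""
    override = env.get("JARVIS_BROWSER_PATH")
    if override:
        return override
    for candidate in CANDIDATES:
        resolved = which(candidate)
        if resolved:
            return resolved
    return None
```

Passing `which` as a parameter keeps the sketch testable without a browser installed; by default it uses `shutil.which`, which handles both bare names on PATH and absolute paths.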

Launch Flags

JARVIS launches the browser with these flags:

--remote-debugging-port=9222
--user-data-dir=~/.jarvis/browser-profile
--disable-extensions
--no-first-run
--no-default-browser-check

On WSL2, or on any Linux environment detected as non-desktop, --no-sandbox is added automatically, because Chromium's sandbox often cannot start in such environments.

The dedicated ~/.jarvis/browser-profile directory keeps JARVIS’s browsing history, cookies, and cached credentials separate from your personal browser profile.
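
For illustration, the flag list above could be assembled like this. The function and parameter names are hypothetical. The sketch expands `~` explicitly, on the assumption that the browser is started without a shell, in which case the tilde would otherwise reach the browser unexpanded.

```python
import os
from typing import List


def build_launch_args(
    port: int = 9222,
    profile_dir: str = "~/.jarvis/browser-profile",
    no_sandbox: bool = False,
) -> List[str]:
    """Assemble the documented flags, expanding "~" in the profile path."""
    args = [
        f"--remote-debugging-port={port}",
        f"--user-data-dir={os.path.expanduser(profile_dir)}",
        "--disable-extensions",
        "--no-first-run",
        "--no-default-browser-check",
    ]
    if no_sandbox:
        # Added only for WSL2 or non-desktop Linux, per the text above.
        args.append("--no-sandbox")
    return args
```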

Stealth Mode

JARVIS applies a stealth configuration to reduce bot detection:

Removes navigator.webdriver from the JavaScript context
Spoofs navigator.plugins, navigator.languages, and related properties
Sets a realistic user agent string
Disables the automation-controlled banner in the address bar

This is applied automatically on every new page navigation.
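
One common way to apply patches like these on every navigation is the CDP method `Page.addScriptToEvaluateOnNewDocument`, which runs a script before any page script does. The sketch below builds such a message. The JavaScript shown is an illustrative subset that overrides the properties rather than deleting them, and it is an assumption about the approach, not the patch set JARVIS actually ships.

```python
import json

# Illustrative patch only; the real stealth configuration covers more properties.
STEALTH_SOURCE = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
"""


def stealth_command(message_id: int) -> str:
    """Build the CDP message that registers the script for every new document."""
    return json.dumps({
        "id": message_id,
        "method": "Page.addScriptToEvaluateOnNewDocument",
        "params": {"source": STEALTH_SOURCE},
    })
```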

Browser Tools

The agent has access to five browser tools: navigate, click, type, extract, and screenshot.

`navigate`

Navigate the browser to a URL.

Input: url (string)

Example agent usage:

navigate("https://news.ycombinator.com")

`click`

Click on a page element by its numbered snapshot ID.

Input: elementId (number)

The agent first takes a snapshot of the page to obtain element IDs, then clicks by ID. This is more reliable than CSS selectors because IDs are derived from the live accessibility tree, not the DOM structure.

`type`

Type text into an input element.

Input: text (string), elementId (number, optional)

If elementId is provided, the element is focused before typing. Otherwise, text is sent to the currently focused element.

`extract`

Extract visible text content from the current page or a specific element.

Input: elementId (number, optional)

Returns the text content of the specified element, or the full page text if no ID is given. This is the primary tool for reading page content.

`screenshot`

Take a screenshot of the current page and return it to the agent as a base64-encoded PNG image.

Input: elementId (number, optional)

If elementId is given, the screenshot is cropped to that element. Otherwise the full viewport is captured. The image is passed directly to the LLM via Claude’s Vision API for visual analysis.
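
A minimal sketch of the screenshot round trip, assuming the capture goes through the CDP method `Page.captureScreenshot`. The helper names are hypothetical, and the clip rectangle is assumed to come from the element's bounding box.

```python
import base64
from typing import Any, Dict, Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def capture_params(clip: Optional[Dict[str, float]] = None) -> Dict[str, Any]:
    """Parameters for Page.captureScreenshot; clip crops to a rectangle."""
    params: Dict[str, Any] = {"format": "png"}
    if clip is not None:
        # CDP's Viewport type needs a scale alongside x, y, width, height.
        params["clip"] = {**clip, "scale": 1}
    return params


def decode_png(data: str) -> bytes:
    """Decode the base64 payload and confirm it starts with the PNG signature."""
    raw = base64.b64decode(data)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG image")
    return raw
```

Checking the signature before forwarding the image is a cheap guard against passing a truncated or mislabeled payload to the model.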

Snapshot Approach

Rather than working with raw CSS selectors or XPath expressions, JARVIS uses a snapshot-based approach:

The page is snapshotted using the CDP Accessibility tree
Each interactive element receives a sequential numeric ID: [1], [2], [3], …
The agent sees these IDs in the snapshot and references them in tool calls
IDs are stable within a single snapshot but change after navigation or page updates

This approach handles dynamic pages, shadow DOM, and single-page applications more reliably than selector-based automation.
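
The numbering scheme can be illustrated with a small sketch. The node shape and the set of roles treated as interactive are assumptions made for the example; the actual snapshot format is not specified in this document.

```python
from typing import Dict, List, Tuple

# Assumed set of interactive roles; the real list is not documented here.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox"}


def number_elements(
    nodes: List[Dict[str, str]],
) -> Tuple[str, Dict[int, Dict[str, str]]]:
    """Assign [1], [2], ... to interactive nodes; return the snapshot text
    and an ID-to-node lookup for resolving later tool calls."""
    lines: List[str] = []
    lookup: Dict[int, Dict[str, str]] = {}
    for node in nodes:
        if node["role"] not in INTERACTIVE_ROLES:
            continue
        element_id = len(lookup) + 1
        lookup[element_id] = node
        lines.append(f'[{element_id}] {node["role"]} "{node["name"]}"')
    return "\n".join(lines), lookup


nodes = [
    {"role": "heading", "name": "Sign in"},
    {"role": "textbox", "name": "Email"},
    {"role": "button", "name": "Continue"},
]
text, lookup = number_elements(nodes)
print(text)
# [1] textbox "Email"
# [2] button "Continue"
```

Because the lookup is rebuilt from scratch on each snapshot, an ID from an earlier snapshot may point at a different element after the page changes, which is why a fresh snapshot is needed after navigation or page updates.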

Background Monitor

A second browser controller runs on CDP port 9223 for the Proactive Agent. This separate instance handles heartbeat monitoring and automated reactions without blocking your interactive chat session. The two CDP connections are fully independent.

WSL2 Considerations

In WSL2, Linux browsers are strongly preferred over Windows Chrome:

Linux browsers share the WSL2 network namespace, so CDP connections to localhost:9222 work without any configuration
Windows Chrome runs in the Windows network namespace; reaching it from WSL2 requires additional routing that JARVIS does not attempt by default
If no Linux browser is found, JARVIS falls back to Windows Chrome via the interop path and adjusts the CDP host accordingly
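
How JARVIS detects WSL2 is not described here. A widely used heuristic, shown below as a sketch with hypothetical function names, is to look for "microsoft" in the kernel version string from /proc/version.

```python
def is_wsl(proc_version: str) -> bool:
    """WSL kernels identify themselves with 'microsoft' in /proc/version."""
    return "microsoft" in proc_version.lower()


def read_proc_version(path: str = "/proc/version") -> str:
    """Return the kernel version string, or '' where the file is unavailable."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError:
        return ""
```

Calling `is_wsl(read_proc_version())` returns False on systems without /proc/version, such as macOS, since the empty string contains no match.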

Configuration

Browser control is always enabled; there are no configuration keys to toggle it. The CDP port and the profile directory can each be changed with an environment variable, for example to resolve a port conflict or to use a different profile:

JARVIS_CDP_PORT=9333 jarvis start
JARVIS_BROWSER_PROFILE=~/.jarvis/my-profile jarvis start
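
A sketch of how these two variables might be resolved against the documented defaults. The function name and the range check are illustrative assumptions.

```python
import os
from typing import Mapping, Tuple


def resolve_settings(env: Mapping[str, str] = os.environ) -> Tuple[int, str]:
    """Return (cdp_port, profile_dir), falling back to the documented defaults."""
    port = int(env.get("JARVIS_CDP_PORT", "9222"))
    if not 1 <= port <= 65535:
        raise ValueError(f"JARVIS_CDP_PORT out of range: {port}")
    profile = env.get("JARVIS_BROWSER_PROFILE", "~/.jarvis/browser-profile")
    return port, os.path.expanduser(profile)
```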

Troubleshooting

Browser not found

Run jarvis doctor to see which browser binary was detected. If none is found, install Chromium:

# Debian/Ubuntu/WSL2
sudo apt install chromium-browser

# Arch
sudo pacman -S chromium

# macOS
brew install --cask chromium

Port 9222 already in use

Another process, most often a Chrome instance running with remote debugging enabled, is already listening on the port. Either close it or change the CDP port:

JARVIS_CDP_PORT=9333 jarvis start
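
To confirm that a port is occupied before changing it, a quick check along these lines can help. This is a generic sketch with a hypothetical function name, not a JARVIS command.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0
```

For example, `port_in_use(9222)` returning True before JARVIS starts suggests that another process already holds the default port.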

Pages load but interactions fail

This usually means the page uses anti-automation measures that defeat the stealth patches. Try a longer interaction delay, or file a bug report that includes the site URL.