Skip to content

Desktop Control

JARVIS controls native desktop applications through a Go sidecar process that connects to the daemon over a JWT-authenticated WebSocket. The sidecar runs natively on each platform — Windows, macOS, and Linux — using platform-specific APIs for window management, UI automation, screenshots, and input simulation.

JARVIS Daemon (Bun, any machine)
↕ WebSocket (JWT auth)
Sidecar (Go, target machine)
→ Platform APIs (Win32 / AppleScript / X11)
→ Chrome DevTools Protocol (CDP)
→ Terminal, Filesystem, Clipboard

The sidecar is a standalone Go binary that enrolls with the daemon using a JWT token. Once connected, it receives RPC commands over WebSocket and executes them using native platform APIs. Multiple sidecars can connect to the same daemon, giving JARVIS control over several machines simultaneously.

Install the sidecar on each machine you want JARVIS to control:

Terminal window
bun install -g @usejarvis/sidecar

Or download the prebuilt binary for your platform from the releases page.

See Installation above.

  1. Open the JARVIS dashboard at http://localhost:3142
  2. Go to SettingsSidecar
  3. Enter a friendly name for this machine (e.g. “work laptop”) and click Enroll
  4. Click Copy to copy the token command

Paste and run the copied command on the machine where you installed the sidecar:

Terminal window
jarvis-sidecar --token <your-token>

The sidecar saves the token locally, so on subsequent runs you just need:

Terminal window
jarvis-sidecar

Once connected, the sidecar appears as online in the Settings page where you can configure its capabilities.

After enrollment, the sidecar reconnects automatically with exponential backoff if the connection drops.

The sidecar advertises its capabilities during the preflight check. Each capability is verified at startup — only capabilities that pass the platform check are registered.

CapabilityDescriptionWindowsmacOSLinux
terminalRun shell commandscmd.exe / PowerShellbash/zshbash/zsh
filesystemRead/write files, list directoriesYesYesYes
clipboardGet/set clipboard contentPowerShellpbcopy/pbpastexclip/xsel
screenshotCapture screen to PNGPowerShellscreencaptureimport (ImageMagick)
desktopWindow management & UI automationWin32 UIAAppleScriptxdotool/wmctrl
browserChrome control via CDPYesYesYes
system_infoHostname, platform, CPU infoYesYesYes

When the desktop capability is available, the agent has access to these tools:

List all visible top-level windows with their titles and process names.

[
{ "title": "Visual Studio Code", "processName": "Code.exe", "handle": "0x1A2B" },
{ "title": "File Explorer", "processName": "explorer.exe", "handle": "0x3C4D" }
]

Get the UI Automation element tree for a window — reveals buttons, text fields, menus, and other controls.

Input: window (string), depth (integer, optional)
Returns: nested element tree with automationId, name, controlType, boundingRect

Click a UI element identified by automation ID, name, or control type.

Input: window (string), selector (string), selectorType (automationId | name | controlType)

Type text into a focused element. Uses platform-native input methods.

Input: window (string), selector (string), text (string)

Send a key combination to a window.

Input: window (string), keys (string)

Key format: "Ctrl+C", "Alt+F4", "Win+D", "Ctrl+Shift+Esc".

Launch an application by name or path.

Input: app (string), args (string[], optional)

Bring a window to the foreground.

Input: window (string)

Search for a UI element matching a query and return its properties.

Input: window (string), query (string)
Returns: { automationId, name, controlType, boundingRect }

When the browser capability is available, the sidecar launches Chrome with remote debugging and provides these tools:

ToolDescription
browser_navigateNavigate to a URL
browser_snapshotGet the accessibility tree of the current page
browser_clickClick an element by selector
browser_typeType into an input field
browser_screenshotCapture the page as PNG
browser_scrollScroll the page or an element
browser_evaluateExecute JavaScript in the page context
ToolDescription
run_commandExecute a shell command with configurable timeout and blocked-command list
read_fileRead a file (respects blocked paths and max file size)
write_fileWrite content to a file (respects blocked paths)
list_directoryList directory entries with types and sizes
get_clipboardRead clipboard content
set_clipboardWrite to clipboard
capture_screenTake a full-screen screenshot
get_system_infoGet hostname, platform, architecture, CPU count

Desktop automation uses the Win32 UI Automation (UIA) COM API via PowerShell. This gives access to the accessibility tree of any Windows application, including:

  • Enumerating windows (EnumWindows)
  • Reading element trees (IUIAutomation)
  • Clicking, typing, scrolling
  • Getting the foreground window (GetForegroundWindow)
  • Mouse and keyboard simulation (SetCursorPos, mouse_event, SendKeys)

Screenshots use System.Windows.Forms.Screen via PowerShell.

Desktop automation uses AppleScript and Accessibility APIs:

  • osascript for window listing, app launching, and UI scripting
  • screencapture for screenshots
  • pbcopy/pbpaste for clipboard

Desktop automation uses X11 tools:

  • xdotool for window management and input simulation
  • wmctrl for window listing and focusing
  • xclip or xsel for clipboard
  • import (ImageMagick) for screenshots

Connect multiple sidecars to a single JARVIS daemon for cross-machine orchestration:

Terminal window
# On machine A (e.g., your workstation)
bun install -g @usejarvis/sidecar
jarvis-sidecar --token <token-from-dashboard>
# On machine B (e.g., a build server)
bun install -g @usejarvis/sidecar
jarvis-sidecar --token <token-from-dashboard>

The agent can then reference machines by hostname when dispatching tools. For example, it can run a build on your server while monitoring the result in your browser locally.

The sidecar stores its config at ~/.jarvis/sidecar.yaml:

brain_url: "ws://localhost:3142/sidecar"
token: "eyJ..." # JWT from enrollment
capabilities:
- terminal
- filesystem
- clipboard
- screenshot
- desktop
- browser
- system_info
terminal:
blocked_commands: ["rm -rf /", "format", "shutdown"]
default_shell: "" # auto-detected
timeout_ms: 30000
filesystem:
blocked_paths: ["/etc/shadow", "/root"]
max_file_size_kb: 10240
browser:
cdp_port: 9222
profile_dir: "" # auto-detected
awareness:
screen_interval_ms: 7000
window_interval_ms: 3000
min_change_threshold: 0.02
stuck_threshold_ms: 120000

Requirements: Go 1.23 or later.

Terminal window
git clone https://github.com/vierisid/jarvis
cd jarvis/sidecar
# Build for your current platform
go build -o jarvis-sidecar .
# Cross-compile for Windows
GOOS=windows GOARCH=amd64 go build -o jarvis-sidecar.exe .
# Cross-compile for macOS
GOOS=darwin GOARCH=arm64 go build -o jarvis-sidecar-macos .

Sidecar won’t connect

  1. Verify the daemon is running: jarvis status
  2. Re-enroll from the dashboard: SettingsSidecarEnroll
  3. Check firewall rules — port 3142 must be reachable from the sidecar machine
  4. Run jarvis doctor for a connectivity check

Desktop tools not working

  1. Verify the desktop capability passed preflight: check sidecar startup logs
  2. On Linux, ensure xdotool and wmctrl are installed: sudo apt install xdotool wmctrl
  3. On macOS, grant Accessibility permissions in System Settings > Privacy & Security
  4. On Windows, ensure the sidecar is running with appropriate permissions

Screenshots are blank

  • On Linux, ensure ImageMagick is installed: sudo apt install imagemagick
  • On Windows, ensure the sidecar is not running in a headless/service context without desktop access

Video tutorial placeholder: enrolling a sidecar and testing desktop control.

Add your future video link here.