Desktop Control
JARVIS controls native desktop applications through a Go sidecar process that connects to the daemon over a JWT-authenticated WebSocket. The sidecar runs natively on each platform — Windows, macOS, and Linux — using platform-specific APIs for window management, UI automation, screenshots, and input simulation.
Architecture
    JARVIS Daemon (Bun, any machine)
        ↕ WebSocket (JWT auth)
    Sidecar (Go, target machine)
        → Platform APIs (Win32 / AppleScript / X11)
        → Chrome DevTools Protocol (CDP)
        → Terminal, Filesystem, Clipboard

The sidecar is a standalone Go binary that enrolls with the daemon using a JWT. Once connected, it receives RPC commands over WebSocket and executes them using native platform APIs. Multiple sidecars can connect to the same daemon, giving JARVIS control over several machines simultaneously.
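The command flow described above can be pictured as a small dispatch table. The following is a minimal sketch only: the message shape (`id`, `method`, `params`) and the handler shown are assumptions made for illustration, not the sidecar's actual wire format, and the WebSocket transport is left out entirely.

```python
import json
import platform
from typing import Any, Callable, Dict

# Hypothetical handler: the real sidecar's payload may look different.
def get_system_info(params: Dict[str, Any]) -> Dict[str, Any]:
    return {"hostname": platform.node(), "platform": platform.system()}

# Hypothetical mapping from RPC method name to handler function.
HANDLERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "get_system_info": get_system_info,
}

def handle_message(raw: str) -> str:
    """Decode one RPC request, run the matching handler, encode the reply."""
    request = json.loads(raw)
    handler = HANDLERS.get(request.get("method"))
    if handler is None:
        reply = {"id": request.get("id"), "error": "unknown method"}
    else:
        result = handler(request.get("params") or {})
        reply = {"id": request.get("id"), "result": result}
    return json.dumps(reply)
```

Echoing the request `id` in the reply is one common way to let the caller match responses to requests when several commands are in flight on the same connection.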
Installation
Install the sidecar on each machine you want JARVIS to control:

    bun install -g @usejarvis/sidecar

Or download the prebuilt binary for your platform from the releases page.
Enrollment
1. Install the sidecar
See Installation above.
2. Enroll in the dashboard
- Open the JARVIS dashboard at http://localhost:3142
- Go to Settings → Sidecar
- Enter a friendly name for this machine (e.g. “work laptop”) and click Enroll
- Click Copy to copy the token command
3. Run the sidecar
Paste and run the copied command on the machine where you installed the sidecar:

    jarvis-sidecar --token <your-token>

The sidecar saves the token locally, so on subsequent runs you just need:

    jarvis-sidecar

Once connected, the sidecar appears as online in the Settings page, where you can configure its capabilities.
After enrollment, the sidecar reconnects automatically with exponential backoff if the connection drops.
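To illustrate what reconnecting with exponential backoff means, here is a minimal sketch of a delay schedule. The base delay, cap, growth factor, and optional jitter are assumed values chosen for the example; the sidecar's actual parameters are not documented here.

```python
import itertools
import random
from typing import Iterator

def backoff_delays(base: float = 1.0, cap: float = 60.0, factor: float = 2.0,
                   jitter: float = 0.0) -> Iterator[float]:
    """Yield reconnect delays in seconds: base, base*factor, ... capped at cap."""
    delay = base
    while True:
        # Optional jitter spreads out reconnects when many sidecars drop at once.
        yield min(cap, delay) * (1.0 + random.uniform(0.0, jitter))
        delay = min(cap, delay * factor)

# First five delays with the assumed defaults and no jitter.
first_five = list(itertools.islice(backoff_delays(), 5))
```

A reconnect loop would sleep for each yielded delay between attempts and start a fresh generator once a connection succeeds.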
Capabilities
The sidecar advertises its capabilities during the preflight check. Each capability is verified at startup, and only capabilities that pass the platform check are registered.
| Capability | Description | Windows | macOS | Linux |
|---|---|---|---|---|
| terminal | Run shell commands | cmd.exe / PowerShell | bash/zsh | bash/zsh |
| filesystem | Read/write files, list directories | Yes | Yes | Yes |
| clipboard | Get/set clipboard content | PowerShell | pbcopy/pbpaste | xclip/xsel |
| screenshot | Capture screen to PNG | PowerShell | screencapture | import (ImageMagick) |
| desktop | Window management & UI automation | Win32 UIA | AppleScript | xdotool/wmctrl |
| browser | Chrome control via CDP | Yes | Yes | Yes |
| system_info | Hostname, platform, CPU info | Yes | Yes | Yes |
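A preflight check of this kind can be sketched as a lookup of the external tools each capability depends on. The requirement map below is an assumption derived from the Linux column of the table; the sidecar's real checks may test more than tool presence.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Assumed Linux tool requirements. Each capability lists groups of
# alternatives: every group must be satisfied by at least one installed tool.
# Capabilities with no entry are treated as having no external dependency.
LINUX_REQUIREMENTS: Dict[str, List[List[str]]] = {
    "clipboard": [["xclip", "xsel"]],      # either tool is enough
    "screenshot": [["import"]],            # ImageMagick
    "desktop": [["xdotool"], ["wmctrl"]],  # both are needed
}

def preflight(requested: List[str],
              requirements: Dict[str, List[List[str]]],
              which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the requested capabilities whose tool requirements are all met."""
    registered = []
    for capability in requested:
        groups = requirements.get(capability, [])
        if all(any(which(tool) for tool in group) for group in groups):
            registered.append(capability)
    return registered
```

Passing `which` as a parameter keeps the check testable without installing or removing tools on the machine.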
Desktop Tools
When the desktop capability is available, the agent has access to these tools:
list_windows
List all visible top-level windows with their titles and process names.

    [
      { "title": "Visual Studio Code", "processName": "Code.exe", "handle": "0x1A2B" },
      { "title": "File Explorer", "processName": "explorer.exe", "handle": "0x3C4D" }
    ]

get_window_tree
Get the UI Automation element tree for a window. The tree reveals buttons, text fields, menus, and other controls.

    Input: window (string), depth (integer, optional)
    Returns: nested element tree with automationId, name, controlType, boundingRect

click_element
Click a UI element identified by automation ID, name, or control type.

    Input: window (string), selector (string), selectorType (automationId | name | controlType)

type_text
Type text into a focused element. Uses platform-native input methods.

    Input: window (string), selector (string), text (string)

press_keys
Send a key combination to a window.

    Input: window (string), keys (string)

Key format: "Ctrl+C", "Alt+F4", "Win+D", "Ctrl+Shift+Esc".
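The key format above splits naturally into modifiers and a final key. The parser below is a minimal sketch under the assumption that the only modifiers are the four that appear in the examples; it does not handle a literal "+" key.

```python
from typing import List, Tuple

# Assumed modifier names, based only on the examples in this section.
MODIFIERS = {"ctrl", "alt", "shift", "win"}

def parse_keys(combo: str) -> Tuple[List[str], str]:
    """Split a combination such as "Ctrl+Shift+Esc" into modifiers and key."""
    parts = [p.strip().lower() for p in combo.split("+")]
    if any(not p for p in parts):
        raise ValueError(f"malformed key combination: {combo!r}")
    *modifiers, key = parts
    unknown = [m for m in modifiers if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s): {unknown}")
    return modifiers, key
```

Validating the modifiers before simulating input avoids sending a partial key sequence to the target window.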
launch_app
Launch an application by name or path.

    Input: app (string), args (string[], optional)

focus_window
Bring a window to the foreground.

    Input: window (string)

find_element
Search for a UI element matching a query and return its properties.

    Input: window (string), query (string)
    Returns: { automationId, name, controlType, boundingRect }

Browser Tools
When the browser capability is available, the sidecar launches Chrome with remote debugging and provides these tools:
| Tool | Description |
|---|---|
| browser_navigate | Navigate to a URL |
| browser_snapshot | Get the accessibility tree of the current page |
| browser_click | Click an element by selector |
| browser_type | Type into an input field |
| browser_screenshot | Capture the page as PNG |
| browser_scroll | Scroll the page or an element |
| browser_evaluate | Execute JavaScript in the page context |
Terminal & Filesystem Tools
| Tool | Description |
|---|---|
| run_command | Execute a shell command with configurable timeout and blocked-command list |
| read_file | Read a file (respects blocked paths and max file size) |
| write_file | Write content to a file (respects blocked paths) |
| list_directory | List directory entries with types and sizes |
| get_clipboard | Read clipboard content |
| set_clipboard | Write to clipboard |
| capture_screen | Take a full-screen screenshot |
| get_system_info | Get hostname, platform, architecture, CPU count |
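The blocked-path and blocked-command checks mentioned in the table could work roughly as follows. This is a sketch under stated assumptions: paths are blocked by prefix and commands by substring. The sidecar's real matching rules are not described in this document and may differ.

```python
import os
from typing import Iterable

def is_path_blocked(path: str, blocked_paths: Iterable[str]) -> bool:
    """True if path equals a blocked path or lies underneath one."""
    target = os.path.normpath(os.path.abspath(path))
    for blocked in blocked_paths:
        root = os.path.normpath(os.path.abspath(blocked))
        if target == root or target.startswith(root.rstrip(os.sep) + os.sep):
            return True
    return False

def is_command_blocked(command: str, blocked_commands: Iterable[str]) -> bool:
    """True if the command contains a blocked entry as a substring (assumed rule)."""
    normalized = " ".join(command.split())  # collapse repeated whitespace
    return any(entry in normalized for entry in blocked_commands)
```

Substring matching is coarse: under this assumed rule an entry such as "format" would also reject `git format-patch`. The path check does not resolve symlinks, so a stricter version might compare `os.path.realpath` results instead.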
Platform Details
Windows
Desktop automation uses the Win32 UI Automation (UIA) COM API via PowerShell. This gives access to the accessibility tree of any Windows application, including:
- Enumerating windows (EnumWindows)
- Reading element trees (IUIAutomation)
- Clicking, typing, scrolling
- Getting the foreground window (GetForegroundWindow)
- Mouse and keyboard simulation (SetCursorPos, mouse_event, SendKeys)

Screenshots use System.Windows.Forms.Screen via PowerShell.
macOS
Desktop automation uses AppleScript and Accessibility APIs:
- osascript for window listing, app launching, and UI scripting
- screencapture for screenshots
- pbcopy/pbpaste for clipboard

Linux
Desktop automation uses X11 tools:
- xdotool for window management and input simulation
- wmctrl for window listing and focusing
- xclip or xsel for clipboard
- import (ImageMagick) for screenshots
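As an illustration of how window listing on Linux might be built on these tools, the sketch below parses the output of `wmctrl -l`, which prints one window per line as window ID, desktop number, client machine, and title. Whether the sidecar parses the output this way is an assumption; resolving a process name would additionally need the PID column from `wmctrl -lp`.

```python
from typing import Dict, List

def parse_wmctrl_list(output: str) -> List[Dict[str, str]]:
    """Parse `wmctrl -l` output into window ID, desktop, host, and title."""
    windows = []
    for line in output.splitlines():
        # Split on whitespace at most three times so titles keep their spaces.
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        handle, desktop, host = fields[:3]
        title = fields[3] if len(fields) == 4 else ""
        windows.append({"handle": handle, "desktop": desktop,
                        "host": host, "title": title})
    return windows
```

The text to parse could be captured with `subprocess.run(["wmctrl", "-l"], capture_output=True, text=True).stdout`.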
Multi-Machine Setup
Connect multiple sidecars to a single JARVIS daemon for cross-machine orchestration:

    # On machine A (e.g., your workstation)
    bun install -g @usejarvis/sidecar
    jarvis-sidecar --token <token-from-dashboard>

    # On machine B (e.g., a build server)
    bun install -g @usejarvis/sidecar
    jarvis-sidecar --token <token-from-dashboard>

The agent can then reference machines by hostname when dispatching tools. For example, it can run a build on your server while monitoring the result in your browser locally.
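Routing by hostname can be pictured as a registry that maps each connected machine to a way of sending it commands. Everything in this sketch is hypothetical, including the `SidecarRegistry` class, the `Sender` callable, and the `command` parameter name. It illustrates the idea rather than the daemon's implementation.

```python
from typing import Any, Callable, Dict

# Hypothetical: each connected sidecar is represented by a send function
# that takes a tool name and its parameters.
Sender = Callable[[str, Dict[str, Any]], Any]

class SidecarRegistry:
    """Illustrative hostname-to-sidecar lookup."""

    def __init__(self) -> None:
        self._by_host: Dict[str, Sender] = {}

    def register(self, hostname: str, send: Sender) -> None:
        self._by_host[hostname] = send

    def dispatch(self, hostname: str, tool: str, params: Dict[str, Any]) -> Any:
        send = self._by_host.get(hostname)
        if send is None:
            raise KeyError(f"no sidecar connected for host {hostname!r}")
        return send(tool, params)
```

With such a registry, a build could be dispatched to one host and a screenshot request to another without the caller knowing how either connection is held.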
Sidecar Configuration
The sidecar stores its config at ~/.jarvis/sidecar.yaml:

    brain_url: "ws://localhost:3142/sidecar"
    token: "eyJ..."  # JWT from enrollment
    capabilities:
      - terminal
      - filesystem
      - clipboard
      - screenshot
      - desktop
      - browser
      - system_info
    terminal:
      blocked_commands: ["rm -rf /", "format", "shutdown"]
      default_shell: ""  # auto-detected
      timeout_ms: 30000
    filesystem:
      blocked_paths: ["/etc/shadow", "/root"]
      max_file_size_kb: 10240
    browser:
      cdp_port: 9222
      profile_dir: ""  # auto-detected
    awareness:
      screen_interval_ms: 7000
      window_interval_ms: 3000
      min_change_threshold: 0.02
      stuck_threshold_ms: 120000

Building from Source
Requirements: Go 1.23 or later.
    git clone https://github.com/vierisid/jarvis
    cd jarvis/sidecar

    # Build for your current platform
    go build -o jarvis-sidecar .

    # Cross-compile for Windows
    GOOS=windows GOARCH=amd64 go build -o jarvis-sidecar.exe .

    # Cross-compile for macOS
    GOOS=darwin GOARCH=arm64 go build -o jarvis-sidecar-macos .

Troubleshooting
Sidecar won’t connect
- Verify the daemon is running: jarvis status
- Re-enroll from the dashboard: Settings → Sidecar → Enroll
- Check firewall rules: port 3142 must be reachable from the sidecar machine
- Run jarvis doctor for a connectivity check
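If the firewall is the suspect, a quick TCP probe from the sidecar machine can confirm whether port 3142 is reachable at all. This sketch tests only the TCP connection. It does not perform the WebSocket handshake or validate the token, and it is not a substitute for `jarvis doctor`.

```python
import socket

def port_reachable(host: str, port: int = 3142, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Call it with the daemon's hostname or IP address, for example `port_reachable("localhost")` when the daemon runs on the same machine.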
Desktop tools not working
- Verify the desktop capability passed preflight: check sidecar startup logs
- On Linux, ensure xdotool and wmctrl are installed: sudo apt install xdotool wmctrl
- On macOS, grant Accessibility permissions in System Settings > Privacy & Security
- On Windows, ensure the sidecar is running with appropriate permissions
Screenshots are blank
- On Linux, ensure ImageMagick is installed: sudo apt install imagemagick
- On Windows, ensure the sidecar is not running in a headless/service context without desktop access