Introduction

JARVIS is an always-on autonomous AI daemon built for power users who want an AI that acts, not one that asks permission. It runs as a persistent background service on your machine, maintaining a live model of your world and executing tasks continuously — with or without you at the keyboard.

Most AI tools are reactive. You open them, type a question, read the answer, and close them. JARVIS is something else entirely.

Trait              | Chatbot              | JARVIS
-------------------|----------------------|------------------------------------------
Lifecycle          | Session-based        | Always running
Initiative         | Responds when asked  | Monitors, schedules, acts proactively
Environment access | Sandboxed or none    | Browser, desktop, files, APIs
Memory             | Cleared each session | Persistent SQLite knowledge vault
Channels           | One interface        | Dashboard, Telegram, Discord, Voice
Execution model    | Single agent         | Multi-agent hierarchy with parallel tasks

JARVIS is not a chatbot with tools bolted on. It is an autonomous daemon designed to be dangerously capable by default.

JARVIS targets power users who:

  • Want an AI that works in the background while they focus on other things
  • Need automation across browser, desktop, files, and external services
  • Communicate through multiple channels (web, Telegram, Discord, voice)
  • Want persistent memory that grows smarter over time
  • Are comfortable granting an AI meaningful autonomy over their machine

If you want an AI that requires constant hand-holding and confirmation dialogs, JARVIS is not the right tool. If you want one that finishes the job, it is.

JARVIS auto-detects and launches Chrome or Chromium, attaches via the Chrome DevTools Protocol, and can navigate, click, type, extract content, and take screenshots — all without you touching the browser.
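
Attaching over the Chrome DevTools Protocol comes down to sending JSON commands over a WebSocket. The following is a minimal sketch of what such commands look like, not JARVIS's actual internals: the helper names `cdpCommand` and `navigateAndCapture` are hypothetical, while `Page.enable`, `Page.navigate`, and `Page.captureScreenshot` are standard CDP methods.

```typescript
// Hypothetical helper: a CDP command is a JSON object with an id, a method, and params.
interface CdpCommand {
  id: number;
  method: string;
  params: Record<string, unknown>;
}

let nextId = 0;

function cdpCommand(method: string, params: Record<string, unknown> = {}): CdpCommand {
  nextId += 1; // ids let the caller match responses to requests
  return { id: nextId, method, params };
}

// Build the command sequence for "open a page, then take a screenshot".
function navigateAndCapture(url: string): CdpCommand[] {
  return [
    cdpCommand("Page.enable"),
    cdpCommand("Page.navigate", { url }),
    cdpCommand("Page.captureScreenshot", { format: "png" }),
  ];
}

// Serialize each command into a text frame ready to send over the socket.
const frames = navigateAndCapture("https://example.com").map((c) => JSON.stringify(c));
```

Each frame would be passed to `socket.send(frame)` on a WebSocket opened against the `webSocketDebuggerUrl` that Chrome reports for a tab when it is started with remote debugging enabled.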

On Windows (including WSL), JARVIS launches a C# sidecar that uses the FlaUI/UIA automation framework to control any desktop application: find windows, click buttons, read text, type into fields, and scroll.

Speak to JARVIS using a wake word (“Hey JARVIS”), the microphone button, or push-to-talk. Responses are streamed back as speech using edge-tts-universal, which requires no API key. Speech-to-text supports OpenAI Whisper, Groq, or a local Whisper model.

JARVIS orchestrates a hierarchy of 11 specialist sub-agents. A primary agent delegates tasks to specialists (researcher, coder, writer, analyst, and others) either synchronously for single tasks or asynchronously in parallel for complex workflows.
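
The split between synchronous and parallel delegation can be sketched as follows. This is an illustration under assumptions, not the orchestrator's real code: the specialist names come from the text above, but the `Task` shape, the `SubAgent` signature, and the rule "one task is sync, several are parallel" are hypothetical simplifications.

```typescript
// Specialist names are from the text; all types and helpers here are illustrative.
type Specialist = "researcher" | "coder" | "writer" | "analyst";

interface Task {
  specialist: Specialist;
  prompt: string;
}

type SubAgent = (prompt: string) => Promise<string>;

// Assumed rule: a single task is awaited directly, several are run in parallel.
function dispatchMode(tasks: Task[]): "sync" | "parallel" {
  return tasks.length <= 1 ? "sync" : "parallel";
}

async function delegate(
  tasks: Task[],
  agents: Record<Specialist, SubAgent>,
): Promise<string[]> {
  if (dispatchMode(tasks) === "sync") {
    const results: string[] = [];
    for (const task of tasks) {
      results.push(await agents[task.specialist](task.prompt));
    }
    return results;
  }
  // Promise.all starts every sub-agent call before waiting on any of them
  // and returns results in the same order as the input tasks.
  return Promise.all(tasks.map((task) => agents[task.specialist](task.prompt)));
}
```

Because `Promise.all` rejects as soon as any sub-agent fails, a production orchestrator would probably want `Promise.allSettled` or per-task error handling instead.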

Every conversation contributes to a SQLite knowledge vault. JARVIS automatically extracts facts, preferences, events, and relationships from responses and injects relevant knowledge into future conversations.
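
To make the "inject relevant knowledge" step concrete, here is a minimal sketch that uses an in-memory array in place of the SQLite vault. The four entry kinds come from the text above; the `VaultEntry` shape and the keyword-overlap ranking are assumptions chosen for brevity and say nothing about how JARVIS actually scores relevance.

```typescript
// Kinds are named in the text; the record shape itself is an assumption.
type Kind = "fact" | "preference" | "event" | "relationship";

interface VaultEntry {
  kind: Kind;
  text: string;
}

// Lowercase, split on non-alphanumerics, and drop very short words.
function tokenize(s: string): string[] {
  return s.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 2);
}

// Rank entries by how many of their words appear in the incoming message
// (a deliberately naive stand-in for real retrieval).
function relevantEntries(vault: VaultEntry[], message: string, limit = 3): VaultEntry[] {
  const query = tokenize(message);
  return vault
    .map((entry) => ({
      entry,
      overlap: tokenize(entry.text).filter((w) => query.indexOf(w) !== -1).length,
    }))
    .filter((scored) => scored.overlap > 0)
    .sort((a, b) => b.overlap - a.overlap)
    .slice(0, limit)
    .map((scored) => scored.entry);
}

// Prepend the selected knowledge to the message before it goes to the model.
function injectKnowledge(vault: VaultEntry[], message: string): string {
  const notes = relevantEntries(vault, message).map((e) => `- [${e.kind}] ${e.text}`);
  return notes.length > 0 ? `Known context:\n${notes.join("\n")}\n\n${message}` : message;
}
```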

JARVIS does not wait to be asked. It monitors Gmail and Google Calendar, executes scheduled commitments, queues research tasks, and sends you notifications through D-Bus (Linux), Telegram, or Discord.
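
Executing scheduled commitments implies some loop that repeatedly asks "what is due now?". The sketch below shows one way that check could look. The `Commitment` fields and the `tick` function are hypothetical names introduced for illustration, not part of JARVIS's documented API.

```typescript
// Hypothetical shape for a scheduled commitment; field names are illustrative.
interface Commitment {
  id: string;
  dueAt: number; // Unix epoch milliseconds
  done: boolean;
}

// Return unfinished commitments whose due time has passed, oldest first.
function dueCommitments(items: Commitment[], now: number): Commitment[] {
  return items
    .filter((c) => !c.done && c.dueAt <= now)
    .sort((a, b) => a.dueAt - b.dueAt);
}

// One pass of a polling loop: run everything that is due and mark it done.
// Returns how many commitments were executed.
function tick(items: Commitment[], now: number, run: (c: Commitment) => void): number {
  const due = dueCommitments(items, now);
  for (const c of due) {
    run(c);
    c.done = true;
  }
  return due.length;
}
```

A daemon could call `tick(items, Date.now(), execute)` on a timer; taking `now` as a parameter keeps the logic easy to test.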

Interact through the web dashboard, Telegram bot, Discord bot, or voice — all unified in a single conversation history.

┌─────────────────────────────────────────────────────────┐
│                      JARVIS Daemon                      │
│                                                         │
│  ┌─────────────┐  ┌──────────────┐  ┌───────────────┐   │
│  │   Agent     │  │   Memory     │  │  Proactive    │   │
│  │ Orchestrator│  │   Vault      │  │  Engine       │   │
│  │ (+ sub-     │  │   (SQLite)   │  │  (observers,  │   │
│  │  agents)    │  │              │  │   scheduler)  │   │
│  └──────┬──────┘  └──────────────┘  └───────────────┘   │
│         │                                               │
│  ┌──────▼───────────────────────────────────────────┐   │
│  │                    Tool Layer                    │   │
│  │   Browser  Desktop  Files  Search  APIs  Shell   │   │
│  └──────────────────────────────────────────────────┘   │
└─────────────────────────┬───────────────────────────────┘
                          │ WebSocket + REST
        ┌─────────────────┼─────────────────┐
        ▼                 ▼                 ▼
  Web Dashboard     Telegram Bot       Discord Bot
(localhost:3142)     (@yourbot)        (#channel)
 Voice (mic/TTS)

The daemon exposes a WebSocket server at localhost:3142 by default. The web dashboard connects to this endpoint for real-time streaming. Telegram and Discord adapters connect the same agent loop to external messaging platforms. All channels share a unified conversation history.
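
One way to picture "all channels share a unified conversation history" is a single append-only log that every adapter writes to and listens on. The sketch below is an assumption-laden illustration: the channel names come from the text, but the `Conversation` class, the `Message` shape, and the fan-out to subscribers are hypothetical and may differ from how the daemon is actually wired.

```typescript
// Channel names come from the text; the message shape is an assumption.
type Channel = "dashboard" | "telegram" | "discord" | "voice";

interface Message {
  channel: Channel;
  role: "user" | "assistant";
  text: string;
}

class Conversation {
  private history: Message[] = [];
  private listeners: Array<(m: Message) => void> = [];

  // Each adapter (dashboard socket, Telegram bot, ...) registers to see every message.
  subscribe(listener: (m: Message) => void): void {
    this.listeners.push(listener);
  }

  // Append to the single shared history, then notify all adapters.
  post(message: Message): void {
    this.history.push(message);
    for (const listener of this.listeners) listener(message);
  }

  // Return a copy so callers cannot mutate the shared log.
  snapshot(): Message[] {
    return this.history.slice();
  }
}
```

With this shape, a message that arrives over Telegram is visible to the dashboard immediately, because both adapters read from the same log rather than keeping separate threads.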

  • Runtime: Bun — fast TypeScript runtime, SQLite built in
  • Language: TypeScript (ESM modules)
  • LLM: Anthropic Claude (primary), OpenAI GPT-4 (fallback), Ollama (local)
  • Database: SQLite via bun:sqlite
  • Browser automation: Chrome DevTools Protocol (CDP)
  • Desktop automation: C# FlaUI sidecar over TCP
  • TTS: edge-tts-universal (free, no API key)
  • STT: OpenAI Whisper / Groq / local Whisper
  • Wake word: openwakeword-wasm-browser (ONNX models, runs in-browser)
  • Frontend: React