
Continuous Awareness

JARVIS doesn’t just respond when you ask — it watches your screen, understands what you’re doing, and steps in when you need help. The Continuous Awareness system captures your desktop every 7 seconds, runs OCR, tracks context, and detects behavioral patterns like struggle and stuckness. When it spots a problem, it escalates to cloud vision for app-specific analysis and delivers actionable suggestions.

The awareness pipeline runs continuously in the background:

```text
Screen Capture (every 7s)
   │
   ▼
OCR Text Extraction (Tesseract)
   │
   ▼
Context Tracking (app, window, session)
   │
   ├─── Error Detection (OCR pattern matching)
   ├─── Stuck Detection (same screen > 2 min)
   ├─── Struggle Detection (behavioral signals)
   │
   ▼
Cloud Vision Escalation (LLM screenshot analysis)
   │
   ▼
Suggestion Engine (rate-limited, deduped)
   │
   ▼
Delivery: Dashboard + Voice + Desktop Toast + Channels
```

Every 7 seconds (configurable), JARVIS takes a screenshot of your primary display. Captures are stored with JPEG thumbnails for the timeline view and automatically cleaned up based on retention tiers:

| Retention Tier | Duration | When Applied |
|---|---|---|
| full | 1 hour | Default: all captures |
| key_moment | 24 hours | Errors, struggles, context switches |
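
A minimal sketch of how tier-based cleanup could work, assuming each capture records its capture time and its retention tier. The Capture record and the expired_captures function are hypothetical names for illustration, not JARVIS's actual code; only the two durations come from the table above.

```python
from dataclasses import dataclass
from typing import List

# Retention durations from the table above, in milliseconds.
RETENTION_MS = {
    "full": 60 * 60 * 1000,              # 1 hour
    "key_moment": 24 * 60 * 60 * 1000,   # 24 hours
}


@dataclass
class Capture:
    """Hypothetical capture record: id, capture time, and retention tier."""
    capture_id: str
    taken_at_ms: int
    tier: str = "full"


def expired_captures(captures: List[Capture], now_ms: int) -> List[Capture]:
    """Return the captures whose tier's retention window has elapsed."""
    return [c for c in captures if now_ms - c.taken_at_ms > RETENTION_MS[c.tier]]


if __name__ == "__main__":
    hour = 60 * 60 * 1000
    captures = [
        Capture("a", 0),                      # full tier, 2 hours old at now_ms
        Capture("b", 0, tier="key_moment"),   # key moment, 2 hours old
        Capture("c", hour + 1),               # full tier, just under 1 hour old
    ]
    print([c.capture_id for c in expired_captures(captures, now_ms=2 * hour)])
    # prints ['a']
```

Promoting a capture to key_moment (for example when an error or struggle is detected) simply extends its lifetime from 1 hour to 24 hours in this sketch.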

Each screenshot is processed through Tesseract OCR to extract visible text. This text is used for:

  • Detecting error messages, stack traces, and warnings
  • Tracking what application and document you’re working in
  • Computing behavioral signals for struggle detection
  • Providing context to cloud vision analysis

The context tracker maintains a model of what you’re doing:

  • Active application — detected from window title and OCR content
  • Window title — tracked for context switches
  • Session management — groups continuous activity in the same app into sessions
  • URL and file path extraction — parsed from OCR text when applicable

When you switch applications, JARVIS fires a context_changed event. When a session ends (app switch or inactivity), JARVIS uses an LLM to summarize what you were doing and stores it for analytics.
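
The session rules above can be sketched as a small state machine: an app switch emits context_changed and closes the session, and a long gap between observations closes it without an event. This is a hypothetical illustration; SessionTracker, Session and the 5-minute INACTIVITY_MS value are assumptions (the inactivity timeout is not documented here), and the LLM summarization step is reduced to a comment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumption: the inactivity timeout is not documented; 5 minutes is illustrative.
INACTIVITY_MS = 5 * 60 * 1000


@dataclass
class Session:
    """One stretch of continuous activity in a single app."""
    app: str
    started_ms: int
    last_seen_ms: int
    window_titles: List[str] = field(default_factory=list)


class SessionTracker:
    """Hypothetical sketch: groups per-capture observations into app sessions."""

    def __init__(self) -> None:
        self.current: Optional[Session] = None
        self.finished: List[Session] = []
        # Emitted events as (name, previous_app, new_app) tuples.
        self.events: List[Tuple[str, str, str]] = []

    def observe(self, app: str, window_title: str, now_ms: int) -> None:
        """Record the foreground app and window title seen in one capture."""
        if self.current is not None and now_ms - self.current.last_seen_ms > INACTIVITY_MS:
            self._end_session()  # inactivity ends the session, no event emitted
        if self.current is not None and self.current.app != app:
            self.events.append(("context_changed", self.current.app, app))
            self._end_session()  # an app switch also ends the session
        if self.current is None:
            self.current = Session(app, now_ms, now_ms)
        self.current.last_seen_ms = now_ms
        titles = self.current.window_titles
        if not titles or titles[-1] != window_title:
            titles.append(window_title)  # track window-title changes within a session

    def _end_session(self) -> None:
        # JARVIS would ask an LLM to summarize the session here; the sketch archives it.
        if self.current is not None:
            self.finished.append(self.current)
            self.current = None
```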

Struggle detection is the core intelligence of the awareness system. Rather than detecting only visible errors, JARVIS recognizes when you’re actively working but making no progress, which is a far more common scenario.

The struggle detector maintains a rolling window of the last 30 OCR snapshots (~3.5 minutes) and computes four behavioral signals:

| Signal | Weight | What It Detects |
|---|---|---|
| Trial-and-error editing | 30% | Many unique OCR states without convergence: you keep changing things but nothing sticks |
| Undo/revert patterns | 25% | Text reverting to earlier states: you’re undoing changes and trying again |
| Repeated output | 25% | The same terminal output or error message appearing across captures |
| Low meaningful progress | 20% | Small character differences between captures despite continuous activity |

When the weighted composite score exceeds 0.5, the detector starts a 2-minute grace period (you might figure it out on your own). If the struggle persists past the grace period, a struggle_detected event fires.
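
The scoring and grace-period logic can be sketched as follows. This is an illustration under stated assumptions, not JARVIS's implementation: each signal is assumed to be pre-computed as a value in [0, 1] from the rolling window (the per-signal extraction is not documented), and StruggleGate and composite_score are hypothetical names. The weights, the 0.5 threshold and the 2-minute grace period come from the text above.

```python
from typing import Dict, Optional

# Signal weights from the table above.
WEIGHTS: Dict[str, float] = {
    "trial_and_error": 0.30,
    "undo_revert": 0.25,
    "repeated_output": 0.25,
    "low_progress": 0.20,
}
THRESHOLD = 0.5
GRACE_MS = 2 * 60 * 1000  # struggle_grace_ms default
# The rolling window is 30 snapshots x 7 s = 210 s, about 3.5 minutes.


def composite_score(signals: Dict[str, float]) -> float:
    """Weighted sum of the four signals; each signal is assumed to be in [0, 1]."""
    return sum(weight * signals[name] for name, weight in WEIGHTS.items())


class StruggleGate:
    """Hypothetical sketch of the threshold plus grace-period logic."""

    def __init__(self) -> None:
        self.above_since_ms: Optional[int] = None

    def update(self, signals: Dict[str, float], now_ms: int) -> bool:
        """Return True when a struggle_detected event should fire."""
        if composite_score(signals) <= THRESHOLD:
            self.above_since_ms = None  # struggle resolved: reset the grace period
            return False
        if self.above_since_ms is None:
            self.above_since_ms = now_ms  # score just exceeded 0.5: start grace period
            return False
        if now_ms - self.above_since_ms >= GRACE_MS:
            self.above_since_ms = None  # fire once; the struggle cooldown limits repeats
            return True
        return False
```

For example, maxing out only trial-and-error editing and undo/revert patterns gives 0.30 + 0.25 = 0.55, enough to start the grace period on its own.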

JARVIS classifies your current application into categories and tailors its help accordingly:

| Category | Apps | What JARVIS Does |
|---|---|---|
| Code Editor | VS Code, IntelliJ, Sublime, vim, Cursor, Zed | Analyzes visible code for syntax/logic errors, identifies the exact line and fix |
| Terminal | Terminal, iTerm, Warp, PowerShell | Identifies the failing command, explains the error, provides the corrected command |
| Browser | Chrome, Firefox, Edge, Arc | Identifies what you’re trying to do, guides you to the right button or link |
| Creative App | Photoshop, Figma, Blender, Illustrator | Names the specific tool, menu item, or keyboard shortcut you need |
| Puzzle/Game | Detected via OCR patterns | Analyzes the game state, suggests the next 1-2 optimal moves |
| General | Everything else | Identifies the obstacle and suggests the next action |
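
A rough sketch of this classification step, assuming a case-insensitive whole-word match on the application name and a fallback to OCR cues for puzzles and games. The category keys, the match rule and the PUZZLE_HINTS list are hypothetical; the app lists come from the table, with "iterm2" added because the macOS app is commonly named iTerm2.

```python
import re

# App lists from the table above; whole-word matching is an assumption.
CATEGORY_APPS = {
    "code_editor": ["vs code", "visual studio code", "intellij", "sublime", "vim", "cursor", "zed"],
    "terminal": ["terminal", "iterm", "iterm2", "warp", "powershell"],
    "browser": ["chrome", "firefox", "edge", "arc"],
    "creative": ["photoshop", "figma", "blender", "illustrator"],
}
# Hypothetical OCR cues for puzzles/games; the real patterns are not documented.
PUZZLE_HINTS = ["your move", "checkmate", "level complete"]


def _mentions(name: str, term: str) -> bool:
    """Case-insensitive whole-word match, so "edge" does not match "Knowledge"."""
    return re.search(r"\b" + re.escape(term) + r"\b", name, re.IGNORECASE) is not None


def classify_app(app_name: str, ocr_text: str = "") -> str:
    """Map an app name (plus optional OCR text) to a help category."""
    for category, apps in CATEGORY_APPS.items():
        if any(_mentions(app_name, term) for term in apps):
            return category
    if any(hint in ocr_text.lower() for hint in PUZZLE_HINTS):
        return "puzzle_game"
    return "general"
```

The returned category would then select the category-specific prompt used for cloud vision escalation.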

When a struggle is detected, JARVIS escalates the screenshot to an LLM with vision capabilities along with an app-category-specific prompt. The LLM analyzes the actual screenshot content and returns targeted advice:

  • Code editor: “Line 42 is missing a closing parenthesis after the filter call. Change filter(x => x > 0 to filter(x => x > 0).”
  • Terminal: “The error is EACCES: permission denied. Run sudo npm install -g typescript instead.”
  • Photoshop: “The Clone Stamp tool is in the left toolbar, 6th from the top. Press S to select it.”
  • Puzzle: “Move the blue piece from E3 to C5 — this creates a fork threatening both the queen and rook.”

Five layers prevent the awareness system from being annoying:

| Layer | Default | Effect |
|---|---|---|
| Struggle grace period | 2 min | Never fires within the first 2 minutes of a detected struggle |
| Struggle cooldown | 3 min | Minimum gap between struggle detections |
| Cloud vision cooldown | 30 sec | Minimum gap between LLM vision API calls |
| Suggestion rate limit | 60 sec | Minimum gap between any suggestions |
| Suggestion dedup | Hash-based | The same suggestion (type + title) won’t repeat |

The effective minimum gap between struggle interventions is 3 minutes.
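
The cooldown, rate-limit and dedup layers share one pattern: remember when an action last happened and refuse it until the interval has passed. A minimal sketch, assuming in-memory state and a SHA-256 hash of type plus title for dedup (the actual hash function and storage are not documented); Cooldown and SuggestionGate are hypothetical names.

```python
import hashlib
from typing import Optional, Set


class Cooldown:
    """Allows an action at most once per interval_ms."""

    def __init__(self, interval_ms: int) -> None:
        self.interval_ms = interval_ms
        self.last_ms: Optional[int] = None

    def try_acquire(self, now_ms: int) -> bool:
        if self.last_ms is not None and now_ms - self.last_ms < self.interval_ms:
            return False
        self.last_ms = now_ms
        return True


class SuggestionGate:
    """Hypothetical sketch: 60 s rate limit plus (type, title) dedup."""

    def __init__(self, rate_limit_ms: int = 60_000) -> None:
        self.rate_limit = Cooldown(rate_limit_ms)
        self.seen: Set[str] = set()

    @staticmethod
    def dedup_key(kind: str, title: str) -> str:
        return hashlib.sha256(f"{kind}\x00{title}".encode("utf-8")).hexdigest()

    def allow(self, kind: str, title: str, now_ms: int) -> bool:
        key = self.dedup_key(kind, title)
        if key in self.seen:
            return False  # checked first so duplicates don't consume the rate-limit slot
        if not self.rate_limit.try_acquire(now_ms):
            return False
        self.seen.add(key)
        return True


# The other layers are plain cooldowns with their documented defaults.
struggle_cooldown = Cooldown(180_000)     # struggle_cooldown_ms
cloud_vision_cooldown = Cooldown(30_000)  # cloud_vision_cooldown_ms
```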

The context tracker scans OCR text for patterns matching common error messages:

  • Stack traces and exception names
  • “Error:”, “Warning:”, “Failed:” prefixes
  • Build/compile failure patterns
  • HTTP error codes (404, 500, etc.)

Error detection fires immediately (no grace period) since visible errors are unambiguous.
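
A minimal sketch of OCR error matching with regular expressions. The specific patterns are hypothetical approximations of the four categories listed above, not JARVIS's real pattern list; in particular, requiring a reason phrase after an HTTP status code is an assumption meant to avoid flagging every number in the 400-599 range.

```python
import re
from typing import List

# Hypothetical patterns for the categories listed above.
ERROR_PATTERNS = {
    "stack_trace": re.compile(
        r"Traceback \(most recent call last\)|^\s+at \S+ \(.+:\d+:\d+\)", re.MULTILINE
    ),
    "exception": re.compile(r"\b[A-Z]\w*(?:Error|Exception)\b"),
    "prefix": re.compile(r"\b(?:Error|Warning|Failed):"),
    "build_failure": re.compile(r"BUILD FAILED|Compilation failed|error TS\d+", re.IGNORECASE),
    "http_error": re.compile(
        r"\b[45]\d\d (?:Bad Request|Unauthorized|Forbidden|Not Found|"
        r"Internal Server Error|Bad Gateway|Service Unavailable)\b",
        re.IGNORECASE,
    ),
}


def detect_errors(ocr_text: str) -> List[str]:
    """Return the names of all error patterns found in the OCR text."""
    return [name for name, pattern in ERROR_PATTERNS.items() if pattern.search(ocr_text)]
```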

If the OCR text hash stays identical for 2+ minutes (configurable via stuck_threshold_ms), JARVIS marks you as “stuck”: staring at the same screen without making changes. This differs from struggle detection, which catches active but unproductive behavior.
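
Stuck detection reduces to comparing successive OCR text hashes and timing how long the hash stays unchanged. In this hypothetical sketch, SHA-256 is an assumed hash choice and firing once per unchanged stretch is an assumed policy; StuckDetector is an illustrative name.

```python
import hashlib
from typing import Optional


class StuckDetector:
    """Hypothetical sketch: fires once when the OCR hash is unchanged for the threshold."""

    def __init__(self, stuck_threshold_ms: int = 120_000) -> None:
        self.threshold_ms = stuck_threshold_ms
        self.last_hash: Optional[str] = None
        self.unchanged_since_ms = 0
        self.fired = False

    def update(self, ocr_text: str, now_ms: int) -> bool:
        """Return True when a stuck_detected event should fire."""
        digest = hashlib.sha256(ocr_text.encode("utf-8")).hexdigest()
        if digest != self.last_hash:
            # Screen text changed: restart the timer.
            self.last_hash = digest
            self.unchanged_since_ms = now_ms
            self.fired = False
            return False
        if not self.fired and now_ms - self.unchanged_since_ms >= self.threshold_ms:
            self.fired = True  # fire once per unchanged stretch
            return True
        return False
```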

The suggestion engine evaluates context and events in priority order:

| Priority | Type | Trigger |
|---|---|---|
| 1 | Error | error_detected event: visible error on screen |
| 2 | Struggle | struggle_detected event: behavioral pattern analysis |
| 3 | Stuck | stuck_detected event: unchanged screen for 2+ min |
| 4 | Automation | Repetitive app-switching pattern (A→B 3+ times in 5 min) |
| 5 | Knowledge | Context switch to a recognized entity/project in vault |
| 6 | Schedule | Calendar event or vault commitment due within 15 min |
| 7 | Break | 90+ minutes of continuous activity |
| 8 | Cloud Insight | Actionable observation from cloud vision analysis |
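
A sketch of how the priority order and the automation trigger could be expressed. The type keys, Candidate and both function names are hypothetical, and counting only the directed A→B transition (not B→A) is an assumption about how the switching pattern is measured.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Priority order from the table above (lower number = evaluated first).
PRIORITY = {
    "error": 1, "struggle": 2, "stuck": 3, "automation": 4,
    "knowledge": 5, "schedule": 6, "break": 7, "cloud_insight": 8,
}
SWITCH_WINDOW_MS = 5 * 60 * 1000  # "within 5 min"


@dataclass
class Candidate:
    kind: str
    title: str


def pick_suggestion(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the highest-priority candidate, or None if there are none."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: PRIORITY[c.kind])


def repeated_switch(switches: List[Tuple[int, str, str]], now_ms: int, min_count: int = 3) -> bool:
    """True if any directed (from_app, to_app) switch occurred min_count+ times in the window."""
    recent = Counter((a, b) for t, a, b in switches if now_ms - t <= SWITCH_WINDOW_MS)
    return any(n >= min_count for n in recent.values())
```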

The awareness system adds three tabs to the dashboard:

The first tab shows the current screen context in real time:

  • Active application and window title
  • Current session duration and capture count
  • Active suggestions with dismiss/act actions
  • Service status indicator

The second tab, the timeline, shows captures and events in chronological order:

  • Thumbnail previews of screenshots
  • Key moment badges for significant events
  • Event type filtering
  • Click-to-expand for full OCR text

The third tab shows weekly analytics and behavioral insights:

  • Daily active minutes bar chart
  • Focus score tracking
  • Top applications by usage
  • AI-generated weekly insights
  • Behavioral trend comparisons vs. previous week

When a suggestion is generated, JARVIS delivers it through multiple channels based on availability:

  1. WebSocket (dashboard) — if connected, delivered in real-time
  2. Voice (TTS) — spoken aloud if voice is active
  3. Desktop toast — system notification
  4. Channels fallback — if no dashboard clients are connected, broadcasts to Telegram/Discord/email

For high-confidence code or terminal struggles (score >= 0.7), JARVIS also dispatches a background agent to perform deep research (web search + vault lookup) and delivers the detailed findings as a follow-up.
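
The delivery rules can be sketched as a pure function of what is currently available. Treating the desktop toast as always available is an assumption, and delivery_channels and should_dispatch_research are hypothetical names; the fallback condition and the 0.7 threshold for code and terminal struggles come from the text above.

```python
from typing import List


def delivery_channels(dashboard_clients: int, voice_active: bool) -> List[str]:
    """Return the channels a suggestion would go to, in the documented order."""
    channels = []
    if dashboard_clients > 0:
        channels.append("websocket")       # real-time dashboard delivery
    if voice_active:
        channels.append("voice")           # spoken via TTS
    channels.append("desktop_toast")       # assumption: toasts are always available
    if dashboard_clients == 0:
        channels.append("channels_fallback")  # Telegram/Discord/email broadcast
    return channels


def should_dispatch_research(category: str, score: float) -> bool:
    """Deep-research follow-up only for high-confidence code or terminal struggles."""
    return category in ("code_editor", "terminal") and score >= 0.7
```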

Add to ~/.jarvis/config.yaml:

```yaml
awareness:
  enabled: true
  capture_interval_ms: 7000          # Screenshot frequency (default: 7000)
  min_pixel_change: 0.005            # Minimum change to process (default: 0.5%)
  stuck_threshold_ms: 120000         # Unchanged screen = stuck (default: 2 min)
  struggle_grace_ms: 120000          # Grace period before struggle fires (default: 2 min)
  struggle_cooldown_ms: 180000       # Min gap between struggles (default: 3 min)
  cloud_vision_enabled: true         # Enable LLM screenshot analysis
  cloud_vision_cooldown_ms: 30000    # Min gap between vision calls (default: 30s)
  suggestion_rate_limit_ms: 60000    # Min gap between suggestions (default: 60s)
  capture_dir: ~/.jarvis/awareness   # Screenshot storage location
```

Set awareness.enabled: false in your config to completely disable screen capture and monitoring. You can also toggle it at runtime through the dashboard or via the REST API:

```bash
# Disable
curl -X POST http://localhost:3142/api/awareness/toggle \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'
# Enable
curl -X POST http://localhost:3142/api/awareness/toggle \
  -H "Content-Type: application/json" \
  -d '{"enabled": true}'
```
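
The same toggle call can be made from a script with Python's standard library. This is a sketch that assumes the endpoint accepts the JSON body shown in the curl examples; build_toggle_request is a hypothetical helper name, and the response format is not documented here, so the sketch only builds the request.

```python
import json
import urllib.request


def build_toggle_request(enabled: bool, base_url: str = "http://localhost:3142") -> urllib.request.Request:
    """Build a POST to the awareness toggle endpoint with a JSON body."""
    body = json.dumps({"enabled": enabled}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/awareness/toggle",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it against a running JARVIS instance, pass the request to urllib.request.urlopen, for example urllib.request.urlopen(build_toggle_request(False), timeout=5).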

Privacy and data handling:

  • All captures are stored locally on your machine in ~/.jarvis/awareness/
  • OCR text is processed locally via Tesseract — no data leaves your machine for basic detection
  • Cloud vision escalation sends screenshots to your configured LLM provider (Anthropic/OpenAI) only when triggered by errors, struggles, or significant changes
  • Captures are automatically cleaned up: standard captures after 1 hour, key moments after 24 hours
  • You can disable cloud vision entirely with cloud_vision_enabled: false
