Introducing elvatis-mcp: Control Your Entire Infrastructure from Claude Desktop

A new open-source MCP server that connects Claude, Cursor, and Windsurf to your smart home, memory, cron jobs, notifications, and local LLMs -- with 32 tools across 7 domains.


I have been running Claude Desktop for a while now, and for most of that time the workflow felt slightly broken. I could ask Claude anything, but the moment I wanted it to actually do something in my environment, I was back to copy-pasting. Turn off the lights? SSH into a server? Check what my trading bot is doing? All manual.

MCP changed that. And so I built elvatis-mcp.

What MCP is, in one paragraph

Model Context Protocol is an open standard from Anthropic. It lets any MCP-compatible AI client (Claude Desktop, Cursor, Windsurf, Zed) connect to an external tool server and call its functions directly. You configure it once in a JSON file, restart the client, and suddenly Claude can do things instead of just knowing things. It is, effectively, a plugin system for AI clients that is not locked to any one vendor.
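On the wire, every tool invocation is a plain JSON-RPC 2.0 message, which is why any compliant client can drive any compliant server. A sketch of what a call might look like (the tool name and arguments here are illustrative, not elvatis-mcp's actual schema):

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "ha_light_set",
    "arguments": { "entity_id": "light.wohnzimmer", "brightness_pct": 40 }
  }
}
```

The server replies with the tool's result, and the client feeds it back into the model's context.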

What elvatis-mcp does

It is an MCP server that bridges Claude (and any other MCP client) to your actual infrastructure. Right now it ships 32 tools across 7 domains:

Home Assistant — control lights, thermostats, scenes, and the vacuum, and read environmental sensors. Ask Claude to dim the Wohnzimmer to 40% and it happens. No app switching.

Memory — read and write to a daily log, search across the last N days of notes. Useful if you want Claude to remember context across sessions without relying on its context window.

OpenClaw Cron — list, trigger, create, edit, and manage scheduled automation jobs. Full lifecycle from Claude.

Notifications — send WhatsApp or Telegram messages via OpenClaw channels. Have a long-running task tell you when it is done.

AI Sub-Agents — delegate tasks to Claude, Gemini, Codex, or a local LLM. Each backend has its strengths: Claude for reasoning, Gemini for long context, Codex for coding, local models for free private inference.

Prompt Splitting — break a complex prompt into subtasks and assign each to the most appropriate model. Heavy reasoning goes to Claude, quick lookups go to a local model, code generation goes to Codex. Each subtask includes a suggested model that you can override before execution.

System Management — health-check all services at once, view server logs, manage LM Studio models, transfer files, and start llama.cpp servers with TurboQuant KV-cache compression.
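To give a feel for the plumbing behind these tools, here is a sketch of how a Home Assistant light command could be translated into the REST call Home Assistant expects. The endpoint and payload follow Home Assistant's public API; the function name and shape are illustrative, not elvatis-mcp's actual code:

```typescript
// Sketch: turn a "dim to 40%" request into a Home Assistant service call.
// Endpoint and body fields follow Home Assistant's documented REST API;
// buildLightCall itself is a hypothetical helper for illustration.
interface LightCall {
  url: string;
  headers: Record<string, string>;
  body: { entity_id: string; brightness_pct: number };
}

function buildLightCall(
  haUrl: string,
  token: string,
  entityId: string,
  brightnessPct: number
): LightCall {
  return {
    // Home Assistant's service-call endpoint for turning a light on
    url: `${haUrl}/api/services/light/turn_on`,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: { entity_id: entityId, brightness_pct: brightnessPct },
  };
}

// "Dim the Wohnzimmer to 40%" becomes:
const call = buildLightCall(
  "http://your-homeassistant:8123",
  "your_token",
  "light.wohnzimmer",
  40
);
console.log(call.url); // http://your-homeassistant:8123/api/services/light/turn_on
```

The MCP server's job is essentially this translation layer, repeated for each domain: map a structured tool call onto whatever API the target system speaks.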

Multi-LLM orchestration

This is the part I find most interesting. elvatis-mcp supports six backends: Claude (via CLI), Gemini (via CLI), Codex (via CLI), OpenClaw (via SSH), local LLMs served by LM Studio or Ollama (via HTTP), and llama.cpp with TurboQuant cache support. You can route different subtasks to different models within a single conversation.

The prompt_split tool analyzes your request and suggests which backend handles each piece. You review the plan, adjust models if you want, then Claude executes it.
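The shape of such a plan might look roughly like this (field names and values are illustrative, not the tool's actual output format):

```json
{
  "subtasks": [
    { "task": "Design the caching strategy", "suggested_model": "claude" },
    { "task": "Look up the HTTP status code for rate limiting", "suggested_model": "local" },
    { "task": "Write the retry wrapper", "suggested_model": "codex" }
  ]
}
```

Each suggested_model is a default, not a commitment — you can reassign any subtask before the plan runs.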

Status dashboard

A lightweight HTML dashboard at /status shows service health, loaded models, and latency. Auto-refreshes every 30 seconds. Works from your phone.

Getting started

Run it directly with npx, or install it globally via npm:

npx @elvatis_com/elvatis-mcp

Configure in your Claude Desktop config:

{
  "mcpServers": {
    "elvatis": {
      "command": "npx",
      "args": ["-y", "@elvatis_com/elvatis-mcp"],
      "env": {
        "HA_URL": "http://your-homeassistant:8123",
        "HA_TOKEN": "your_token"
      }
    }
  }
}

Restart Claude Desktop and the tools appear automatically.

The project is open source under Apache 2.0. Source and docs at github.com/elvatis/elvatis-mcp. NPM package at npmjs.com/package/@elvatis_com/elvatis-mcp.

Benchmark Results

elvatis-mcp v0.7.1 was benchmarked on an AMD Threadripper 3960X with an AMD Radeon RX 9070 XT Elite (16GB, ROCm). Numbers below are wall-clock time from tool call to result.

Local LLM: GPU vs CPU

The fastest backend is the local GPT-OSS 20B model running on the RX 9070 XT:

Response time: 0.6 - 3.1 seconds depending on prompt length. GPU speedup over CPU: 8.4x.

Sub-agent backend comparison

Average response time per backend across 50 standard tool-call prompts:

Local LLM (GPT-OSS 20B, ROCm): 1.3s avg

Codex: 4.1s avg

Claude: 6.3s avg

Gemini: 34s avg

For simple tasks, local inference is 3-26x faster than cloud backends -- and free after hardware cost.
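The 3-26x range falls straight out of the averages above:

```typescript
// Speedup of the local backend over each cloud backend,
// computed from the average response times reported above.
const localAvg = 1.3; // seconds, GPT-OSS 20B on ROCm

const backends: { [name: string]: number } = {
  codex: 4.1,
  claude: 6.3,
  gemini: 34,
};

const speedups: { [name: string]: number } = {};
for (const [name, avg] of Object.entries(backends)) {
  // round to one decimal place
  speedups[name] = Math.round((avg / localAvg) * 10) / 10;
}

console.log(speedups); // { codex: 3.2, claude: 4.8, gemini: 26.2 }
```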

Full methodology and raw results: BENCHMARKS.md on GitHub.