# GoatLLM

> GoatLLM is a VS Code extension that lets developers chat with, edit code through, and run agentic coding tasks against open-weight large language models running entirely on their own machine. It works with MLX (Apple Silicon), Ollama, LM Studio, llama.cpp, vLLM, exo, Hugging Face models, and any OpenAI-compatible HTTP endpoint. No accounts. No cloud. No telemetry.

GoatLLM is created and maintained by Brandon Charleson. The latest stable version is 1.0.1. The extension itself is currently distributed as a free closed-source `.vsix` hosted at https://goatllm.ai/downloads/goatllm-vscode-1.0.1.vsix and is pending VS Code Marketplace publication. The models GoatLLM connects to are open-weight; the extension binary is not currently open source.

GoatLLM operates in three modes: **Chat** (pure conversation), **Agent** (tool calling with approval gates on writes and shell commands), and **Agent (full access)** (auto-approves everything for hands-off operation). Agent tools are `read_file`, `list_directory`, `write_file`, and `run_command`.

## Install

- [Direct download (v1.0.1 .vsix)](https://goatllm.ai/downloads/goatllm-vscode-1.0.1.vsix): the recommended install path until Marketplace publication completes. Drag the file onto VS Code's Extensions panel.
- [Install via CLI](https://goatllm.ai/#install): `code --install-extension goatllm-vscode-1.0.1.vsix` works as an alternative to drag-and-drop.

## Setup guides

- [Ollama](https://goatllm.ai/#setup): `brew install ollama`, start the server with `ollama serve`, then `ollama pull qwen2.5-coder:32b` in a second terminal (pulling requires a running server). Default endpoint `http://localhost:11434/v1`.
- [LM Studio](https://goatllm.ai/#setup): GUI runtime. Download from lmstudio.ai, install a model, start the server. Default endpoint `http://localhost:1234/v1`.
- [MLX (Apple Silicon, Hugging Face models)](https://goatllm.ai/#setup): `pip install -U "git+https://github.com/ml-explore/mlx-lm.git"` then `mlx_lm.server --model mlx-community/Qwen2.5-Coder-32B-Instruct-4bit --port 8013`. Default endpoint `http://localhost:8013/v1`.
- [llama.cpp](https://goatllm.ai/#setup): Build, then `./llama-server -m model.gguf --port 8080`. Default endpoint `http://localhost:8080/v1`.
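Whichever runtime you pick, each default endpoint above speaks the OpenAI-compatible HTTP API, so you can sanity-check the server before pointing GoatLLM at it. A minimal sketch using only the Python standard library (the base URL is one of the defaults listed above; swap in yours):

```python
import json
import urllib.error
import urllib.request

def list_models(base_url: str, timeout: float = 3.0):
    """Query an OpenAI-compatible server's /v1/models endpoint.

    Returns a list of model IDs, or None if nothing is listening.
    """
    url = base_url.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError):
        return None
    # OpenAI-style responses wrap the model entries in a "data" array.
    return [m["id"] for m in payload.get("data", [])]

# Prints the served model IDs, or None if the server isn't up yet.
print(list_models("http://localhost:11434/v1"))
```

If this returns your model's ID, the same base URL is what goes into GoatLLM's endpoint setting.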

## Documentation

- [Features overview](https://goatllm.ai/#features): auto-detection of local endpoints, agent mode, full autonomy, hot-swap endpoints, live tok/s, zero telemetry.
- [Modes table](https://goatllm.ai/#docs): the three operating modes — Chat, Agent, Agent (full access) — with tool access and approval policies.
- [Settings reference](https://goatllm.ai/#docs): full `goatllm.*` configuration options. Endpoints, model, temperature, system prompts, command deny list, sudo policy.
- [Remote endpoint setup](https://goatllm.ai/#docs): example `goatllm.endpoints` config for running models on a separate machine over the network.
- [Security model](https://goatllm.ai/#docs): what's blocked unconditionally, what requires approval, network surface, secrets handling in SecretStorage.
- [FAQ](https://goatllm.ai/#docs): hardware requirements, tool-calling support, cloud usage, bug reporting, logs, throughput metrics, airgapped operation.
- [Changelog](https://goatllm.ai/#changelog): v1.0.0 initial release, v1.0.1 UI and metrics polish.

## Compatible models

GoatLLM uses OpenAI-style `tool_choice: auto` for agent modes. Models verified to work include Qwen 2.5-Coder, Llama 3.1+, Gemma 2+, DeepSeek-Coder-V2, Mistral 0.3+, Phi-3.5, and fine-tunes thereof.
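Since agent modes rely on the standard OpenAI chat-completions schema, a request body in those modes looks roughly like the sketch below, using `read_file` from the tool list above. This is an illustration of the wire format, not GoatLLM's exact payload:

```python
import json

def agent_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat-completions body with one tool declared."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "read_file",
                    "description": "Read a file from the workspace.",
                    "parameters": {
                        "type": "object",
                        "properties": {"path": {"type": "string"}},
                        "required": ["path"],
                    },
                },
            }
        ],
        # "auto" lets the model decide whether to answer in text
        # or emit a tool call; this is what agent modes depend on.
        "tool_choice": "auto",
    }

body = agent_request("qwen2.5-coder:32b", "Summarize src/main.py")
print(json.dumps(body, indent=2))
```

A model "verified to work" here means it reliably emits well-formed tool calls in response to bodies like this; models without tool-calling support will still chat but can't drive agent modes.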

## Optional

- [Full agent guide (concatenated)](https://goatllm.ai/llms-full.txt): the complete site content as a single markdown document, suitable for LLM context windows.
- [Author](https://github.com/bcharleson): Brandon Charleson, creator and maintainer.
- [Contact](mailto:b.charleson1@gmail.com): bug reports and feature requests. The source repository is currently private; public issue tracking will open with Marketplace publication.
