Agent Code-Test Loop

An AI agent writes code, runs tests, reads failures, fixes code, repeats. Five to twenty iterations on a real server until everything passes.

This is the core use case. It works today with zero caveats, with any MCP-capable agent: Claude Code, Cursor, Cline, Aider. The examples use Claude because that's what we test against; swap in your agent of choice.

With your agent (MCP)

Once gibil's MCP server is wired up (one-time setup), just tell your orchestrator:

Forge a VM with my repo, run the test suite, and fix failures until it's green.
Show me the diff before you destroy the box.

The agent drives the loop entirely through typed MCP tools, with no shell-string escaping:

create_server({ name: "pr-42", repo: "github.com/you/project", ttl: 60 })

// iterate: run tests → read output → patch files → run again
// vm_bash defaults its working dir to /root/project
vm_bash({ server: "pr-42", command: "pnpm test" })   // → { exit_code: 1, stdout: "3 failed, 39 passed" }
vm_read({ server: "pr-42", path: "/root/project/src/parser.ts", offset: 40, limit: 30 })
vm_write({ server: "pr-42", path: "/root/project/src/parser.ts", content: "..." })
vm_bash({ server: "pr-42", command: "pnpm test" })   // → { exit_code: 0, stdout: "42 passed" }

destroy_server({ name: "pr-42" })
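The stop condition for that loop can be sketched as a small pure function. This is an illustration, not part of gibil: `nextAction`, `replay`, and the `maxIterations` cap are hypothetical names, and the result shape only mirrors the `exit_code` and `stdout` fields shown in the calls above.

```typescript
// Shape of a test run result, mirroring the fields in the examples above.
interface RunResult {
  exit_code: number;
  stdout: string;
}

type NextAction = "done" | "fix" | "give_up";

// Decide what happens after the Nth test run (1-based).
// maxIterations is a hypothetical safety cap, not a gibil setting.
function nextAction(
  result: RunResult,
  iteration: number,
  maxIterations: number
): NextAction {
  if (result.exit_code === 0) return "done"; // suite is green
  if (iteration >= maxIterations) return "give_up"; // stop before the TTL runs out
  return "fix"; // patch files, then run the tests again
}

// Replay a sequence of recorded results to see where the loop would stop.
function replay(
  results: RunResult[],
  maxIterations: number
): { action: NextAction; iterations: number } {
  let iterations = 0;
  for (const result of results) {
    iterations += 1;
    const action = nextAction(result, iterations, maxIterations);
    if (action !== "fix") return { action, iterations };
  }
  // Ran out of recorded results while the suite was still failing.
  return { action: "fix", iterations };
}
```

A cap like this keeps a stuck agent from burning the whole TTL on a failure it cannot fix; the right value depends on how long one test run takes.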

For long builds, run them in the background and poll:

vm_bash({ server: "pr-42", command: "pnpm build && pnpm test", background: true })  // → { job_id: "j-abc123" }
vm_job_status({ job_id: "j-abc123" })  // → exit code + output when done
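How often to poll is left to the agent. One common choice is exponential backoff with a ceiling, sketched below. The function name and every number here are assumptions for illustration; gibil does not prescribe a polling interval.

```typescript
// Build a list of wait times (in milliseconds) between status checks.
// Each delay grows by `factor` until it reaches `capMs`, then stays there.
function pollDelays(
  initialMs: number,
  factor: number,
  capMs: number,
  count: number
): number[] {
  const delays: number[] = [];
  let next = initialMs;
  for (let i = 0; i < count; i++) {
    delays.push(Math.min(next, capMs));
    next *= factor;
  }
  return delays;
}

// Example: start at 1 s, double each time, never wait more than 10 s.
const schedule = pollDelays(1000, 2, 10000, 6);
// schedule is [1000, 2000, 4000, 8000, 10000, 10000]
```

Short early delays catch fast builds quickly, while the ceiling keeps a long build from generating a status call every second.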

By hand (CLI)

The same loop without an agent driving it:

gibil create --name pr-42 --repo github.com/you/project --ttl 60

gibil run pr-42 "cd /root/project && pnpm test" --json
# → {"stdout": "3 failed, 39 passed", "stderr": "", "exit_code": 1}

# fix code, run again
gibil run pr-42 "cd /root/project && pnpm test" --json
# → {"stdout": "42 passed", "stderr": "", "exit_code": 0}

gibil destroy pr-42

The --json flag returns structured output the agent can parse without regex:

{ "stdout": "3 failed, 39 passed", "stderr": "", "exit_code": 1 }
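A caller might validate that object before acting on it. The sketch below assumes only the three fields shown above (`stdout`, `stderr`, `exit_code`); `RunOutput` and `parseRunOutput` are hypothetical names, not part of the gibil CLI.

```typescript
// The three fields shown in the --json example above.
interface RunOutput {
  stdout: string;
  stderr: string;
  exit_code: number;
}

// Parse one line of `gibil run ... --json` output and check its shape.
// Throws if the text is not JSON or a field is missing or mistyped.
function parseRunOutput(raw: string): RunOutput {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("expected a JSON object");
  }
  const { stdout, stderr, exit_code } = data as Record<string, unknown>;
  if (
    typeof stdout !== "string" ||
    typeof stderr !== "string" ||
    typeof exit_code !== "number"
  ) {
    throw new Error("missing or mistyped field in run output");
  }
  return { stdout, stderr, exit_code };
}

const sample = parseRunOutput(
  '{ "stdout": "3 failed, 39 passed", "stderr": "", "exit_code": 1 }'
);
const testsPassed = sample.exit_code === 0; // false for this sample
```

Branching on `exit_code` rather than on the text in `stdout` keeps the check independent of the test runner's output format.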

What the agent gets

A real Linux server with:

Your repo cloned to /root/project
Runtime installed (Node, Python, or Go via .gibil.yml)
Root access, no permission issues
SSH for the full TTL window

Why gibil

Long-lived session: the server stays up for the full TTL, not just one command
Clean state: fresh Ubuntu 24.04, no leftover artifacts from previous runs
Machine-readable: structured output on every tool call and --json on every command
Auto-cleanup: TTL burns the server when the agent is done (or forgets)

MCP mode shines when the agent reads and writes files frequently: a typed vm_write({ path, content }) call avoids shell-escaping a heredoc. See AI Agent via MCP for the full tool list.
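To see what the typed call saves, here is a sketch of the quoting needed to push file content through a shell command instead. It uses standard POSIX single-quote escaping. `shellQuote` and `remoteWriteCommand` are hypothetical helpers, and the sketch assumes the command string passed to gibil run is interpreted by a POSIX shell on the server.

```typescript
// POSIX single-quote escaping: wrap the argument in single quotes, and for
// each embedded single quote close the string, emit \', and reopen it.
function shellQuote(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Build a remote command that writes `content` to `path` with printf.
// Both values are quoted so spaces, quotes, and $ are taken literally.
function remoteWriteCommand(path: string, content: string): string {
  return "printf '%s' " + shellQuote(content) + " > " + shellQuote(path);
}

// The remote command then needs a second layer of quoting for the local
// shell, following the `gibil run <name> "<command>"` form shown earlier.
const remote = remoteWriteCommand("/root/project/notes.txt", "it's $HOME");
const local = "gibil run pr-42 " + shellQuote(remote);
```

Two layers of escaping for a single short string is the overhead that passing `content` as a typed field avoids.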

Next steps

Orchestration: the orchestrator + box model behind every recipe
AI Agent via MCP: wire up the MCP server
Run Agents in Parallel: fan this loop out across N branches
CLI: gibil run: full command reference