# vLLM

> Route OpenAI-compatible GoModel requests to one or more vLLM servers, including slash-shaped Hugging Face model IDs.

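GoModel talks to vLLM through vLLM's OpenAI-compatible HTTP server. Hugging Face
model IDs that contain slashes, such as `meta-llama/Llama-3.1-8B-Instruct`, work
as-is. GoModel splits a provider-qualified selector on the first slash only, so
`vllm/meta-llama/Llama-3.1-8B-Instruct` resolves to provider `vllm` and model
`meta-llama/Llama-3.1-8B-Instruct`.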
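As a rough illustration of the first-slash rule (not GoModel's actual code), the split can be reproduced with shell parameter expansion. `split_selector` is a hypothetical helper name used only for this sketch.

```bash
# Hypothetical helper illustrating the documented rule; not GoModel source.
split_selector() {
  local selector="$1"
  # Remove the longest "/..." suffix: everything before the first slash.
  local provider="${selector%%/*}"
  # Remove the shortest ".../" prefix: everything after the first slash.
  local model="${selector#*/}"
  printf '%s %s\n' "$provider" "$model"
}

split_selector "vllm/meta-llama/Llama-3.1-8B-Instruct"
# prints: vllm meta-llama/Llama-3.1-8B-Instruct
```

Splitting on the last slash instead would yield the provider `vllm/meta-llama`, which is why the first-slash rule matters for Hugging Face IDs.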

Start vLLM first:

```bash theme={null}
vllm serve meta-llama/Llama-3.1-8B-Instruct
# add --api-key token-abc123 if you want vLLM to require bearer auth
```
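Loading model weights can take a while, so "start vLLM first" in practice means waiting until the server answers. A minimal sketch, assuming vLLM's `/health` endpoint on the default port 8000; `wait_for_http` is a hypothetical helper, not part of GoModel or vLLM.

```bash
# Hypothetical helper: poll a URL until curl gets a non-error HTTP status
# (curl -f fails on 4xx/5xx) or the attempts run out.
wait_for_http() {
  local url="$1" attempts="${2:-60}" delay="${3:-2}" i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: quiet, -f: fail on HTTP errors, -m 5: give up on a single try after 5s
    if curl -sf -m 5 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up waiting for $url" >&2
  return 1
}

# Usage (vLLM on the host):
#   wait_for_http http://localhost:8000/health && echo "vLLM is ready"
```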

## Configure

```bash theme={null}
VLLM_BASE_URL=http://host.docker.internal:8000/v1   # include /v1
# VLLM_API_KEY=token-abc123                         # only if vLLM was started with --api-key
GOMODEL_MASTER_KEY=change-me
```
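Because the base URL must include `/v1`, a quick pre-flight check can catch the most common mistake before GoModel starts. This is a sketch; `check_vllm_base_url` is a hypothetical helper and only checks the shape of the value, not whether vLLM is reachable.

```bash
# Hypothetical pre-flight check (not part of GoModel): the docs require the
# vLLM base URL to include the /v1 path, so reject values that do not end in it.
check_vllm_base_url() {
  case "$1" in
    http://*/v1|https://*/v1) return 0 ;;
    *) echo "VLLM_BASE_URL should end in /v1, got: $1" >&2; return 1 ;;
  esac
}

# Usage: check_vllm_base_url "$VLLM_BASE_URL"
```

To confirm that vLLM itself is up, `curl -s http://localhost:8000/v1/models` from the host should list the served model (add `-H "Authorization: Bearer token-abc123"` if vLLM was started with `--api-key`).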

<Note>
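  These examples assume GoModel runs in Docker and vLLM runs on the host at
  `localhost:8000`, which is why the base URL uses `host.docker.internal`. On
  Linux with Docker Engine, that hostname is not defined by default; add
  `--add-host=host.docker.internal:host-gateway` to `docker run`. If both run on
  the same Docker network, use the vLLM service name instead. If GoModel runs
  directly on the host, use `http://localhost:8000/v1`.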
</Note>
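The three layouts in the note above map to three `VLLM_BASE_URL` values. A small sketch, assuming vLLM listens on port 8000 in every layout; the helper `vllm_base_url`, its mode names, and the default service name `vllm` are all hypothetical.

```bash
# Hypothetical helper: pick VLLM_BASE_URL for a deployment layout.
# Mode names and the default "vllm" service name are assumptions.
vllm_base_url() {
  local mode="$1" service="${2:-vllm}" port="${3:-8000}"
  case "$mode" in
    docker-host)    echo "http://host.docker.internal:${port}/v1" ;;  # GoModel in Docker, vLLM on host
    docker-network) echo "http://${service}:${port}/v1" ;;            # both on one Docker network
    host)           echo "http://localhost:${port}/v1" ;;             # GoModel binary on the host
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

VLLM_BASE_URL="$(vllm_base_url docker-host)"
export VLLM_BASE_URL
echo "VLLM_BASE_URL=$VLLM_BASE_URL"   # prints: VLLM_BASE_URL=http://host.docker.internal:8000/v1
```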

## Run GoModel

<CodeGroup>
  ```bash Docker (.env file) theme={null}
  docker run --rm -p 8080:8080 --env-file .env enterpilot/gomodel
  ```

  ```bash Docker (inline -e) theme={null}
  docker run --rm -p 8080:8080 \
    -e GOMODEL_MASTER_KEY="change-me" \
    -e VLLM_BASE_URL="http://host.docker.internal:8000/v1" \
    enterpilot/gomodel
  ```

  ```bash Binary (make build) theme={null}
  make build
  ./bin/gomodel
  ```
</CodeGroup>

## Verify

```bash theme={null}
curl -s http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vllm/meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Reply with exactly ok."}]
  }'
```

`GET /v1/models` lists vLLM's model IDs prefixed with the provider name, e.g.
`vllm/meta-llama/Llama-3.1-8B-Instruct`.
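Client code sometimes needs the raw vLLM model ID back, for example in passthrough request bodies, which use the unprefixed ID as in the `/tokenize` example later on this page. A minimal sketch, assuming the IDs have already been extracted one per line from the `/v1/models` response (that extraction step, e.g. with a JSON tool such as `jq`, is not shown); `upstream_ids` is a hypothetical helper.

```bash
# Hypothetical helper: read GoModel model IDs (one per line) on stdin, keep
# those belonging to one provider, and strip the "<provider>/" prefix.
upstream_ids() {
  local provider="$1" id
  while IFS= read -r id; do
    case "$id" in
      "$provider"/*) printf '%s\n' "${id#"$provider"/}" ;;
    esac
  done
}

# IDs would normally come from:
#   curl -s http://localhost:8080/v1/models -H "Authorization: Bearer change-me"
printf '%s\n' "vllm/meta-llama/Llama-3.1-8B-Instruct" | upstream_ids vllm
# prints: meta-llama/Llama-3.1-8B-Instruct
```

Matching on `vllm/` rather than the bare `vllm` prefix keeps the `vllm` provider from also capturing IDs that belong to a suffixed instance such as `vllm-test`.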

## Multiple vLLM instances

To register more than one vLLM server without a YAML config file, add suffixed environment variables:

```bash theme={null}
VLLM_BASE_URL=http://host.docker.internal:8000/v1
VLLM_TEST_BASE_URL=http://host.docker.internal:8001/v1
```

`VLLM_BASE_URL` registers a provider named `vllm`, and `VLLM_TEST_BASE_URL`
registers `vllm-test`. The suffix is lowercased and its underscores become
hyphens.
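The naming rule described above can be sketched as a string transformation. This reproduces only the documented behavior; `provider_name` is a hypothetical helper, not GoModel's implementation.

```bash
# Hypothetical helper: derive the provider name from a *_BASE_URL variable
# name by dropping the suffix, lowercasing, and turning "_" into "-".
provider_name() {
  local name="${1%_BASE_URL}"          # VLLM_TEST_BASE_URL -> VLLM_TEST
  printf '%s\n' "$name" \
    | tr '[:upper:]' '[:lower:]' \
    | tr '_' '-'                       # VLLM_TEST -> vllm_test -> vllm-test
}

provider_name VLLM_BASE_URL        # prints: vllm
provider_name VLLM_TEST_BASE_URL   # prints: vllm-test
```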

## vLLM passthrough

Passthrough is enabled by default. Use it for vLLM-specific endpoints such as
`/tokenize`, `/detokenize`, `/pooling`, `/rerank`:

```bash theme={null}
curl -s http://localhost:8080/p/vllm/tokenize \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "prompt": "Hello"}'
```

GoModel strips the client's authentication headers before forwarding, so the
GoModel master key never reaches vLLM. When `VLLM_API_KEY` is set, GoModel
applies it to the upstream request instead.

<Note>
  Passthrough routes are scoped by provider type (`/p/vllm/...`), not by
  instance name. To target one named instance in a multi-vLLM setup, send
  translated `/v1/...` requests with a provider-qualified model ID
  (e.g. `vllm-test/meta-llama/Llama-3.1-8B-Instruct`).
</Note>

## Not yet integrated

* Native vLLM batch APIs.
* OpenAI-compatible files lifecycle.
* Responses lifecycle utility endpoints.