Chat & LLM

Schift exposes two chat surfaces:

POST /v1/chat/completions — OpenAI-compatible LLM proxy for direct model calls.
POST /v1/chat — Bucket-backed RAG chat that retrieves context from a bucket before generating an answer.

Use GET /v1/models to list the models available to your organization through the configured provider keys.

All chat routes require a Schift API key passed as a Bearer token.

Note: Response generation is fail-closed. Your organization must have an explicit provider key configured in provider_configs. Schift does not fall back to a platform-managed key for response generation, so a request made without a configured key returns 403 (PROVIDER_KEY_REQUIRED).

POST /v1/chat/completions

OpenAI-compatible chat completions endpoint. Schift routes the request to the configured provider (OpenAI, Google, Anthropic, and others) and returns the response in OpenAI format.

Request body

Name	Type	Required	Default	Description
`model`	string	Yes	—	Model ID, for example `gpt-4o` or `claude-3-sonnet`.
`messages`	object[]	Yes	—	Chat messages in OpenAI format. Each object has `role` and `content`.
`temperature`	float	No	—	Sampling temperature, typically `0.0` to `2.0`.
`max_tokens`	integer	No	—	Maximum number of tokens to generate.
`top_p`	float	No	—	Nucleus sampling parameter.
`stream`	boolean	No	`false`	Return a Server-Sent Events stream.
`stop`	string[]	No	—	Stop sequences that terminate generation.

Example request

curl -X POST ${API_BASE_URL:-https://api.schift.io}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SCHIFT_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Explain embedding model migration in one paragraph."}
    ]
  }'
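
For callers who prefer a script to curl, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names build_chat_completion_request and send are hypothetical, and the non-empty checks are client-side sanity checks rather than documented server rules.

```python
import json
import os
import urllib.request

# Base URL mirrors the curl example's default; override via API_BASE_URL.
API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.schift.io")


def build_chat_completion_request(api_key, model, messages, **options):
    """Build (but do not send) a POST /v1/chat/completions request.

    `options` may carry the optional fields from the request-body table
    (temperature, max_tokens, top_p, stream, stop). None values are dropped
    so the server applies its own defaults.
    """
    # Client-side sanity checks (assumption: not documented server behavior).
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one message")
    payload = {"model": model, "messages": messages}
    payload.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        url=f"{API_BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling send(build_chat_completion_request(os.environ["SCHIFT_API_KEY"], "gpt-4o", [{"role": "user", "content": "Hello"}])) would return the decoded JSON response. Keeping construction separate from sending makes the payload easy to inspect or test without network access.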

Example response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Embedding model migration is the process of moving document representations..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 42,
    "total_tokens": 60
  }
}

Streaming

Set "stream": true to receive Server-Sent Events. Each event contains a chunk of the completion in OpenAI-compatible delta format.

curl -N -X POST ${API_BASE_URL:-https://api.schift.io}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SCHIFT_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
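
On the client side, consuming the stream amounts to reading `data:` lines and concatenating the delta fragments. The sketch below assumes the usual OpenAI conventions (text in choices[0].delta.content and a final `data: [DONE]` sentinel); this page does not confirm the sentinel for Schift, and iter_content_deltas is a hypothetical helper name.

```python
import json


def iter_content_deltas(lines):
    """Yield text fragments from OpenAI-style SSE lines.

    Assumes each event is a `data: <json>` line carrying
    choices[0].delta.content, and that the stream ends with `data: [DONE]`
    (the OpenAI convention, assumed here rather than documented above).
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Illustrative sample stream, not captured from a real response.
sample = [
    b'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}\n',
    b"\n",
    b'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}\n',
    b'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}\n',
    b"data: [DONE]\n",
]
print("".join(iter_content_deltas(sample)))
```

The response object returned by urllib.request.urlopen iterates over lines of bytes, so it can be passed to iter_content_deltas directly.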

Error examples

// 402 Quota exceeded
{
  "allowed": false,
  "reason": "quota_exceeded"
}

// 402 Insufficient credits
{
  "error": "insufficient_credits",
  "balance": 0,
  "estimated_cost": 120,
  "estimated_cost_usd": 0.0012
}

// 403 Provider key required
{
  "detail": {
    "error": "PROVIDER_KEY_REQUIRED",
    "provider_access": "missing",
    "message": "No provider key configured for response generation. If nothing was given, the response would not be made."
  }
}

// 403 Plan or credit limit
{
  "detail": "Upgrade your plan to continue"
}

// 502 Provider unavailable
{
  "detail": "LLM provider temporarily unavailable"
}

// 503 Service not configured
{
  "detail": "LLM service not configured"
}
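
The error bodies above come in several shapes: `detail` as an object, `detail` as a string, a top-level `error`, and `allowed`/`reason`. A client may want to normalize them before logging or branching. The following is a rough sketch covering only the shapes listed here; summarize_error and is_retryable are hypothetical helpers, and treating only 502 as transient is an assumption drawn from the "temporarily unavailable" wording.

```python
def summarize_error(status, body):
    """Reduce the error shapes shown above to a (code, message) pair."""
    detail = body.get("detail")
    if isinstance(detail, dict):  # e.g. 403 PROVIDER_KEY_REQUIRED
        return detail.get("error", "unknown"), detail.get("message", "")
    if isinstance(detail, str):  # e.g. 403 plan limit, 502, 503
        return f"http_{status}", detail
    if "error" in body:  # e.g. 402 insufficient_credits
        return body["error"], f"balance={body.get('balance')}"
    if body.get("allowed") is False:  # e.g. 402 quota_exceeded
        return body.get("reason", "not_allowed"), ""
    return f"http_{status}", ""


def is_retryable(status):
    """Assumption: only 502 (provider temporarily unavailable) is transient.

    The 402, 403, and 503 cases need a billing or configuration change,
    so retrying the same request is unlikely to help.
    """
    return status == 502


code, message = summarize_error(
    402, {"error": "insufficient_credits", "balance": 0, "estimated_cost": 120}
)
print(code, message)
```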

GET /v1/models

List the LLM models available through your organization’s configured provider keys.

Example request

curl -G ${API_BASE_URL:-https://api.schift.io}/v1/models \
  -H "Authorization: Bearer $SCHIFT_API_KEY"

Example response

{
  "object": "list",
  "data": [
    {
      "id": "gpt-4o",
      "object": "model",
      "owned_by": "openai"
    },
    {
      "id": "claude-3-sonnet",
      "object": "model",
      "owned_by": "anthropic"
    }
  ]
}
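
A common use of this listing is to check which providers are usable before choosing a model. As a small sketch (models_by_owner is a hypothetical helper, and the input below is the example response from this page, not live data):

```python
from collections import defaultdict


def models_by_owner(listing):
    """Map each `owned_by` value to a sorted list of model IDs."""
    grouped = defaultdict(list)
    for entry in listing.get("data", []):
        grouped[entry.get("owned_by", "unknown")].append(entry["id"])
    return {owner: sorted(ids) for owner, ids in grouped.items()}


# The example response shown above.
listing = {
    "object": "list",
    "data": [
        {"id": "gpt-4o", "object": "model", "owned_by": "openai"},
        {"id": "claude-3-sonnet", "object": "model", "owned_by": "anthropic"},
    ],
}
print(models_by_owner(listing))
```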

POST /v1/chat

Bucket-backed RAG chat. Schift searches the requested bucket, assembles retrieval context, and generates an answer grounded in the results.

Note: This endpoint does not accept caller-controlled system prompts. Non-empty system_prompt values return 400. The server assembles RAG instructions and treats retrieved text as untrusted evidence.

Request body

Name	Type	Required	Default	Description
`bucket_id`	string	Yes	—	Bucket to search for context.
`message`	string	Yes	—	User question or prompt. Must be non-empty.
`history`	object[]	No	`[]`	Previous conversation turns. Each object has `role` and `content`.
`model`	string	No	`gemini-2.5-flash-lite`	Model used for generation.
`top_k`	integer	No	`7`	Number of retrieval results to include (`1` to `50`).
`access_mode`	string	No	`auto`	Retrieval access policy: `auto`, `internal`, or `external`. `raw` is reserved for platform-admin diagnostics and is rejected for normal callers.
`stream`	boolean	No	`true`	Stream chunks via SSE.
`system_prompt`	string	No	`null`	Deprecated compatibility field. Non-empty values are rejected.
`temperature`	float	No	—	Sampling temperature.
`max_tokens`	integer	No	—	Maximum output tokens.
`debug`	boolean	No	`false`	Include pipeline debug events in SSE. Only platform-admin callers receive debug output.

Example request

curl -X POST ${API_BASE_URL:-https://api.schift.io}/v1/chat \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SCHIFT_API_KEY" \
  -d '{
    "bucket_id": "bucket_123",
    "message": "What changed in Q4?",
    "top_k": 7,
    "access_mode": "auto",
    "stream": false
  }'
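
Several constraints from the request-body table can be checked client-side before sending, which avoids a round trip for requests the server would reject. The sketch below is illustrative: build_rag_chat_payload is a hypothetical helper, and rejecting whitespace-only messages is a stricter client-side choice, not documented server behavior.

```python
# Access modes accepted for normal callers; `raw` is reserved for platform admins.
ALLOWED_ACCESS_MODES = {"auto", "internal", "external"}


def build_rag_chat_payload(bucket_id, message, *, history=None, model=None,
                           top_k=None, access_mode=None, stream=None,
                           temperature=None, max_tokens=None):
    """Build a POST /v1/chat body, enforcing the documented constraints.

    system_prompt is deliberately not a parameter: non-empty values are
    rejected by the endpoint with 400.
    """
    if not bucket_id:
        raise ValueError("bucket_id is required")
    if not message or not message.strip():
        raise ValueError("message must be non-empty")
    if top_k is not None and not 1 <= top_k <= 50:
        raise ValueError("top_k must be between 1 and 50")
    if access_mode is not None and access_mode not in ALLOWED_ACCESS_MODES:
        raise ValueError("access_mode must be auto, internal, or external")
    payload = {"bucket_id": bucket_id, "message": message}
    optional = {
        "history": history,
        "model": model,
        "top_k": top_k,
        "access_mode": access_mode,
        "stream": stream,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    # Omit unset fields so the server-side defaults from the table apply.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_rag_chat_payload(
    "bucket_123", "What changed in Q4?", top_k=7, access_mode="auto", stream=False
)
print(payload)
```

Because stream defaults to true on this endpoint, pass stream=False explicitly when a single JSON response is wanted.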

Example response

{
  "reply": "Q4 revenue increased after the new product launch.",
  "sources": [
    {
      "id": "doc-42",
      "score": 0.92,
      "text": "Quarterly report excerpt ...",
      "bucket_id": "bucket_123"
    }
  ],
  "model": "gemini-2.5-flash-lite",
  "search_id": "search_abc123",
  "degraded": false,
  "warnings": []
}

Response fields

Name	Type	Description
`reply`	string	Generated answer grounded in retrieved bucket context.
`sources`	object[]	Retrieved context snippets used for grounding.
`sources[].id`	string	Source document or chunk identifier.
`sources[].score`	number	Retrieval score for the source.
`sources[].text`	string	Source text excerpt.
`sources[].bucket_id`	string | null	Bucket identifier when available.
`model`	string	Model used for generation.
`search_id`	string | null	Retrieval trace ID for support, replay, or feedback.
`degraded`	boolean	Indicates retrieval or generation used a degraded path.
`warnings`	object[]	Structured retrieval or quality warnings. Empty when none apply.
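
To make the field table concrete, here is one way a client might map a non-streaming response onto typed objects and render its citations. This is a sketch under the field types listed above; the class and function names (Source, RagReply, parse_rag_reply, format_citations) are hypothetical, and the citation format is only an illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    id: str
    score: float
    text: str
    bucket_id: Optional[str] = None  # documented as string | null


@dataclass
class RagReply:
    reply: str
    model: str
    sources: List[Source] = field(default_factory=list)
    search_id: Optional[str] = None  # documented as string | null
    degraded: bool = False
    warnings: list = field(default_factory=list)


def parse_rag_reply(body):
    """Convert a decoded POST /v1/chat JSON body into a RagReply."""
    sources = [
        Source(id=s["id"], score=s["score"], text=s["text"],
               bucket_id=s.get("bucket_id"))
        for s in body.get("sources", [])
    ]
    return RagReply(
        reply=body["reply"],
        model=body["model"],
        sources=sources,
        search_id=body.get("search_id"),
        degraded=body.get("degraded", False),
        warnings=body.get("warnings", []),
    )


def format_citations(reply):
    """Render one numbered line per source, flagging degraded answers."""
    lines = [
        f"[{i}] {s.id} (score {s.score:.2f})"
        for i, s in enumerate(reply.sources, start=1)
    ]
    if reply.degraded:
        lines.append("note: answer was produced on a degraded path")
    return "\n".join(lines)
```

Surfacing degraded and warnings next to the answer lets downstream consumers decide whether to trust or re-run a response; search_id is worth logging since it is the retrieval trace ID used for support or feedback.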

When stream is true (the default), the response is delivered as a stream of SSE events instead of a single JSON object. If debug is accepted for a platform-admin request, the stream may also include pipeline_debug diagnostic events; all other callers should treat debug output as unavailable.

Error examples

// 400 Rejected system prompt
{
  "detail": "client-supplied system_prompt is not accepted"
}

// 403 Provider key required
{
  "detail": {
    "error": "PROVIDER_KEY_REQUIRED",
    "provider_access": "missing",
    "message": "No provider key configured for response generation. If nothing was given, the response would not be made."
  }
}

// 400 Raw access mode rejected
{
  "detail": "access_mode 'raw' is not allowed for this caller"
}

// 404 Bucket not found
{
  "detail": "Bucket 'bucket_123' not found"
}

Billing and attribution

For both chat surfaces, Schift records token usage and LLM cost logs. Each successful generation also persists a provider_source value:

provider_source = "byok" when an organization-configured provider key is used.

Chat completions are billed per token. A pre-flight cost estimate is performed before each request to prevent overspending, and credits are deducted for non-BYOK platform usage. RAG chat usage is recorded through the same billing paths.

When to use each endpoint

Goal	Endpoint
Generic OpenAI-compatible LLM call without retrieval	`POST /v1/chat/completions`
Answer generation grounded in a Schift bucket	`POST /v1/chat`
Retrieve bucket context and citations only	`POST /v2/buckets/{bucket_id}/search`
