
# Chat & LLM

Two chat endpoints: RAG chat that searches your buckets and answers with sources, and an OpenAI-compatible completions proxy for direct LLM access.

## RAG Chat

`POST /v1/chat` searches a bucket for relevant context, then generates an answer with source citations. Supports streaming.

```bash
curl -X POST https://api.schift.io/v1/chat \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SCHIFT_API_KEY" \
  -d '{
    "bucket_id": "abc123",
    "message": "What are the key findings?",
    "top_k": 5,
    "stream": true
  }'
```
| Field | Type | Default | Description |
|---|---|---|---|
| `bucket_id` | string | — | Bucket to search for context |
| `message` | string | — | User question |
| `history` | ChatMessage[] | `[]` | Previous conversation turns (max 10) |
| `model` | string | `gpt-4.1-nano` | LLM model for generation |
| `top_k` | integer | `5` | Number of sources to retrieve |
| `stream` | boolean | `true` | Enable SSE streaming |
| `system_prompt` | string | — | Custom system prompt (overrides default) |
| `temperature` | float | — | Sampling temperature |
| `max_tokens` | integer | — | Max output tokens |
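Since `history` accepts at most 10 previous turns, clients should trim older turns before each request. A minimal Python sketch of request-body construction; the `build_chat_body` helper is hypothetical, not part of any SDK:

```python
# Hypothetical helper: build a /v1/chat request body, keeping only the
# last 10 history turns (the documented maximum).
MAX_HISTORY_TURNS = 10

def build_chat_body(bucket_id, message, history=None, **options):
    history = (history or [])[-MAX_HISTORY_TURNS:]  # drop oldest turns first
    body = {"bucket_id": bucket_id, "message": message, "history": history}
    body.update(options)  # e.g. top_k, stream, model
    return body

body = build_chat_body(
    "abc123",
    "What are the key findings?",
    history=[{"role": "user", "content": f"turn {i}"} for i in range(25)],
    top_k=5,
    stream=True,
)
print(len(body["history"]))  # 10
```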

Non-streaming responses include `reply`, `sources` (each with `id`, `score`, `text`), and `model`. Streaming emits SSE events: `sources`, then `chunk` deltas, then `done`.
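A streaming client can consume this by splitting the SSE stream on blank lines and dispatching on event names. A minimal Python sketch; the sample payloads below are illustrative, not verbatim server output:

```python
# Minimal SSE parser: each event is a blank-line-separated block of
# "event:" and "data:" fields.
def parse_sse(raw: str):
    events = []
    for block in raw.strip().split("\n\n"):
        event, data = None, []
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data.append(line[len("data:"):].strip())
        events.append((event, "\n".join(data)))
    return events

# Illustrative stream: sources first, then chunk deltas, then done.
sample = (
    "event: sources\n"
    'data: [{"id": "doc-1", "score": 0.92}]\n\n'
    "event: chunk\n"
    'data: {"delta": "The key finding is..."}\n\n'
    "event: done\n"
    "data: {}\n"
)
for event, data in parse_sse(sample):
    print(event, data)
```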

## Chat Completions (OpenAI-compatible)

`POST /v1/chat/completions` is a drop-in replacement for the OpenAI chat API. It proxies to multiple LLM providers through a single endpoint.

```bash
curl -X POST https://api.schift.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SCHIFT_API_KEY" \
  -d '{
    "model": "gpt-4.1",
    "messages": [{"role": "user", "content": "Explain RAG in one paragraph."}]
  }'
```
| Field | Type | Default | Description |
|---|---|---|---|
| `model` | string | — | LLM model ID (e.g., `gpt-4.1`, `claude-sonnet-4-6`) |
| `messages` | object[] | — | OpenAI-format messages |
| `temperature` | float | — | Sampling temperature |
| `max_tokens` | integer | — | Max output tokens |
| `stream` | boolean | `false` | Enable SSE streaming |
| `stop` | string[] | — | Stop sequences |

Returns the standard OpenAI response format. Use `GET /v1/models` to list all available LLM models.
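Because the endpoint speaks the OpenAI wire format, any HTTP client can call it with a standard OpenAI-style payload. A stdlib-only Python sketch that builds (but does not send) such a request; the API key is a placeholder:

```python
import json
import urllib.request

# OpenAI-format payload, identical to what the OpenAI API itself accepts.
payload = {
    "model": "gpt-4.1",
    "messages": [{"role": "user", "content": "Explain RAG in one paragraph."}],
}

# Only the URL differs from a direct OpenAI call.
req = urllib.request.Request(
    "https://api.schift.io/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer $SCHIFT_API_KEY",  # placeholder key
    },
    method="POST",
)
# urllib.request.urlopen(req) would return the standard OpenAI response.
print(req.get_method())  # POST
```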

## When to use which

Use `/v1/chat` when you want Schift to search your documents and answer with sources. Use `/v1/chat/completions` when you want a plain LLM call without retrieval: it uses the same format as OpenAI and works with any OpenAI-compatible client library.