Buckets

A bucket is Schift’s public knowledge storage surface. It holds a set of documents, their extracted text, embeddings, and search index, and exposes a single endpoint for answer-ready retrieval.

You create a bucket, upload files, wait for indexing to finish, then call search. Schift manages the embedding model, vector backend, chunking, OCR, reranking, and citation formatting so you do not have to wire those pieces together.

Each bucket represents a managed collection with the following:

  • Documents — PDFs, Markdown, plain text, Office files, images, and other supported uploads.
  • Extracted chunks — text segments produced by parsing and OCR.
  • Embeddings — dense vector representations generated with the bucket’s configured embedding model.
  • Metadata — user-defined key-value pairs attached to documents for filtering and access control.
  • Search index — the vector and lexical structures used by the managed search pipeline.

Buckets are isolated by organization. A bucket name must not start with the __schift_ prefix, which is reserved for internal system collections.
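A client can catch the reserved-prefix rule before sending a request. The following is a minimal sketch of such a check; the function name is illustrative, and Schift may enforce additional naming rules that are not described here.

```python
RESERVED_PREFIX = "__schift_"  # prefix reserved for internal system collections


def is_allowed_bucket_name(name: str) -> bool:
    """Return False when the name starts with the reserved prefix.

    This only mirrors the one rule stated above; the server may apply
    further validation that this sketch does not know about.
    """
    return not name.startswith(RESERVED_PREFIX)
```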

When you create a bucket, Schift auto-configures the collection:

  • It selects a default text embedding model and dimension.
  • It chooses a vector backend, typically the engine backend.
  • It creates the underlying vector table.

This means a new bucket is ready to receive documents immediately after creation. You do not need to configure embedding endpoints or vector databases yourself.

Document processing is asynchronous. When you upload files, Schift returns job IDs right away and then performs extraction, chunking, embedding, and indexing in the background. You can poll GET /v2/buckets/{bucket_id}/search/status to check when the bucket is ready to answer queries.

Use POST /v2/buckets with a name and optional description. The response includes the bucket ID, dimension, model, backend, and counts.
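As a rough illustration, the request could be built with the Python standard library as shown below. The host name, the Bearer authorization scheme, and the JSON field names `name` and `description` are assumptions made for this sketch; check them against your account's API reference before use.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.schift.example"  # placeholder host (assumption)


def build_create_bucket_request(
    api_key: str, name: str, description: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/buckets request."""
    payload = {"name": name}
    if description is not None:
        payload["description"] = description  # optional field
    return urllib.request.Request(
        f"{BASE_URL}/v2/buckets",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request; the JSON response would then carry the bucket ID, dimension, model, backend, and counts.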

Use POST /v2/buckets/{bucket_id}/documents to upload one or more files. Supported options include OCR strategy, chunk size, chunk overlap, and document metadata. Each upload returns background jobs that you can track through the Jobs API or the search readiness endpoint.

Note: There are per-request limits on file count and total batch size. Large uploads should be split into smaller batches.
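One way to stay under the per-request limits is to group files into batches on the client before uploading. The sketch below assumes you already know the two limits; the concrete values are not stated here, so `max_files` and `max_bytes` are parameters rather than constants.

```python
from typing import Iterable, Iterator, List, Tuple

FileInfo = Tuple[str, int]  # (file name, size in bytes)


def split_into_batches(
    files: Iterable[FileInfo], max_files: int, max_bytes: int
) -> Iterator[List[FileInfo]]:
    """Yield batches that respect both the file-count and total-size limits."""
    batch: List[FileInfo] = []
    batch_bytes = 0
    for name, size in files:
        if size > max_bytes:
            # A single file above the size limit can never fit in any batch.
            raise ValueError(f"{name} exceeds the per-request size limit")
        # Close the current batch if adding this file would break a limit.
        if batch and (len(batch) >= max_files or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append((name, size))
        batch_bytes += size
    if batch:
        yield batch
```

Each yielded batch can then be sent as its own POST /v2/buckets/{bucket_id}/documents request.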

Before relying on a bucket for answers, call GET /v2/buckets/{bucket_id}/search/status. A ready status means all pending indexing jobs have completed and the bucket can serve search requests.
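A simple polling loop for the readiness check might look like the sketch below. The host name, the Bearer authorization scheme, and the response shape (a JSON object with a `status` field equal to `"ready"`) are assumptions; the timeout and interval values are arbitrary defaults.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://api.schift.example"  # placeholder host (assumption)


def fetch_status(bucket_id: str, api_key: str) -> dict:
    """Call GET /v2/buckets/{bucket_id}/search/status and decode the JSON body."""
    req = urllib.request.Request(
        f"{BASE_URL}/v2/buckets/{bucket_id}/search/status",
        headers={"Authorization": f"Bearer {api_key}"},  # auth scheme is an assumption
        method="GET",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_until_ready(
    get_status: Callable[[], dict],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status until it reports ready; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        # The "status" field name is an assumption about the response shape.
        if get_status().get("status") == "ready":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

In real use you would call `wait_until_ready(lambda: fetch_status(bucket_id, api_key))`. Taking the status function as a parameter keeps the loop testable without network access.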

Use POST /v2/buckets/{bucket_id}/search to run the managed knowledge-search pipeline. The request accepts a query, top-k value, context budget, metadata filters, and reranking options. The response contains a paste-ready context block and citations pointing back to the source documents.
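The sketch below shows how a search request could be assembled. The JSON field names (`query`, `top_k`, `context_budget`, `filters`, `rerank`) are guesses based on the option list above, not confirmed parameter names, and the host and authorization scheme are placeholders.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://api.schift.example"  # placeholder host (assumption)


def build_search_request(
    api_key: str,
    bucket_id: str,
    query: str,
    top_k: int = 5,
    context_budget: Optional[int] = None,
    filters: Optional[Dict[str, Any]] = None,
    rerank: Optional[bool] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/buckets/{bucket_id}/search request."""
    # All field names below are assumptions made for illustration.
    body: Dict[str, Any] = {"query": query, "top_k": top_k}
    if context_budget is not None:
        body["context_budget"] = context_budget
    if filters:
        body["filters"] = filters
    if rerank is not None:
        body["rerank"] = rerank
    return urllib.request.Request(
        f"{BASE_URL}/v2/buckets/{bucket_id}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Optional fields are omitted from the body when unset, so the server's own defaults apply.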

You can list and inspect documents, update their metadata, and delete them through the /v2/buckets/{bucket_id}/documents endpoints. Deleting a document is also asynchronous and returns a job ID.

Use PATCH /v2/buckets/{bucket_id} to change mutable fields such as the name, description, and metadata. Use DELETE /v2/buckets/{bucket_id} to queue a bucket for deletion. Public buckets are read-only and cannot be modified or deleted.
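The update and delete calls can be sketched in the same style. The host, the authorization scheme, and the JSON field names are assumptions; the set of mutable fields is taken from the list above and may not be exhaustive.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "https://api.schift.example"  # placeholder host (assumption)
MUTABLE_FIELDS = {"name", "description", "metadata"}  # fields named in the text


def build_update_bucket_request(
    api_key: str, bucket_id: str, **fields: Any
) -> urllib.request.Request:
    """Build (but do not send) a PATCH /v2/buckets/{bucket_id} request."""
    unknown = set(fields) - MUTABLE_FIELDS
    if unknown:
        raise ValueError(f"not a known mutable field: {sorted(unknown)}")
    return urllib.request.Request(
        f"{BASE_URL}/v2/buckets/{bucket_id}",
        data=json.dumps(fields).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


def build_delete_bucket_request(api_key: str, bucket_id: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE /v2/buckets/{bucket_id} request."""
    return urllib.request.Request(
        f"{BASE_URL}/v2/buckets/{bucket_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE",
    )
```

Because deletion is queued, a successful response to the DELETE call does not by itself mean the bucket is already gone.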

Documents can carry user-defined metadata that is used for filtering during search and for organizing content. Buckets also support privacy and access-policy settings that control how documents are retrieved and exposed.

Reserved metadata keys, including server-stamped access policy keys, cannot be set by callers. When updating metadata, Schift sanitizes and validates the values to keep the bucket in a consistent state.
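If you want to detect reserved keys before a request is sent, a client-side pre-check along these lines is one option. The actual reserved key names are not listed here, so the sketch takes them as a parameter, and the key used in the usage note is purely hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def split_reserved(
    metadata: Dict[str, Any], reserved_keys: Iterable[str]
) -> Tuple[Dict[str, Any], List[str]]:
    """Separate caller-settable metadata from keys the server reserves.

    Returns the metadata without reserved keys, plus a sorted list of the
    reserved keys that were present, so the caller can log or reject them.
    """
    reserved = set(reserved_keys)
    allowed = {k: v for k, v in metadata.items() if k not in reserved}
    rejected = sorted(k for k in metadata if k in reserved)
    return allowed, rejected
```

For example, `split_reserved({"team": "support", "example_policy_key": "x"}, ["example_policy_key"])` returns `({"team": "support"}, ["example_policy_key"])`. This check only complements the server-side validation; it does not replace it.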

The public product API is v2. New integrations should use the /v2/buckets/* routes described above.

The older /v1/buckets/*, /v1/query, and /v1/collections/*/search routes remain as compatibility surfaces for existing clients. They are not recommended for new integrations, and some newer features, such as the managed v2 search pipeline, are available only through v2.