Metadata

Document metadata drives retrieval filtering, access control, and citation generation in Schift. You can update metadata on a single document or in bulk, and you can query the server for the current reserved-key registry and validation limits.

Note: Metadata values are stored as strings. Booleans become "true" or "false", and numbers are coerced to their string representation before persistence.
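
The coercion described in the note can be approximated on the client, which helps when comparing local values with what the API returns. This is a minimal sketch under assumptions, not the server's code: `coerce_metadata_value` is a hypothetical helper, number formatting is assumed to match Python's `str()`, and `None` is passed through unchanged because the note does not say how null is persisted.

```python
def coerce_metadata_value(value):
    """Approximate the stored string form of a scalar metadata value."""
    if value is None:
        return None  # assumption: the source does not describe null handling
    if isinstance(value, bool):
        # bool must be checked before int, because bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)  # assumption: server formatting matches str()
    if isinstance(value, str):
        return value
    raise TypeError(f"not a scalar metadata value: {type(value).__name__}")


stored = {k: coerce_metadata_value(v) for k, v in
          {"privacy_level": 4, "archived": False, "region": "apac"}.items()}
```

With this helper, a locally built `{"privacy_level": 4}` compares equal to the `"4"` that appears in response bodies.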

User metadata is arbitrary scalar key/value data used for filtering and faceting. It is validated by server.validation.metadata on every write.

| Rule | Limit |
| --- | --- |
| Key characters | A-Z, a-z, 0-9, _, ., - |
| Value types | string, number, boolean, or null |
| Max JSON payload | 4 KB |
| Max keys | 32 |
| Max key length | 64 characters |
| Max value length | 512 characters |
| Control characters | not allowed |
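
A client can check a payload against these limits before sending it, which avoids a round trip for obvious mistakes. The following is a sketch only: `validate_user_metadata` is a hypothetical helper, keys are assumed to be strings, 4 KB is read as 4096 bytes of UTF-8 JSON, and the value-length and control-character rules are applied to string values only. The server remains the authority on what is accepted.

```python
import json
import re

KEY_PATTERN = re.compile(r"[A-Za-z0-9_.-]+")
MAX_JSON_BYTES = 4096  # assumption: "4 KB" means 4096 bytes
MAX_KEYS = 32
MAX_KEY_LENGTH = 64
MAX_VALUE_LENGTH = 512


def _has_control_chars(text):
    # Treats ASCII C0 controls and DEL as control characters (assumption).
    return any(ord(ch) < 32 or ord(ch) == 127 for ch in text)


def validate_user_metadata(metadata):
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []
    if len(metadata) > MAX_KEYS:
        problems.append(f"too many keys: {len(metadata)} > {MAX_KEYS}")
    size = len(json.dumps(metadata, default=str).encode("utf-8"))
    if size > MAX_JSON_BYTES:
        problems.append(f"payload too large: {size} bytes")
    for key, value in metadata.items():
        if not KEY_PATTERN.fullmatch(key):
            problems.append(f"key {key!r} contains disallowed characters")
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"key {key!r} is longer than {MAX_KEY_LENGTH}")
        if value is not None and not isinstance(value, (str, int, float, bool)):
            problems.append(f"value for {key!r} is not a scalar")
            continue
        if isinstance(value, str):
            if len(value) > MAX_VALUE_LENGTH:
                problems.append(f"value for {key!r} is longer than {MAX_VALUE_LENGTH}")
            if _has_control_chars(value):
                problems.append(f"value for {key!r} contains control characters")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem in one pass.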

Reserved keys are owned by the ingestion, indexing, scoring, and graph pipelines. User payloads must not set them.

| Group | Keys |
| --- | --- |
| Identity | chunk_id, document_id, doc_id, bucket_id, ingest_job_id |
| Source | s3_chunk, source_path, file_name, file_type, source_kind, source_connection_id, source_row_id |
| Source row | source_schema, source_table, source_pk |
| Chunk | chunk_index, locator, text, modality, embed_model |
| Scoring | vector_score, bm25_score, rrf_score, rerank_score, hit_score, hit_boost |
| Graph / search | _graph_injected, graph_expanded, semantic_registry_boost, semantic_registry_terms, semantic_registry_attachments, event_time |

The schift. prefix is system-owned. Vector-source materialization uses keys such as schift.vector_source_id, schift.source_schema, schift.source_table, and schift.source_pk.
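
To catch reserved keys before a write is rejected, a client can compare its payload against the reserved names and the `schift.` prefix. This sketch is illustrative: `reserved_keys_in` is a hypothetical helper, and `PIPELINE_RESERVED` copies only the Identity, Chunk, and Scoring keys listed above, so it is deliberately incomplete. The registry returned by the server is the authoritative list.

```python
RESERVED_PREFIXES = ("schift.",)

# Partial copy of the reserved keys listed above (Identity, Chunk, Scoring).
PIPELINE_RESERVED = frozenset({
    "chunk_id", "document_id", "doc_id", "bucket_id", "ingest_job_id",
    "chunk_index", "locator", "text", "modality", "embed_model",
    "vector_score", "bm25_score", "rrf_score", "rerank_score",
    "hit_score", "hit_boost",
})


def reserved_keys_in(metadata):
    """Return the sorted keys in a payload that collide with reserved names."""
    return sorted(
        key for key in metadata
        if key in PIPELINE_RESERVED or key.startswith(RESERVED_PREFIXES)
    )


offending = reserved_keys_in(
    {"chunk_id": "c1", "schift.source_pk": "42", "region": "apac"}
)
```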

Access-policy keys are a controlled vocabulary. Clients may request them only through controlled ingest or metadata-management APIs, where values are clamped by bucket policy and caller auth level.

| Key | Type | Notes |
| --- | --- | --- |
| privacy_level | integer string 1..10 | Uploader requests are capped by caller auth level. |
| internal_accessible | boolean string | Server-stamped; clients cannot set it. |
| public_accessible | boolean string | Server-stamped; external access also clamps privacy level. |
| classification | string | internal, public, restricted, or confidential. |
| review_status | string | pending, approved, or rejected. |
| owner_department | string | Server-stamped uploader/member department. |
| scope | string | Department or common retrieval scope. |
| uploaded_by_user_id | string | Server-stamped uploader id. |

Note: internal_accessible, owner_department, and uploaded_by_user_id are never caller-editable, even on metadata-management surfaces.
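
The clamping behaviour can be pictured with a small model. This is a sketch under assumptions and not the server's algorithm: both function names are hypothetical, bucket policy is ignored because its rules are not described here, and "capped by caller auth level" is read as taking the minimum of the requested level and the caller's level.

```python
# Keys the note above says callers can never edit.
SERVER_STAMPED = frozenset(
    {"internal_accessible", "owner_department", "uploaded_by_user_id"}
)


def clamp_privacy_level(requested, caller_auth_level):
    """Keep the level inside 1..10 and no higher than the caller's auth level."""
    bounded = max(1, min(10, requested))
    return min(bounded, max(1, caller_auth_level))


def strip_server_stamped(fields):
    """Drop keys that are never caller-editable from a requested update."""
    return {k: v for k, v in fields.items() if k not in SERVER_STAMPED}


requested = {"privacy_level": 7, "owner_department": "sales", "scope": "sales"}
effective = strip_server_stamped(requested)
effective["privacy_level"] = clamp_privacy_level(effective["privacy_level"], 4)
```

In this model a caller with auth level 4 who asks for privacy level 7 ends up with 4, and the `owner_department` field is dropped.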

The bulk metadata endpoint accepts a restricted SQL-like statement. It is not raw SQL; it is parsed and mapped to the document metadata API.

Supported operations:

  • SELECT documents WHERE ... — preview matching documents without making changes.
  • UPDATE documents SET ... WHERE ... — update metadata.
  • SOFTDELETE FROM documents WHERE ... — disable search and delete indexed vectors.
  • HARDDELETE FROM documents WHERE ... — queue a hard-delete job.

DELETE FROM documents ... is intentionally unsupported because it does not say whether a soft or a hard delete is intended. Use SOFTDELETE or HARDDELETE instead.

SELECT documents WHERE privacy_level = 3 LIMIT 50
UPDATE documents SET privacy_level = 4, scope = 'sales' WHERE privacy_level = 3
SOFTDELETE FROM documents WHERE review_status = 'rejected'
HARDDELETE FROM documents WHERE review_status = 'rejected'
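
A client can classify a statement by its leading keyword before sending it, for example to decide whether a confirmation prompt is needed. The sketch below is not the server's parser: `statement_operation` is a hypothetical helper, and treating keywords as case-insensitive is an assumption.

```python
SUPPORTED_OPERATIONS = ("SELECT", "UPDATE", "SOFTDELETE", "HARDDELETE")


def statement_operation(statement):
    """Return the operation keyword of a statement, or raise ValueError."""
    words = statement.strip().split(None, 1)
    if not words:
        raise ValueError("empty statement")
    operation = words[0].upper()  # assumption: keywords are case-insensitive
    if operation == "DELETE":
        raise ValueError("DELETE is unsupported; use SOFTDELETE or HARDDELETE")
    if operation not in SUPPORTED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    return operation


operation = statement_operation(
    "UPDATE documents SET privacy_level = 4, scope = 'sales' WHERE privacy_level = 3"
)
is_mutating = operation != "SELECT"
```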

PATCH /v1/buckets/{bucket_id}/documents/{document_id}/metadata

Update the metadata for a single document. The endpoint clamps access-policy fields according to bucket policy and the caller’s auth level, and optionally deletes the document’s indexed vectors and queues a reprocessing job.

  • API key callers need the buckets:manage scope.
  • JWT callers need one of these roles: admin, owner, org_admin, or platform_admin.
  • The caller’s auth_level must be greater than or equal to the document’s current privacy_level.
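
The three rules above combine into a single check. The sketch below models them for illustration only: the shape of the `caller` dictionary is hypothetical, and the role names follow one reading of the JWT rule (the example error message mentions an "admin role").

```python
ADMIN_ROLES = frozenset({"admin", "owner", "org_admin", "platform_admin"})


def may_update_metadata(caller, document_privacy_level):
    """Model the documented authorization rules for metadata updates."""
    if caller["kind"] == "api_key":
        allowed = "buckets:manage" in caller.get("scopes", ())
    else:
        allowed = caller.get("role") in ADMIN_ROLES
    # The caller's auth_level must be >= the document's current privacy_level.
    return allowed and caller.get("auth_level", 0) >= document_privacy_level


api_caller = {"kind": "api_key", "scopes": ["buckets:manage"], "auth_level": 5}
can_edit = may_update_metadata(api_caller, document_privacy_level=4)
```
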

Path parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| bucket_id | string | Bucket identifier. |
| document_id | string | Document identifier. |

Request body fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| metadata | object | No | User metadata keys and values to merge. |
| public_accessible | boolean | No | Make the document publicly accessible. |
| privacy_level | integer | No | Privacy level from 1 to 10. |
| classification | string | No | internal, public, restricted, or confidential. |
| review_status | string | No | pending, approved, or rejected. |
| reindex | boolean | No | Delete indexed vectors and queue a reprocessing job. Defaults to true. |

Example request:

{
  "metadata": {
    "department": "sales",
    "region": "apac"
  },
  "privacy_level": 4,
  "classification": "internal",
  "review_status": "approved",
  "reindex": true
}

Example response:

{
  "id": "doc_01j8x9q2mvn9q",
  "bucket_id": "bucket_01j8x9q2mvk8r",
  "collection_id": "bucket_01j8x9q2mvk8r",
  "metadata": {
    "department": "sales",
    "region": "apac",
    "privacy_level": "4",
    "classification": "internal",
    "review_status": "approved"
  },
  "reindex_queued": true,
  "reindex_job_id": "job_01j8x9q2mvn9s",
  "indexed_vectors_deleted": 12,
  "warnings": []
}

| Status | Meaning | Example response body |
| --- | --- | --- |
| 400 | Bad request | { "detail": "metadata key 'chunk_id' is reserved by the system" } |
| 403 | Forbidden | { "detail": "Requires admin role to manage document metadata" } |
| 403 | Insufficient auth level | { "detail": "Insufficient auth_level for this document" } |
| 404 | Not found | { "detail": "Bucket not found" } or { "detail": "Document not found" } |
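
A request to this endpoint can be assembled with the Python standard library. The sketch below only builds the request and does not send it. `BASE_URL` is a placeholder, `build_metadata_patch` is a hypothetical helper, and the `Authorization: Bearer` header is an assumption, since the authentication header format is not described here.

```python
import json
import urllib.request

BASE_URL = "https://api.example.invalid"  # placeholder, not a real host


def build_metadata_patch(bucket_id, document_id, body, api_key):
    """Build (but do not send) a PATCH request for one document's metadata."""
    url = f"{BASE_URL}/v1/buckets/{bucket_id}/documents/{document_id}/metadata"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


request = build_metadata_patch(
    "bucket_01j8x9q2mvk8r",
    "doc_01j8x9q2mvn9q",
    {"metadata": {"department": "sales"}, "privacy_level": 4, "reindex": False},
    api_key="YOUR_API_KEY",
)
```

Passing `request` to `urllib.request.urlopen` would send it; a 4xx status is raised as `urllib.error.HTTPError`, whose body carries the `detail` message shown in the table.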

PATCH /v1/buckets/{bucket_id}/documents/metadata/bulk

Edit many documents at once using exact-match metadata predicates or a statement string. The endpoint matches documents, applies updates, optionally reindexes or disables them, and supports dry-run previews.

  • SELECT previews do not require the metadata-management role.
  • All mutating operations require the same authorization as the single-document endpoint.
  • HARDDELETE additionally requires an org admin user session and confirm = "HARDDELETE {bucket_id}".

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| statement | string | No | SQL-like statement (max 4,000 characters). Overrides individual fields when provided. |
| confirm | string | No | Required for non-dry-run HARDDELETE: "HARDDELETE {bucket_id}". |
| where | object | No | Exact-match metadata filter. |
| metadata | object | No | User metadata to merge. |
| public_accessible | boolean | No | Update public accessibility. |
| privacy_level | integer | No | Update privacy level (1..10). |
| classification | string | No | Update classification. |
| review_status | string | No | Update review status. |
| searchable | boolean | No | false disables search and deletes vectors; true leaves search enabled. |
| reindex | boolean | No | Queue a reprocessing job for matched documents. Defaults to true. |
| dry_run | boolean | No | Return the matched documents without applying changes. Defaults to false. |
| limit | integer | No | Maximum documents to process (1..2000). Defaults to 500. |
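
The documented bounds on `statement` and `limit` can be enforced while assembling a request body. This is a sketch, with `build_bulk_body` as a hypothetical helper. Two choices in it are assumptions: it refuses a predicate update with an empty `where` filter, and it omits the individual fields whenever a statement is given, because the statement overrides them.

```python
MAX_STATEMENT_CHARS = 4000
MIN_LIMIT, MAX_LIMIT = 1, 2000


def build_bulk_body(statement=None, where=None, dry_run=False, limit=None, **updates):
    """Assemble a bulk-metadata request body and check the documented bounds."""
    if statement is None and not where:
        raise ValueError("provide a statement or a non-empty where filter")
    if statement is not None and len(statement) > MAX_STATEMENT_CHARS:
        raise ValueError("statement exceeds 4,000 characters")
    if limit is not None and not MIN_LIMIT <= limit <= MAX_LIMIT:
        raise ValueError("limit must be between 1 and 2000")
    body = {"dry_run": dry_run}
    if limit is not None:
        body["limit"] = limit  # omitted otherwise, so the server default applies
    if statement is not None:
        body["statement"] = statement  # individual fields would be overridden
    else:
        body["where"] = where
        body.update(updates)
    return body


body = build_bulk_body(where={"privacy_level": 3}, privacy_level=4, reindex=False)
```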

Update by predicate:

{
  "where": { "privacy_level": 3 },
  "privacy_level": 4,
  "scope": "sales",
  "reindex": false
}

Preview with a statement:

{
  "statement": "SELECT documents WHERE privacy_level = 3 LIMIT 50",
  "dry_run": true
}

Queue a soft delete:

{
  "statement": "SOFTDELETE FROM documents WHERE review_status = 'rejected'"
}

Queue a hard delete:

{
  "statement": "HARDDELETE FROM documents WHERE review_status = 'rejected'",
  "confirm": "HARDDELETE bucket_01j8x9q2mvk8r"
}

Example response for an update:

{
  "bucket_id": "bucket_01j8x9q2mvk8r",
  "matched": 12,
  "updated": 12,
  "skipped": 0,
  "reindex_queued": 12,
  "indexed_vectors_deleted": 12,
  "dry_run": false,
  "items": [
    {
      "id": "doc_01j8x9q2mvn9q",
      "metadata": {
        "privacy_level": "4",
        "scope": "sales"
      },
      "searchable": true,
      "reindex_job_id": "job_01j8x9q2mvn9s",
      "indexed_vectors_deleted": 1,
      "warnings": []
    }
  ],
  "warnings": []
}

For a non-dry-run HARDDELETE, the response status is 202 and includes status: "queued", job_id, and delete_requested_at.
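
Because a hard delete needs an exact confirmation string, it is convenient to derive the dry-run preview and the confirmed request from the same statement. The sketch below shows one way to do that. `hard_delete_bodies` is a hypothetical helper, and the caller is assumed to supply a trusted predicate, since the text is interpolated into the statement as-is.

```python
def hard_delete_bodies(bucket_id, predicate):
    """Return (preview_body, confirmed_body) for the same HARDDELETE statement."""
    statement = f"HARDDELETE FROM documents WHERE {predicate}"
    preview = {"statement": statement, "dry_run": True}
    confirmed = {
        "statement": statement,
        # Must match "HARDDELETE {bucket_id}" exactly for a non-dry-run request.
        "confirm": f"HARDDELETE {bucket_id}",
    }
    return preview, confirmed


preview_body, confirmed_body = hard_delete_bodies(
    "bucket_01j8x9q2mvk8r", "review_status = 'rejected'"
)
```

Sending `preview_body` first shows which documents match. `confirmed_body` is the request that queues the job and returns status 202.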

| Status | Meaning | Example response body |
| --- | --- | --- |
| 400 | Bad request | { "detail": "HARDDELETE requires confirm='HARDDELETE bucket_01j8x9q2mvk8r'" } |
| 403 | Forbidden | { "detail": "API key missing required scope: buckets:manage" } |
| 403 | Hard delete forbidden | { "detail": "HARDDELETE requires an org admin user session" } |
| 404 | Bucket not found | { "detail": "Bucket not found" } |

Return the server-owned metadata vocabulary, validation limits, and supported statement operations.

{
  "pipeline_reserved": [
    "bm25_score",
    "bucket_id",
    "chunk_id",
    ...
  ],
  "reserved_prefixes": [
    "schift."
  ],
  "access_policy": [
    "classification",
    "internal_accessible",
    "owner_department",
    "privacy_level",
    "public_accessible",
    "review_status",
    "scope",
    "uploaded_by_user_id"
  ],
  "document_state": [
    "deleted",
    "disabled",
    "searchable",
    "status"
  ],
  "knowledge_search": {
    "citation_metadata": [
      "asset_id",
      "chunk_hash",
      ...
    ],
    "system_filterable": [
      "bucket_id",
      "chunk_id",
      ...
    ],
    "user_filterable": "any validated user metadata key outside reserved keys, reserved prefixes, and access-policy keys"
  },
  "limits": {
    "json_bytes": 4096,
    "keys": 32,
    "key_length": 64,
    "value_length": 512
  },
  "statement_operations": [
    "SELECT",
    "UPDATE",
    "SOFTDELETE",
    "HARDDELETE"
  ]
}
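
A client that has fetched this registry can use it instead of hard-coded lists. The sketch below applies the `user_filterable` rule as worded in the response. `is_user_filterable` is a hypothetical helper, and `sample_registry` is an abridged stand-in containing only keys shown above, not a full server response.

```python
def is_user_filterable(key, registry):
    """Apply the user_filterable rule: not reserved, not prefixed, not access-policy."""
    if key in registry.get("pipeline_reserved", []):
        return False
    if key in registry.get("access_policy", []):
        return False
    return not any(key.startswith(p) for p in registry.get("reserved_prefixes", []))


# Abridged stand-in for a registry response; the real lists are longer.
sample_registry = {
    "pipeline_reserved": ["bm25_score", "bucket_id", "chunk_id"],
    "reserved_prefixes": ["schift."],
    "access_policy": ["classification", "privacy_level", "scope"],
}

filterable = [
    key for key in ("region", "chunk_id", "schift.source_pk", "scope")
    if is_user_filterable(key, sample_registry)
]
```

Reading the lists from the registry at startup means the client picks up newly reserved keys without a code change.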