The inference rules engine automatically enriches document metadata based on file attributes using a declarative JSON Schema-based system with type coercion and runtime values tracking.
When a file is processed, the watcher:
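1. Builds an attributes object for the file
2. Matches the attributes against each rule's match schema
3. Resolves set templates and coerces values to declared types
4. Runs the rule's optional map transform
5. Applies enrichment metadata from the POST /metadata API (overrides everything)

This creates a flexible, declarative metadata pipeline with strong type guarantees.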
v0.5.0 introduces a complete redesign of the inference rules system. The v1 set property has been replaced by declarative JSON Schemas with type coercion, UI hints, and runtime values tracking.
| Aspect | v1 (≤ 0.4.x) | v2 (≥ 0.5.0) |
|---|---|---|
| Metadata definition | set object with template strings | schema arrays referencing global schemas |
| Type handling | All values are strings | Type coercion to declared type |
| Rule identity | Anonymous | Requires name and description |
| Schema definition | Inline only | Global schemas collection + inline |
| Property metadata | None | uiHint, description, enum |
| Values tracking | None | Runtime values index per rule |
| Matched rules | Not tracked | matched_rules array in payload |
Each inference rule has these fields:
{
"name": "jira-issue",
"description": "Jira issue metadata from JSON exports",
"match": { /* JSON Schema */ },
"schema": [
"base",
"jira-common",
{ "properties": { "status": { "type": "string", "set": "{{json.current.fields.status.name}}" } } }
],
"map": { /* Optional JsonMap transform */ },
"template": "jira-issue"
}
- name (required): Unique identifier for the rule
- description (required): Human-readable description of the rule's purpose
- match: JSON Schema object that file attributes must satisfy
- schema: Array of schema references (named strings or inline objects), merged left-to-right
- map (optional): JsonMap transformation (inline or named reference)
- template (optional): Handlebars content template (inline, named ref, or file path)
- render (optional): Declarative structured renderer configuration (mutually exclusive with template)
- renderAs (optional): Output file extension override (e.g. "md", "html", "txt"). 1–10 lowercase alphanumeric characters, without dot. Requires template or render.

Define reusable schemas in the top-level schemas config:
{
"schemas": {
"base": {
"type": "object",
"properties": {
"domain": {
"type": "string",
"description": "Content domain",
"uiHint": "select"
},
"created": {
"type": "integer",
"description": "Record creation date as unix timestamp (seconds)",
"uiHint": "date"
}
}
},
"jira-common": "schemas/jira-common.json"
}
}
Schema entries can be:
- Inline schema objects
- File path references (e.g., "schemas/base.json")

The schema property accepts an array of schema references, merged left-to-right at the property level:
{
"inferenceRules": [
{
"name": "jira-issue",
"description": "Jira issue metadata",
"match": { "..." },
"schema": [
"base",
"jira-common",
{
"properties": {
"domain": { "set": "jira" },
"status": {
"type": "string",
"description": "Current workflow status",
"uiHint": "select",
"set": "{{json.current.fields.status.name}}"
},
"created": { "set": "{{json.current.fields.created}}" }
}
}
]
}
]
}
Merge semantics:
- Named references (e.g., "base") are resolved from the global schemas collection
- Each property takes type, description, uiHint, enum, set from whichever schema last defines them

This pattern promotes DRY: base defines that domain is type: "string" with a description, and each rule's inline tail provides set: "jira" or set: "email".
For array-typed properties, set values from multiple schemas in the merge chain are concatenated rather than replaced. This enables composable metadata: a base schema can define initial array values, and subsequent schemas or rules can append to them.
Example — composable domains:
{
"schemas": {
"base": {
"properties": {
"domains": { "type": "array", "description": "Content domains", "set": ["general"] }
}
}
},
"inferenceRules": [
{
"name": "jira-and-engineering",
"description": "Jira issues in the engineering domain",
"match": { "..." },
"schema": [
"base",
{ "properties": { "domains": { "set": ["jira", "engineering"] } } }
]
}
]
}
The resolved domains value for a matching file would be ["general", "jira", "engineering"] — the base array concatenated with the inline tail array. This only applies to properties declared as type: "array"; scalar properties use standard last-write-wins semantics.
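The merge behavior described above can be sketched as follows. This is a minimal illustration, not the watcher's actual implementation: the `PropertySchema` and `Schema` types and the `mergeSchemas` function are hypothetical names, and named references are assumed to be already resolved to schema objects.

```typescript
// Hypothetical types for illustration; the watcher's internals may differ.
type PropertySchema = {
  type?: string;
  description?: string;
  uiHint?: string;
  enum?: unknown[];
  set?: unknown;
};
type Schema = { properties?: Record<string, PropertySchema> };

// Merge schemas left-to-right at the property level.
// Keywords are last-write-wins, except that `set` arrays on
// array-typed properties are concatenated instead of replaced.
function mergeSchemas(chain: Schema[]): Record<string, PropertySchema> {
  const resolved: Record<string, PropertySchema> = {};
  for (const schema of chain) {
    for (const [name, next] of Object.entries(schema.properties ?? {})) {
      const prev: PropertySchema = resolved[name] ?? {};
      const merged: PropertySchema = { ...prev, ...next };
      if (
        merged.type === "array" &&
        Array.isArray(prev.set) &&
        Array.isArray(next.set)
      ) {
        merged.set = [...prev.set, ...next.set];
      }
      resolved[name] = merged;
    }
  }
  return resolved;
}

const baseSchema: Schema = {
  properties: {
    domains: { type: "array", description: "Content domains", set: ["general"] },
    domain: { type: "string", set: "general" },
  },
};
const inlineTail: Schema = {
  properties: {
    domains: { set: ["jira", "engineering"] },
    domain: { set: "jira" },
  },
};

const merged = mergeSchemas([baseSchema, inlineTail]);
const resolvedDomains = merged["domains"]?.set; // general, jira, engineering (concatenated)
const resolvedDomain = merged["domain"]?.set; // "jira" (scalar, last write wins)
```

Note that the inline tail never repeats `type`: the array type declared in the base schema carries through the merge, which is what triggers concatenation.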
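set Keyword

The set keyword within a property schema serves three purposes:
- Supplies the property's value, either as a literal or as a template interpolated from the file attributes
- Ties that value to coercion against the property's declared type
- Distinguishes static values ("set": "jira") from extracted values ("set": "{{json.status}}")

Template interpolation: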
{
"status": {
"type": "string",
"set": "{{json.current.fields.status.name}}"
},
"domain": {
"type": "string",
"set": "jira"
}
}
Templates use {{path.to.field}} Handlebars syntax to reference the file attributes object. Undefined paths resolve to an empty string.
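To make the interpolation behavior concrete, here is a deliberately simplified stand-in for Handlebars that supports only plain `{{dotted.path}}` expressions. The `resolvePath` and `interpolate` functions are hypothetical helpers written for this sketch; the watcher itself uses Handlebars, which offers far more (helpers, blocks, escaping).

```typescript
// Walk a dotted path through nested objects; returns undefined when any
// segment is missing.
function resolvePath(source: unknown, path: string): unknown {
  let current: unknown = source;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Replace each {{path}} expression with the value found in the attributes.
function interpolate(template: string, attributes: unknown): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match, path: string) => {
    const value = resolvePath(attributes, path);
    // Missing paths resolve to an empty string.
    return value === undefined || value === null ? "" : String(value);
  });
}

const sampleAttributes = {
  json: { current: { fields: { status: { name: "In Progress" } } } },
};

const extracted = interpolate("{{json.current.fields.status.name}}", sampleAttributes);
const missing = interpolate("{{json.current.fields.priority.name}}", sampleAttributes);
const literal = interpolate("jira", sampleAttributes); // static value, nothing to interpolate
```

Here `extracted` is "In Progress", `missing` is the empty string, and `literal` passes through unchanged as "jira".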
After template interpolation, values are automatically coerced to their declared type:
| Type | Coercion Rules | Examples |
|---|---|---|
| string | No coercion (interpolation already produces strings) | "42" → "42" |
| integer | Parse as integer; empty/invalid → undefined | "42" → 42, "" → undefined |
| number | Parse as float; empty/invalid → undefined | "3.14" → 3.14, "" → undefined |
| boolean | "true" → true, "false" → false; else → undefined | "true" → true, "" → undefined |
| array | Parse JSON array string; already array → passthrough | "[1,2,3]" → [1,2,3], [1,2] → [1,2] |
| object | Parse JSON object string; already object → passthrough | "{\"x\":1}" → {x:1} |
Empty string behavior: Empty strings ("") coerce to undefined for all non-string types. This prevents invalid data from reaching Qdrant.
Example:
{
"created": {
"type": "integer",
"description": "Creation date as unix timestamp",
"set": "{{json.current.fields.created}}"
}
}
If json.current.fields.created is the string "1735689600", coercion converts it to the integer 1735689600 for storage in Qdrant.
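The coercion table can be read as a single function from an interpolated value and a declared type to a stored value. The sketch below is one possible reading, not the watcher's code: `coerce` and `tryParseJson` are hypothetical names, and the strict treatment of partially numeric strings such as "42abc" (rejected here) is an assumption the table does not spell out.

```typescript
type DeclaredType = "string" | "integer" | "number" | "boolean" | "array" | "object";

function tryParseJson(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}

function coerce(value: unknown, type: DeclaredType): unknown {
  // Strings are stored as interpolated.
  if (type === "string") return value;
  // Empty strings become undefined for every non-string type.
  if (typeof value === "string" && value.trim() === "") return undefined;

  if (type === "integer" || type === "number") {
    const parsed =
      typeof value === "number" ? value : typeof value === "string" ? Number(value) : NaN;
    if (!Number.isFinite(parsed)) return undefined;
    if (type === "integer" && !Number.isInteger(parsed)) return undefined;
    return parsed;
  }
  if (type === "boolean") {
    if (value === true || value === "true") return true;
    if (value === false || value === "false") return false;
    return undefined;
  }
  // array / object: parse JSON strings, pass through existing structures.
  const parsed = typeof value === "string" ? tryParseJson(value) : value;
  if (type === "array") return Array.isArray(parsed) ? parsed : undefined;
  const isPlainObject =
    typeof parsed === "object" && parsed !== null && !Array.isArray(parsed);
  return isPlainObject ? parsed : undefined;
}

const createdValue = coerce("1735689600", "integer"); // 1735689600 as a number
const emptyValue = coerce("", "integer"); // undefined
```

Returning undefined rather than throwing mirrors the table: a value that cannot be coerced is simply left out of the payload.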
uiHint Keyword

The uiHint keyword tells consuming UIs how to render a property for search filtering:
| Value | Renders as | Use with |
|---|---|---|
| text | Free text input | Text fields, substring search |
| number | Numeric input / range slider | Numeric fields with range queries |
| date | Date picker / date range | Integer timestamps (unix seconds) |
| select | Single-value dropdown | Enum fields, known values |
| multiselect | Multi-value dropdown | Array fields with enum-like values |
| check | Boolean toggle / checkbox | Boolean fields |
Properties without uiHint are not displayed in the UI. This is an explicit opt-in: removing a uiHint hides the field; adding one exposes it.
uiHint also serves as intent metadata for LLM consumers. It augments the property description by signaling how the property is meant to be used in queries.
Example:
{
"created": {
"type": "integer",
"description": "Record creation date as unix timestamp (seconds)",
"uiHint": "date",
"set": "{{json.current.fields.created}}"
},
"priority": {
"type": "string",
"description": "Issue priority",
"enum": ["Highest", "High", "Medium", "Low", "Lowest"],
"uiHint": "select",
"set": "{{json.current.fields.priority.name}}"
}
}
uiHint changes take effect immediately on config reload (no reindex needed).
Every property in a resolved (merged) rule schema must have a declared type. If a property appears in an inline tail with only set and was never declared with a type in any named schema in the merge chain, config validation fails on load.
Why: This prevents silent string-defaulting and ensures the resolved schema is always self-describing.
Example - valid:
{
"schemas": {
"base": {
"properties": {
"domain": { "type": "string" }
}
}
},
"inferenceRules": [
{
"name": "example",
"description": "...",
"match": { "..." },
"schema": [
"base",
{ "properties": { "domain": { "set": "jira" } } }
]
}
]
}
Example - invalid:
{
"inferenceRules": [
{
"name": "example",
"description": "...",
"match": { "..." },
"schema": [
{ "properties": { "domain": { "set": "jira" } } } // ERROR: no type
]
}
]
}
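The type requirement amounts to a check over the resolved merge chain. The following sketch shows the idea under the assumption that named references have already been resolved to schema objects; `findUntypedProperties`, `PropertyDef` and `SchemaDef` are hypothetical names, not part of the watcher's API.

```typescript
type PropertyDef = { type?: string; set?: unknown };
type SchemaDef = { properties?: Record<string, PropertyDef> };

// Return the names of properties that have no declared type anywhere
// in the merge chain.
function findUntypedProperties(chain: SchemaDef[]): string[] {
  const declared: Record<string, string | undefined> = {};
  for (const schema of chain) {
    for (const [name, def] of Object.entries(schema.properties ?? {})) {
      // A later schema may omit `type`; keep whatever was declared earlier.
      declared[name] = def.type ?? declared[name];
    }
  }
  return Object.keys(declared).filter((name) => declared[name] === undefined);
}

const validChain: SchemaDef[] = [
  { properties: { domain: { type: "string" } } }, // resolved "base"
  { properties: { domain: { set: "jira" } } }, // inline tail
];
const invalidChain: SchemaDef[] = [
  { properties: { domain: { set: "jira" } } }, // inline tail only, no type
];

const validResult = findUntypedProperties(validChain); // empty: nothing to report
const invalidResult = findUntypedProperties(invalidChain); // contains "domain"
```

A loader built this way would fail config validation whenever the returned list is non-empty.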
The watcher builds an attributes object for each file:
interface FileAttributes {
file: {
path: string; // Normalized path (forward slashes, lowercase drive)
directory: string; // Directory containing the file
filename: string; // File name with extension
extension: string; // Extension including dot (e.g., ".md")
sizeBytes: number; // File size in bytes
modified: string; // ISO-8601 timestamp of last modification
};
frontmatter?: Record<string, unknown>; // YAML frontmatter from .md files
json?: Record<string, unknown>; // Parsed content from .json files
}
Example (for j:/domains/jira/VCN/issue/WEB-123.json):
{
"file": {
"path": "j:/domains/jira/VCN/issue/WEB-123.json",
"directory": "j:/domains/jira/VCN/issue",
"filename": "WEB-123.json",
"extension": ".json",
"sizeBytes": 8452,
"modified": "2026-02-24T08:15:00Z"
},
"json": {
"entityKey": "WEB-123",
"current": {
"fields": {
"summary": "Fix login bug",
"status": { "name": "In Progress" },
"created": "1735689600"
}
}
}
}
Rules use standard JSON Schema for matching. The watcher uses ajv with full support for properties, required, type, const, enum, nested objects, arrays, and string patterns.
glob Format

The watcher registers a custom glob keyword for path matching using picomatch:
{
"match": {
"properties": {
"file": {
"properties": {
"path": { "type": "string", "glob": "j:/domains/jira/**/*.json" }
}
}
}
}
}
Glob syntax:
- **: matches any number of directories
- *: matches any characters within a segment
- {md,txt}: brace expansion for multiple patterns

This is the only custom keyword in match schemas; everything else is pure JSON Schema.
When multiple rules match a file, they are processed in order with last-match-wins semantics at the property level:
{
"inferenceRules": [
{
"name": "default-category",
"description": "Default category for all files",
"match": { "type": "object" },
"schema": [
{ "properties": { "category": { "type": "string", "set": "general" } } }
]
},
{
"name": "important-override",
"description": "Override category for important files",
"match": {
"properties": {
"file": {
"properties": {
"path": { "glob": "**/important/**" }
}
}
}
},
"schema": [
{ "properties": { "category": { "type": "string", "set": "important" } } }
]
}
]
}
Files under **/important/** get category: "important" (second rule wins).
Every embedded point includes a matched_rules field: a keyword array of the inference rule names that matched the file.
Benefits:
- Filtering by rule: the Qdrant filter { "key": "matched_rules", "match": { "value": "jira-issue" } } returns all documents processed by that rule

Example Qdrant payload:
{
"file_path": "j:/domains/jira/VCN/issue/WEB-123.json",
"chunk_index": 0,
"matched_rules": ["jira-issue", "json-subject"],
"domain": "jira",
"status": "In Progress",
"created": 1735689600
}
The watcher maintains a values index (values.json in stateDir) tracking distinct values per property per rule. Only trackable primitives (string, number, boolean) are indexed.
Purpose: Populate dropdowns for select and multiselect fields when no enum is declared in the schema.
Storage structure:
{
"jira-issue": {
"status": ["To Do", "In Progress", "In Review", "Done"],
"priority": ["Highest", "High", "Medium", "Low", "Lowest"],
"assignee": ["Jason Williscroft", "Devin Becker"]
},
"slack-message": {
"channel_name": ["general", "project-jeeves-watcher"],
"userName": ["Jason Williscroft", "Jeeves"]
}
}
Update dynamics:
Deletion behavior: When a file is deleted, its chunks are removed from Qdrant but the values index is not updated (no reference counting). Stale values persist until the next full reindex. This is acceptable: a stale value in a dropdown is cosmetic, not a correctness issue.
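As a rough sketch of how such an index could be maintained, the function below records distinct primitive values per property per rule and never removes anything, matching the deletion behavior described above. `trackValues`, `ValuesIndex` and `isTrackable` are hypothetical names; how the watcher treats the elements of array-valued properties is not stated here, so arrays are simply skipped.

```typescript
type Trackable = string | number | boolean;
type ValuesIndex = Record<string, Record<string, Trackable[]>>;

function isTrackable(value: unknown): value is Trackable {
  return (
    typeof value === "string" ||
    typeof value === "number" ||
    typeof value === "boolean"
  );
}

// Add the distinct primitive values a rule produced for one file.
function trackValues(
  index: ValuesIndex,
  ruleName: string,
  metadata: Record<string, unknown>,
): void {
  const ruleValues = index[ruleName] ?? (index[ruleName] = {});
  for (const [property, value] of Object.entries(metadata)) {
    if (!isTrackable(value)) continue; // objects and arrays are not indexed
    const seen = ruleValues[property] ?? (ruleValues[property] = []);
    if (!seen.includes(value)) seen.push(value);
  }
}

const valuesIndex: ValuesIndex = {};
trackValues(valuesIndex, "jira-issue", { status: "To Do", labels: ["a", "b"] });
trackValues(valuesIndex, "jira-issue", { status: "Done" });
trackValues(valuesIndex, "jira-issue", { status: "To Do" }); // duplicate, ignored
```

After these three calls the index holds only "To Do" and "Done" under the status property of jira-issue; the labels array was never indexed.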
Queryable via JSONPath:
$.inferenceRules[?(@.name=='jira-issue')].values.status
Returns ["To Do", "In Progress", "In Review", "Done"] when queried through POST /config/query.
Location: {stateDir}/issues.json
A persistent, self-healing ledger of files that failed to embed. Keyed by file path, with one entry per issue per file.
Structure:
{
"j:/domains/jira/VCN/issue/WEB-123.json": [
{
"type": "type_collision",
"property": "created",
"rules": ["jira-issue", "frontmatter-created"],
"types": ["integer", "string"],
"message": "Type collision on 'created': jira-issue declares integer, frontmatter-created declares string",
"timestamp": 1771865063
}
],
"j:/domains/email/archive/msg-456.json": [
{
"type": "interpolation_error",
"property": "author_email",
"rule": "email-archive",
"message": "Failed to resolve ${json.from.email}: 'from' is null",
"timestamp": 1771865100
}
]
}
Issue types:
- type_collision: Multiple rules declare the same property with incompatible types
- interpolation_error: A set template path doesn't resolve (null, undefined, wrong structure)

Behavior:
API: GET /issues returns the issues file contents.
When configWatch.enabled is true, the watcher monitors its config file. On config change:
- Changes are debounced for configWatch.debounceMs (default: 1000ms)
- A reindex is triggered according to the configWatch.reindex setting

Reindex modes:
{
"configWatch": {
"enabled": true,
"debounceMs": 10000,
"reindex": "issues"
}
}
| Mode | Behavior |
|---|---|
| "issues" (default) | Re-process only files in the issues file (cheap, targeted) |
| "rules" | Re-apply inference rules to all files (no re-embedding) |
| "full" | Full reindex of all watched files (use when broad config changes affect already-embedded files) |
Note: "path" and "prune" are NOT valid for config-watch auto-trigger. They can only be triggered via explicit API calls.
Issues reindex is the default because config changes typically fix issues: a type collision is resolved by editing a rule, and re-processing just the affected file is sufficient.
Full reindex is needed when broad config changes affect already-embedded files.

JsonMap transform (map)

In addition to schema-based set values, rules can run a JsonMap transform to derive metadata from the file attributes.
- map can be an inline JsonMap object, or a string reference to a named map defined in top-level config maps
- map output overrides schema set output on field conflict

Example: map extracts a path segment
{
"name": "extract-project",
"description": "Extract project name from path",
"match": { "type": "object" },
"schema": [
{ "properties": { "domain": { "type": "string", "set": "docs" } } }
],
"map": {
"project": {
"$": [
{ "method": "$.lib.split", "params": ["$.input.file.path", "/"] },
{ "method": "$.lib.slice", "params": ["$[0]", 0, 1] },
{ "method": "$.lib.join", "params": ["$[0]", ""] }
]
}
}
}
For a file path docs/readme.md, this produces:
{ "domain": "docs", "project": "docs" }
See Configuration Reference for details on maps.
Rules can include a template field — a Handlebars template that renders the file's data into embeddable markdown. When a template is present, the rendered output replaces the raw file content for embedding.
Why: Raw JSON from API responses embeds poorly. Templates transform structured data into clean, readable markdown at index time.
See the v0.4.0 section in Inference Rules Guide (legacy) for template details.
Metadata is built in layers with clear precedence:
Rules are evaluated in order. For each matching rule:
1. set templates are resolved and coerced
2. map (JsonMap) transformation runs (if present)
3. map output overrides set output on field conflict

Later rules override earlier rules on field conflict.
Metadata from .meta.json sidecars (written via POST /metadata API) overrides all inference rule output.
Note: Enrichment metadata is now validated against the resolved schema for the file's matched rules. Invalid metadata is rejected with descriptive errors.
inferred (from rules) → enrichment (from .meta.json) → final payload
                             ↑ wins conflicts
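The precedence chain can be expressed as a series of object spreads, where each later source overwrites earlier ones on conflict. This is an illustrative sketch only: `buildPayload`, `RuleOutput` and the sample values are hypothetical, and the per-rule outputs are assumed to have been resolved and coerced already.

```typescript
type Metadata = Record<string, unknown>;
type RuleOutput = { name: string; setOutput: Metadata; mapOutput?: Metadata };

function buildPayload(matched: RuleOutput[], enrichment: Metadata = {}): Metadata {
  let inferred: Metadata = {};
  for (const rule of matched) {
    // Within a rule, map output overrides set output.
    // Across rules, later rules override earlier ones.
    inferred = { ...inferred, ...rule.setOutput, ...(rule.mapOutput ?? {}) };
  }
  // Enrichment from the .meta.json sidecar wins every conflict.
  return { ...inferred, ...enrichment };
}

const matchedOutputs: RuleOutput[] = [
  { name: "default-category", setOutput: { category: "general", domain: "docs" } },
  {
    name: "important-override",
    setOutput: { category: "important" },
    mapOutput: { project: "docs" },
  },
];

const inferredOnly = buildPayload(matchedOutputs);
const withEnrichment = buildPayload(matchedOutputs, { category: "reviewed" });
```

With no sidecar, category resolves to "important" because the second rule wins; once the sidecar supplies category, its value "reviewed" replaces the inferred one while domain and project are untouched.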
{
"schemas": {
"base": {
"properties": {
"domain": { "type": "string", "description": "Content domain", "uiHint": "select" }
}
}
},
"inferenceRules": [
{
"name": "email-domain",
"description": "Email archive messages",
"match": {
"properties": {
"file": {
"properties": {
"path": { "glob": "j:/domains/email/archive/**" }
}
}
}
},
"schema": [
"base",
{ "properties": { "domain": { "set": "email" } } }
]
},
{
"name": "meetings-domain",
"description": "Meeting transcripts and notes",
"match": {
"properties": {
"file": {
"properties": {
"path": { "glob": "j:/domains/meetings/**" }
}
}
}
},
"schema": [
"base",
{ "properties": { "domain": { "set": "meetings" } } }
]
}
]
}
{
"name": "frontmatter-metadata",
"description": "Extract metadata from YAML frontmatter",
"match": {
"properties": {
"frontmatter": {
"required": ["title"]
}
}
},
"schema": [
{
"properties": {
"title": {
"type": "string",
"description": "Document title",
"uiHint": "text",
"set": "{{frontmatter.title}}"
},
"created": {
"type": "integer",
"description": "Creation date as unix timestamp",
"uiHint": "date",
"set": "{{frontmatter.created}}"
}
}
}
]
}
If frontmatter has created: "1735689600", type coercion converts it to integer 1735689600.
{
"name": "jira-issue",
"description": "Jira issue metadata from JSON exports",
"match": {
"properties": {
"json": {
"required": ["entityKey"]
}
}
},
"schema": [
{
"properties": {
"issue_key": {
"type": "string",
"description": "Jira issue key",
"uiHint": "text",
"set": "{{json.entityKey}}"
},
"status": {
"type": "string",
"description": "Current workflow status",
"uiHint": "select",
"set": "{{json.current.fields.status.name}}"
},
"priority": {
"type": "string",
"description": "Issue priority",
"enum": ["Highest", "High", "Medium", "Low", "Lowest"],
"uiHint": "select",
"set": "{{json.current.fields.priority.name}}"
}
}
}
]
}
{
"name": "project-docs",
"description": "Markdown documentation under projects",
"match": {
"properties": {
"file": {
"properties": {
"path": { "glob": "j:/domains/projects/**" },
"extension": { "const": ".md" }
},
"required": ["path", "extension"]
}
}
},
"schema": [
"base",
{
"properties": {
"domain": { "set": "projects" },
"category": { "type": "string", "set": "documentation" }
}
}
]
}
Use jeeves-watcher validate to check rule syntax:
jeeves-watcher validate --config ./my-config.json
For runtime testing, check logs:
jeeves-watcher start --config ./my-config.json
Query Qdrant to inspect payloads:
curl -X POST http://localhost:1936/search \
-H "Content-Type: application/json" \
-d '{"query": "test", "limit": 1}'
Use POST /config/match to test paths against rules without indexing:
curl -X POST http://localhost:1936/config/match \
-H "Content-Type: application/json" \
-d '{"paths": ["j:/domains/jira/VCN/issue/WEB-123.json"]}'
- Shared properties such as domain, created, updated belong in the base schema
- Keep each rule's inline tail to set wiring
- Add uiHint deliberately: every exposed filter field is a conscious decision
- Check GET /issues regularly: it surfaces problems before they become mysteries
- Combine uiHint: "select" with runtime values for dynamic enums

Before (v0.4.x):
{
"inferenceRules": [
{
"match": { "..." },
"set": {
"domain": "jira",
"status": "${json.current.fields.status.name}"
}
}
]
}
After (v0.5.0+):
{
"schemas": {
"base": {
"properties": {
"domain": { "type": "string", "description": "Content domain" }
}
}
},
"inferenceRules": [
{
"name": "jira-issue",
"description": "Jira issue metadata",
"match": { "..." },
"schema": [
"base",
{
"properties": {
"domain": { "set": "jira" },
"status": {
"type": "string",
"description": "Current workflow status",
"set": "{{json.current.fields.status.name}}"
}
}
}
]
}
]
}
Key migration steps:
1. Add name and description to every rule
2. Replace the set object with a schema array
3. Declare type for every property
4. Add uiHint for search-filterable fields
5. Run jeeves-watcher validate

Output file extension (renderAs)

When a rule includes template or render, the renderAs field declares the output content type as a file extension (without dot):
{
"name": "slack-message",
"description": "Slack channel messages",
"match": { "..." },
"schema": ["base", "slack-common"],
"render": { "frontmatter": ["channelName", "userName"], "body": [{ "path": "text", "heading": 2 }] },
"renderAs": "md"
}
Resolution order (used by POST /render):
1. renderAs from the last matching rule that declares it
2. The file's own extension (e.g., .md → "md")
3. "txt" for extensionless files

Validation: renderAs must be 1–10 lowercase alphanumeric characters (/^[a-z0-9]{1,10}$/). Config validation rejects renderAs if neither template nor render is present on the rule.
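A small sketch of the resolution order and the validation pattern follows. `resolveRenderAs` and `RenderRule` are hypothetical names, and two details are assumptions rather than documented behavior: the middle step is taken to be the file's own extension, and dotfiles such as ".gitignore" are treated as extensionless.

```typescript
const RENDER_AS_PATTERN = /^[a-z0-9]{1,10}$/;

type RenderRule = { name: string; renderAs?: string };

function resolveRenderAs(matchedRules: RenderRule[], filename: string): string {
  // 1. renderAs from the last matching rule that declares it
  for (let i = matchedRules.length - 1; i >= 0; i--) {
    const declared = matchedRules[i]?.renderAs;
    if (declared !== undefined) return declared;
  }
  // 2. The file's own extension, without the dot
  const dot = filename.lastIndexOf(".");
  if (dot > 0 && dot < filename.length - 1) return filename.slice(dot + 1);
  // 3. Fallback for extensionless files
  return "txt";
}

const fromRule = resolveRenderAs(
  [{ name: "slack-message", renderAs: "md" }, { name: "json-subject" }],
  "message.json",
);
const fromExtension = resolveRenderAs([], "notes.md");
const fallback = resolveRenderAs([], "README");
```

Here `fromRule` and `fromExtension` both resolve to "md" and `fallback` resolves to "txt". The same pattern rejects values such as ".md" or "MD", since the dot and uppercase letters fall outside `[a-z0-9]`.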
interface InferenceRule {
name: string; // Required unique identifier
description: string; // Required human-readable description
match: Record<string, unknown>; // JSON Schema object
schema: SchemaReference[]; // Array of named refs and inline objects
map?: Record<string, unknown> | string; // JsonMap definition, named map ref, or file path
template?: string; // Handlebars template (inline, named ref, or file path)
render?: RenderConfig; // Declarative structured renderer (mutually exclusive with template)
renderAs?: string; // Output file extension override (requires template or render)
}
// Either a named string reference or an inline schema object
type SchemaReference = string | Record<string, unknown>;
interface ResolvedProperty {
type?: string; // JSON Schema type
description?: string; // Human-readable description
uiHint?: string; // UI rendering hint
enum?: unknown[]; // Enum values
set?: string | unknown[]; // Interpolation template or array value
}
See Configuration Reference for integration into the main config.