Ways to cut the maintainer's manual work safely. Six helpers are already in place (including a packet completeness checker, consistency audits, build-time staleness reminders, and a promotion dry-run); four are designs kept off until they can be done without runtime services or scheduled jobs. The hard line: automation can tell you a new release appears to exist or a packet is incomplete, but it never reads, infers, or promotes a number on its own.
- 6 implemented safe helpers; 4 design-only (not built).
- Availability/completeness/health only — no fetch-and-promote, no jobs, no runtime.
- Pairs with the operator backlog at /operator/backlog.
Implemented (safe, in place)
Commodity-history refresh (FAO + World Bank) [implemented]
- What it does: Two on-demand commands machine-read official free files into committed snapshots the site reads statically: `npm run commodities:ingest` (FAO Food Price Index CSV) and `npm run commodities:ingest:worldbank` (World Bank Pink Sheet XLSX — oil, gas, wheat, maize, rice, fertilizer price levels).
- Safe because: Both read free, official, public files — no API key, no paid API, no scraping of chart-only values. The World Bank XLSX is read with a small pure-Node unzip (no third-party dependency). Hand-curated conflict observations are untouched; no causation is asserted.
- Reduces manual work: Turns commodity updates that previously meant manual PDF/XLSX transcription into two one-command, reviewable refreshes. These are the first sources that can genuinely be automated.
- Where it lives: scripts/commodities-ingest*.mjs + /commodities/history + /methodology/commodity-prices.
Commodity refresh-currency watch [implemented]
- What it does: On-demand `npm run commodities:check-refresh` compares the official FAO and World Bank files to the committed snapshots and flags 'possible-new-release' when a newer period/workbook appears available.
- Safe because: Detect-only: it reads only the official files' latest period / 'updated on' date, writes no snapshot, ingests nothing, and promotes no value. It needs network access, so it runs on demand only and is not part of the build or validation.
- Reduces manual work: Supports 'build it and forget it, but check occasionally' — you only re-run an ingest when the watch says a source moved.
- Where it lives: scripts/commodities-check-refresh.mjs + /operator/source-watch + the operator report.
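The detect-only comparison described above can be sketched as a pure function. This is a minimal illustration, not the code in scripts/commodities-check-refresh.mjs: the `SourcePeriod` shape, the `checkRefresh` name, and the `"up-to-date"` status are assumptions, and it assumes periods are written as `YYYY-MM` so that string order matches time order. Only the `'possible-new-release'` flag comes from the description above.

```typescript
// Hypothetical types: the real script's data shapes are not shown in this document.
type WatchStatus = "possible-new-release" | "up-to-date";

interface SourcePeriod {
  source: string;       // e.g. "FAO" or "World Bank"
  latestPeriod: string; // assumed "YYYY-MM", so lexical order equals chronological order
}

// Detect-only: compares two period labels and returns a flag.
// No price value is read, written, or returned.
function checkRefresh(official: SourcePeriod, snapshot: SourcePeriod): WatchStatus {
  return official.latestPeriod > snapshot.latestPeriod
    ? "possible-new-release"
    : "up-to-date";
}

// Usage: the official file advertises a later period than the committed snapshot.
const flag = checkRefresh(
  { source: "FAO", latestPeriod: "2024-06" },
  { source: "FAO", latestPeriod: "2024-05" },
);
console.log(flag); // "possible-new-release"
```

Keeping the comparison free of I/O means the flag can be unit-tested without network access, while the fetch of the official file stays in the on-demand command.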
Source-packet completeness checker [implemented]
- What it does: A pure function validates that a maintainer-supplied packet has every required field (official URL, publisher, value, unit, period/asOf, where found, caveat, confidence) and reports what's missing.
- Safe because: It validates a packet's shape only — it never fetches, infers, or promotes a value.
- Reduces manual work: Catches an incomplete packet before any dataset edit, so a promotion attempt never half-lands.
- Where it lives: validatePacket() in src/lib/packet-validator.ts + its test.
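A shape-only check of this kind might look like the sketch below. It is not the actual `validatePacket()` from src/lib/packet-validator.ts: the field names, the `SourcePacket` and `PacketCheck` types, and the `validatePacketSketch` name are assumptions derived from the required-field list above. It treats `period` and `asOf` as alternatives, which is one reading of "period/asOf".

```typescript
// Hypothetical packet shape; field names are guesses based on the required-field list.
interface SourcePacket {
  officialUrl?: string;
  publisher?: string;
  value?: number;
  unit?: string;
  period?: string;
  asOf?: string;
  whereFound?: string;
  caveat?: string;
  confidence?: string;
}

interface PacketCheck {
  complete: boolean;
  missing: string[];
}

// A field counts as present if it is a finite number or a non-blank string.
function isPresent(v: string | number | undefined): boolean {
  if (v === undefined) return false;
  if (typeof v === "number") return Number.isFinite(v);
  return v.trim().length > 0;
}

// Validates shape only: nothing is fetched, inferred, or promoted.
function validatePacketSketch(packet: SourcePacket): PacketCheck {
  const required: (keyof SourcePacket)[] = [
    "officialUrl", "publisher", "value", "unit", "whereFound", "caveat", "confidence",
  ];
  const missing: string[] = [];
  for (const key of required) {
    if (!isPresent(packet[key])) missing.push(key);
  }
  // Assumption: either a period or an as-of date satisfies the "period/asOf" requirement.
  if (!isPresent(packet.period) && !isPresent(packet.asOf)) missing.push("period/asOf");
  return { complete: missing.length === 0, missing };
}

// Usage: a packet with no caveat and no period is reported as incomplete.
const check = validatePacketSketch({
  officialUrl: "https://example.org/report",
  publisher: "Example Agency",
  value: 123.4,
  unit: "index points",
  whereFound: "Table 1",
  confidence: "high",
});
console.log(check.missing); // ["caveat", "period/asOf"]
```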
Operator / data-needs consistency audit [implemented]
- What it does: Tests assert every workbench item has a next action and a source packet, every future item is marked not-implemented, and every blocked item states why it can't be finished alone.
- Safe because: Build-time assertions over existing records; no data is changed.
- Reduces manual work: Keeps the operator surfaces trustworthy as items change, so nothing silently goes stale.
- Where it lives: operator-consistency test + the operator workbench.
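The three assertions above can be illustrated with a small audit function that only reads records and returns a list of problems. The `WorkbenchItem` shape, its field names, and `auditItems` are hypothetical; the real operator-consistency test may be organized quite differently.

```typescript
// Hypothetical record shape for a workbench item.
interface WorkbenchItem {
  id: string;
  status: "active" | "future" | "blocked";
  nextAction?: string;
  sourcePacket?: string;
  implemented?: boolean;  // future items are expected to carry an explicit false
  blockedReason?: string; // why a blocked item cannot be finished alone
}

// Read-only audit: returns human-readable problems and changes no data.
function auditItems(items: WorkbenchItem[]): string[] {
  const problems: string[] = [];
  for (const item of items) {
    if (!item.nextAction) problems.push(`${item.id}: missing next action`);
    if (!item.sourcePacket) problems.push(`${item.id}: missing source packet`);
    if (item.status === "future" && item.implemented !== false) {
      problems.push(`${item.id}: future item not marked not-implemented`);
    }
    if (item.status === "blocked" && !item.blockedReason) {
      problems.push(`${item.id}: blocked item does not state why`);
    }
  }
  return problems;
}

// Usage: a blocked item without a stated reason produces exactly one problem.
const problems = auditItems([
  { id: "a", status: "active", nextAction: "re-check", sourcePacket: "pkt-a" },
  { id: "b", status: "blocked", nextAction: "wait", sourcePacket: "pkt-b" },
]);
console.log(problems); // ["b: blocked item does not state why"]
```

In a build-time test, an empty result would be the passing condition.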
Stale-data reminders (build-time) [implemented]
- What it does: Freshness/review status is computed against each source's cadence and surfaced (current / due-soon / overdue / stale) on the data-review and coverage surfaces.
- Safe because: Deterministic against the curated reference date; no runtime clock, no fetching.
- Reduces manual work: Tells the maintainer exactly which values to re-check first.
- Where it lives: /data-review, /data-coverage, /freshness.
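As a rough sketch of a deterministic freshness calculation, the function below takes the reference date as an argument and never reads the system clock. The four status names come from the description above; the thresholds (80%, 100%, and 200% of the cadence) and the `freshness` signature are assumptions chosen for illustration, not the site's actual rules.

```typescript
type Freshness = "current" | "due-soon" | "overdue" | "stale";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Deterministic: the result depends only on the arguments, never on Date.now().
// Dates are ISO "YYYY-MM-DD" strings, which Date.parse reads as UTC midnight.
function freshness(lastReviewed: string, cadenceDays: number, referenceDate: string): Freshness {
  const ageDays = (Date.parse(referenceDate) - Date.parse(lastReviewed)) / MS_PER_DAY;
  // Assumed thresholds, expressed as fractions of the source's cadence.
  if (ageDays <= cadenceDays * 0.8) return "current";
  if (ageDays <= cadenceDays) return "due-soon";
  if (ageDays <= cadenceDays * 2) return "overdue";
  return "stale";
}

// Usage: a monthly source reviewed 28 days before the curated reference date.
console.log(freshness("2024-01-01", 30, "2024-01-29")); // "due-soon"
```

Because the reference date is an input, the same build always produces the same statuses, which matches the "no runtime clock" constraint.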
Promotion dry-run generator [implemented]
- What it does: Shows what a promotion would change before it is made, so a value edit is reviewed up front.
- Safe because: A preview only — it does not write anything.
- Reduces manual work: Removes guesswork from a manual promotion.
- Where it lives: /methodology/promotion-dry-run.
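A preview of this kind can be modelled as a pure diff between the record on file and a proposed patch. The sketch below is illustrative only: `Row`, `DryRunChange`, and `promotionDryRun` are hypothetical names, and it assumes a promotion is expressed as a flat set of field values.

```typescript
// Hypothetical flat record: field name -> value.
type Row = Record<string, string | number | undefined>;

interface DryRunChange {
  field: string;
  before: string | number | undefined;
  after: string | number | undefined;
}

// Preview only: lists the fields a proposed patch would change.
// Neither input is modified and nothing is written anywhere.
function promotionDryRun(current: Row, proposed: Row): DryRunChange[] {
  return Object.keys(proposed)
    .filter((field) => current[field] !== proposed[field])
    .map((field) => ({ field, before: current[field], after: proposed[field] }));
}

// Usage: only the value differs, so the preview has one entry.
const onFile: Row = { value: 10, unit: "USD/t", period: "2024-05" };
const preview = promotionDryRun(onFile, { value: 12, unit: "USD/t" });
console.log(preview); // [{ field: "value", before: 10, after: 12 }]
```

The maintainer reviews the returned list and then makes the edit by hand; the function itself has no write path.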
Design only (not built)
These would reduce manual work but are intentionally not implemented — each is described so the design is on record, not so it runs.
Source-release availability watcher [design only]
- What it does: A future helper could check whether an official source page advertises a NEWER release than the one on file and emit a 'candidate available' flag — with no value attached.
- Safe because: Availability-only: it would never read, infer, or promote a value, and would not run as a scheduled job on the server. It would be a local/CLI check the maintainer runs.
- Reduces manual work: Saves the maintainer from manually polling each source for a new release.
- Where it lives: Design note here; NOT implemented (the static rules exclude server-side fetching/jobs by default).
Official source-URL health checklist [design only]
- What it does: A manual or local-only pass that flags dead/redirected official source URLs before they undermine a citation.
- Safe because: No crawling from the deployed site; a maintainer-run check only.
- Reduces manual work: Catches link rot proactively instead of via reader reports.
- Where it lives: Design note here + /source-health; NOT a runtime crawler.
Indexing checklist status tracker [design only]
- What it does: A static checklist the maintainer ticks off (GSC property, DNS TXT, sitemap submitted, top pages requested) so indexing setup state is visible.
- Safe because: A checklist only: nothing is ever submitted to a search engine by automation; the maintainer performs each step by hand.
- Reduces manual work: Keeps the one-time manual indexing steps from being forgotten or repeated.
- Where it lives: /methodology/indexing.
Future API/provider evaluation checklist [design only]
- What it does: Records an assessment of a possible future AIS/market-data provider (terms, cost, reuse rights) so a decision is deliberate.
- Safe because: Assesses a provider; ingests nothing. No vendor claims without cited terms.
- Reduces manual work: Front-loads the decision so live-data work, if ever chosen, starts from a record.
- Where it lives: /operator/source-packets (evaluation packets), /methodology/maritime-data, /methodology/live-data.
Guardrails
- No fetch-and-promote: a value is only added by a maintainer-reviewed manual transcription.
- No scheduled jobs, background workers, or runtime services on the server.
- No scraping of unofficial data and no parsing of binary PDF/XLSX into the dataset.
- No paid APIs and no live feeds.
- Automation may detect and report availability/health only — it never asserts a number.
Related
Operator workbench · Operator backlog · Source packets · Refresh harness · Live-data architecture · /operator/automation/data.json