Many authoritative sources publish values only in PDFs or spreadsheets. The only sanctioned way to use such a value is for a human maintainer to read it exactly from the source and record the 15 checklist fields below (13 of them required), including a mandatory second-pass re-read. No code parses a binary file: there is no scraping and no automation. If a value cannot be read exactly and directly, it is not added.
- 15 checklist fields (13 required); second pass mandatory.
- Binary/PDF/XLSX values are read by a human, not scraped or parsed by code.
- Machine-readable at /methodology/manual-transcription/data.json.
Checklist
| Field | Requirement | Required |
|---|---|---|
| sourceUrl | The canonical URL of the source (page or downloadable file). | ✓ |
| publisher | The publishing body (e.g. SIPRI, World Bank, European Commission). | ✓ |
| reportTitle | The exact report or file title. | ✓ |
| location | Page / sheet / table / cell reference, when the value is in a document. | — |
| displayedValue | The exact value as displayed by the source (no rounding beyond the source). | ✓ |
| unit | The unit exactly as the source states it. | ✓ |
| period | The reporting period / asOf date the value refers to. | ✓ |
| valueFormat | How the value was read: text, table, cell, or direct-download. | ✓ |
| secondPass | A second, independent re-read confirming the transcription. | ✓ |
| sourceId | The Source id this value attaches to (existing or new). | ✓ |
| confidence | high, medium, or low, per source authority and clarity. | ✓ |
| caveat | The applicable caveat(s): not real-time, associative not causal, etc. | ✓ |
| acceptableReason | Why the value is acceptable despite the source format (e.g. exact cell directly read). | ✓ |
| noAutomationReason | Confirmation that the value was read manually, with no scraping or automated parsing. | ✓ |
| fileNote | A note on the file/screenshot if relevant (no committed screenshots unless already project convention). | — |
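
A completed checklist can be checked for completeness before it is recorded. The sketch below is illustrative only and is not the project's actual tooling. It validates a record a maintainer has already typed in by hand, and it never opens or parses the source file. The field names come from the table above. The function name `checklist_problems`, the choice to store every field as a string, and all values in `EXAMPLE_RECORD` are assumptions or invented placeholders.

```python
from __future__ import annotations

# Field names are copied from the checklist table; the rest is assumed.
REQUIRED_FIELDS = (
    "sourceUrl", "publisher", "reportTitle", "displayedValue", "unit",
    "period", "valueFormat", "secondPass", "sourceId", "confidence",
    "caveat", "acceptableReason", "noAutomationReason",
)
OPTIONAL_FIELDS = ("location", "fileNote")
VALUE_FORMATS = ("text", "table", "cell", "direct-download")
CONFIDENCE_LEVELS = ("high", "medium", "low")


def checklist_problems(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        # Assumption: every field is a string, and blank text counts as missing.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing required field: {name}")
    known = set(REQUIRED_FIELDS) | set(OPTIONAL_FIELDS)
    for name in sorted(set(record) - known):
        problems.append(f"unknown field: {name}")
    # The two enumerated fields must use one of the values listed in the table.
    for name, allowed in (("valueFormat", VALUE_FORMATS),
                          ("confidence", CONFIDENCE_LEVELS)):
        value = record.get(name)
        if isinstance(value, str) and value.strip() and value not in allowed:
            problems.append(f"{name} must be one of: {', '.join(allowed)}")
    return problems


# Hypothetical record: every value here is a placeholder, not real data.
EXAMPLE_RECORD = {
    "sourceUrl": "https://example.org/reports/annual.xlsx",
    "publisher": "Example Statistics Office",
    "reportTitle": "Example Annual Report",
    "location": "Sheet 2, cell C14",
    "displayedValue": "12.3",
    "unit": "percent",
    "period": "2023",
    "valueFormat": "cell",
    "secondPass": "Re-read by a second maintainer; value matches.",
    "sourceId": "example-source",
    "confidence": "high",
    "caveat": "Not real-time.",
    "acceptableReason": "Exact cell directly read.",
    "noAutomationReason": "Read manually from the opened file.",
}

if __name__ == "__main__":
    print(checklist_problems(EXAMPLE_RECORD) or "checklist complete")
```

Keeping `displayedValue` as a string rather than a number is a deliberate choice in this sketch: it preserves the value exactly as the source displays it, with no implicit rounding or reformatting.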
Refresh harness: /methodology/refresh-harness · source hierarchy: /methodology/source-hierarchy · machine-readable: /methodology/manual-transcription/data.json.