validate
Validate data quality against taxonomy JSON Schemas — quarantine invalid values, clean the rest.
Check data values against the JSON Schema for their detected (or specified) type. Invalid values are separated into a quarantine file; valid values go to a cleaned output. Use validate as a quality gate before loading data into your warehouse.
Planned redesign
The validate command is being redesigned as a schema-driven quality gate with sidecar output files (.valid.csv, .invalid.csv, .errors.jsonl). The current interface documented here will change. See the validate-command spec for details.
Usage
finetype validate [OPTIONS] --file <FILE>
Options
| Flag | Type | Default | Description |
|---|---|---|---|
| -f, --file | path | — | Input file: NDJSON with value/label fields, or plain text with --label (required) |
| -l, --label | string | — | Validate all values against this label (plain text input) |
| -t, --taxonomy | path | labels | Taxonomy file or directory |
| --strategy | string | quarantine | Strategy for invalid values: quarantine, null, ffill, bfill |
| -o, --output | string | plain | Output format: plain, json, csv, markdown, arrow |
| --quarantine-file | path | quarantine.ndjson | Quarantine file path |
| --cleaned-file | path | cleaned.ndjson | Cleaned output file path |
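As a rough illustration of using validate as a quality gate, the Python sketch below assembles an invocation from the flags documented in this table and then counts the records in the quarantine file. The helper names (`build_validate_command`, `count_records`, `quality_gate`) are hypothetical, and several things are assumptions rather than documented behaviour: that `finetype` is on `PATH`, that it exits non-zero only when it fails to run, and that the quarantine file holds one record per line.

```python
import subprocess
from pathlib import Path


def build_validate_command(input_file, label=None,
                           quarantine_file="quarantine.ndjson",
                           cleaned_file="cleaned.ndjson"):
    """Assemble a finetype validate invocation from the documented flags."""
    cmd = ["finetype", "validate", "--file", str(input_file),
           "--quarantine-file", str(quarantine_file),
           "--cleaned-file", str(cleaned_file)]
    if label is not None:
        cmd += ["--label", label]
    return cmd


def count_records(path):
    """Count non-empty lines in an NDJSON file; a missing file counts as zero."""
    path = Path(path)
    if not path.exists():
        return 0
    with path.open(encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip())


def quality_gate(input_file, label=None, quarantine_file="quarantine.ndjson"):
    """Run validation and return True when nothing was quarantined."""
    # Remove any stale quarantine file so an old run cannot fail this one.
    Path(quarantine_file).unlink(missing_ok=True)
    cmd = build_validate_command(input_file, label, quarantine_file)
    # Assumption: finetype is on PATH and exits non-zero only on failure to run.
    subprocess.run(cmd, check=True)
    return count_records(quarantine_file) == 0


print(build_validate_command("emails.txt", label="identity.person.email"))
```

A pipeline step could call `quality_gate(...)` and skip the warehouse load when it returns `False`. Whether any quarantined record should block a load is a policy choice, not something the command prescribes.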
Examples
Validate a column of emails
Create a file emails.txt with one value per line, then validate against the email schema:
$ finetype validate -f emails.txt --label identity.person.email
Values that match the JSON Schema pattern for identity.person.email pass; the rest are quarantined.
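To make the pass/quarantine split concrete, here is a minimal Python sketch of pattern-based validation. It is not FineType's implementation: `EMAIL_PATTERN` is a deliberately simple, hypothetical stand-in for whatever pattern the identity.person.email schema actually defines, and `partition` is an illustrative helper name.

```python
import re

# Hypothetical stand-in pattern; the real identity.person.email schema
# may be stricter or looser than this.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def partition(values, pattern=EMAIL_PATTERN):
    """Split values into (valid, invalid) by full match against pattern."""
    valid, invalid = [], []
    for value in values:
        (valid if pattern.fullmatch(value) else invalid).append(value)
    return valid, invalid


valid, invalid = partition(["alice@example.com", "not-valid", "bob@example.org"])
print(valid)
print(invalid)
```

The sketch uses a full match so that a value must match the pattern in its entirety. JSON Schema's `pattern` keyword is not implicitly anchored, so real schemas usually carry explicit `^` and `$` anchors to get the same effect.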
Validate NDJSON with mixed types
If you have NDJSON where each line has value and label fields:
{"value": "[email protected]", "label": "identity.person.email"}
{"value": "not-valid", "label": "identity.person.email"}
{"value": "192.168.1.1", "label": "technology.internet.ip_v4"}
$ finetype validate -f mixed.ndjson
Each value is validated against its own label's schema.
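If the values and labels already live in another system, input in this shape can be produced with a few lines of Python. The sketch below assumes only the `value` and `label` field names shown above; the sample rows and the `to_ndjson` helper are placeholders for illustration.

```python
import json

# Rows mirror the value/label shape shown above; the values are placeholders.
rows = [
    {"value": "alice@example.com", "label": "identity.person.email"},
    {"value": "not-valid", "label": "identity.person.email"},
    {"value": "192.168.1.1", "label": "technology.internet.ip_v4"},
]


def to_ndjson(records):
    """Serialise records as NDJSON: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


print(to_ndjson(rows), end="")
```

Redirecting this output to a file such as mixed.ndjson gives an input suitable for `-f`.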
Choose a strategy for invalid values
$ finetype validate -f data.ndjson --strategy null
| Strategy | Behaviour |
|---|---|
| quarantine | Write invalid rows to a separate quarantine file (default) |
| null | Replace invalid values with null in cleaned output |
| ffill | Forward-fill invalid values from the last valid value |
| bfill | Back-fill invalid values from the next valid value |
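The four strategies can be compared on a small column with the Python sketch below. It illustrates the behaviour described in the table and is not FineType's code. `apply_strategy` is a hypothetical helper, and two details are assumptions: an invalid value with no valid neighbour to copy from becomes `None`, and nothing is quarantined under the non-default strategies.

```python
def apply_strategy(values, is_valid, strategy="quarantine"):
    """Return (cleaned, quarantined) for one column under the given strategy.

    values   : list of raw values
    is_valid : list of bools, same length as values
    """
    if strategy == "quarantine":
        cleaned = [v for v, ok in zip(values, is_valid) if ok]
        quarantined = [v for v, ok in zip(values, is_valid) if not ok]
        return cleaned, quarantined
    if strategy == "null":
        return [v if ok else None for v, ok in zip(values, is_valid)], []
    if strategy in ("ffill", "bfill"):
        pairs = list(zip(values, is_valid))
        if strategy == "bfill":
            pairs.reverse()  # back-fill is forward-fill over the reversed column
        filled, last_valid = [], None
        for v, ok in pairs:
            if ok:
                last_valid = v
            # Assumption: with no valid neighbour to copy from, fall back to None.
            filled.append(last_valid)
        if strategy == "bfill":
            filled.reverse()
        return filled, []
    raise ValueError(f"unknown strategy: {strategy}")


values = ["a@x.io", "bad", "b@x.io", "worse"]
flags = [True, False, True, False]
for name in ("quarantine", "null", "ffill", "bfill"):
    print(name, apply_strategy(values, flags, name))
```

Note that quarantine shortens the cleaned output, while null, ffill, and bfill keep the row count unchanged, which matters when the cleaned values must stay aligned with other columns.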