
validate

Validate data quality against taxonomy JSON Schemas — quarantine invalid values, clean the rest.

Check data values against the JSON Schema for their detected (or specified) type. Invalid values are separated into a quarantine file; valid values go to a cleaned output. Use validate as a quality gate before loading data into your warehouse.

Planned redesign

The validate command is being redesigned as a schema-driven quality gate with sidecar output files (.valid.csv, .invalid.csv, .errors.jsonl). The current interface documented here will change. See the validate-command spec for details.

Usage

finetype validate [OPTIONS] --file <FILE>

Options

| Flag | Type | Default | Description |
|---|---|---|---|
| -f, --file | path | | Input file: NDJSON with value/label fields, or plain text with --label (required) |
| -l, --label | string | | Validate all values against this label (plain text input) |
| -t, --taxonomy | path | labels | Taxonomy file or directory |
| --strategy | string | quarantine | Strategy for invalid values: quarantine, null, ffill, bfill |
| -o, --output | string | plain | Output format: plain, json, csv, markdown, arrow |
| --quarantine-file | path | quarantine.ndjson | Quarantine file path |
| --cleaned-file | path | cleaned.ndjson | Cleaned output file path |

Examples

Validate a column of emails

Create a file emails.txt with one value per line, then validate against the email schema:

$ finetype validate -f emails.txt --label identity.person.email

Values that match the JSON Schema pattern for identity.person.email pass; the rest are quarantined.
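The pass/quarantine split can be pictured with a short Python sketch. This is an illustration only, not finetype's implementation: the regular expression is a simplified stand-in for whatever pattern the identity.person.email schema actually defines, and split_values is a hypothetical helper name.

```python
import re

# Simplified stand-in for a schema pattern (assumption); the real
# identity.person.email schema in the taxonomy may be stricter or looser.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def split_values(values, pattern=EMAIL_PATTERN):
    """Return (cleaned, quarantined) lists, preserving input order."""
    cleaned, quarantined = [], []
    for value in values:
        # A value passes only if the whole string matches the pattern.
        if pattern.fullmatch(value):
            cleaned.append(value)
        else:
            quarantined.append(value)
    return cleaned, quarantined


if __name__ == "__main__":
    cleaned, quarantined = split_values(["a@example.com", "not-valid"])
    print("cleaned:", cleaned)
    print("quarantined:", quarantined)
```

In the real command the two lists correspond to the cleaned output file and the quarantine file.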

Validate NDJSON with mixed types

If you have NDJSON where each line has value and label fields:

{"value": "[email protected]", "label": "identity.person.email"}
{"value": "not-valid", "label": "identity.person.email"}
{"value": "192.168.1.1", "label": "technology.internet.ip_v4"}

$ finetype validate -f mixed.ndjson

Each value is validated against its own label's schema.
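If the labelled values come from another program, an input file in this shape can be produced with a few lines of Python. The sketch below assumes only the value and label field names described above; to_ndjson is a hypothetical helper and the sample email address is a placeholder.

```python
import json


def to_ndjson(records):
    """Serialise (value, label) pairs as NDJSON, one JSON object per line."""
    return "".join(
        json.dumps({"value": value, "label": label}) + "\n"
        for value, label in records
    )


records = [
    ("user@example.com", "identity.person.email"),  # placeholder sample value
    ("not-valid", "identity.person.email"),
    ("192.168.1.1", "technology.internet.ip_v4"),
]

if __name__ == "__main__":
    # Write the file that the validate command reads with -f.
    with open("mixed.ndjson", "w", encoding="utf-8") as fh:
        fh.write(to_ndjson(records))
```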

Choose a strategy for invalid values

$ finetype validate -f data.ndjson --strategy null

| Strategy | Behaviour |
|---|---|
| quarantine | Write invalid rows to a separate quarantine file (default) |
| null | Replace invalid values with null in cleaned output |
| ffill | Forward-fill invalid values from the last valid value |
| bfill | Back-fill invalid values from the next valid value |
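The four strategies can be compared side by side with a small Python sketch. It models the behaviour described in the table and is not finetype's code: apply_strategy is a hypothetical function, and two details are assumptions, namely that a fill with no valid neighbour to copy from yields None, and that the non-quarantine strategies write nothing to the quarantine list.

```python
def apply_strategy(values, is_valid, strategy="quarantine"):
    """Return (cleaned, quarantined) for values and a parallel list of flags."""
    if len(values) != len(is_valid):
        raise ValueError("values and is_valid must have the same length")
    pairs = list(zip(values, is_valid))

    if strategy == "quarantine":
        cleaned = [v for v, ok in pairs if ok]
        quarantined = [v for v, ok in pairs if not ok]
        return cleaned, quarantined

    if strategy == "null":
        return [v if ok else None for v, ok in pairs], []

    if strategy in ("ffill", "bfill"):
        # bfill is ffill run over the reversed sequence.
        if strategy == "bfill":
            pairs.reverse()
        filled, last_valid = [], None  # None until a valid value is seen (assumption)
        for v, ok in pairs:
            if ok:
                last_valid = v
            filled.append(last_valid)
        if strategy == "bfill":
            filled.reverse()
        return filled, []

    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    values = ["a", "x", "b", "y"]
    flags = [True, False, True, False]
    for name in ("quarantine", "null", "ffill", "bfill"):
        print(name, apply_strategy(values, flags, name))
```

Note that quarantine shortens the cleaned output, while null, ffill and bfill keep one output row per input row.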

See also

  • schema — inspect the JSON Schema used for validation
  • profile — detect column types before validating (use --validate for a quick check)
  • load — generate DuckDB SQL after validation passes
