
validate

Validate CSV or Parquet against a JSON Schema — check-only, or materialise a typed DuckDB table in one pass.

Validate a CSV or Parquet file against a JSON Schema. By default validate runs check-only and reports how many rows pass. Pass --db and --table and it also materialises the valid rows into a typed DuckDB table — applying each column's transform — alongside a finetype_reject_errors sidecar, all in a single pass.

Use validate as a quality gate before loading data into your warehouse, or as the one-shot CSV-to-typed-table step at the end of the pipeline.

Usage

finetype validate [OPTIONS] <DATA> <SCHEMA>

Arguments

Argument   Description
DATA       Input CSV or Parquet file
SCHEMA     JSON Schema file to validate against (e.g. from profile -f data.csv -o json-schema)

Options

Flag           Type     Default   Description
--db           path               Output DuckDB database file (created if absent). When supplied, --table is also required. Omit for check-only mode.
--table        string             Table name to create for valid rows. Required only when --db is supplied.
--append       flag               Append to an existing database/table or a prior reject sidecar. Requires --db.
--lenient      flag               Force exit code 0 regardless of reject count (does not affect the error exit code 2).
-o, --output   string   plain     Output format for the summary report: plain, json
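
The flag dependencies listed above (--db needs --table, --append needs --db) can be checked in a wrapper script before the command runs. The following is a minimal sketch, not part of finetype: the function name check_validate_flags, its argument order, and its return code of 2 are all assumptions made for illustration.

```shell
# Hypothetical pre-flight check (not part of finetype). It mirrors the flag
# rules documented in the options table so a wrapper script can fail fast.
# Usage: check_validate_flags <db-path-or-empty> <table-or-empty> <yes|no>
check_validate_flags() {
  db=$1
  table=$2
  append=$3
  # --db without --table is documented as invalid.
  if [ -n "$db" ] && [ -z "$table" ]; then
    echo "--table is required when --db is supplied" >&2
    return 2  # assumption: reuse the documented "error" code for the wrapper
  fi
  # --append is documented as requiring --db.
  if [ "$append" = "yes" ] && [ -z "$db" ]; then
    echo "--append requires --db" >&2
    return 2
  fi
  return 0
}
```

For example, check_validate_flags out.db contacts yes succeeds, while check_validate_flags out.db "" no prints a message and returns non-zero.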

Exit codes

Code   Meaning
0      No rejects: every row passed
1      Rejects present
2      Error

When --db is used, duckdb must be available on your PATH.
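
Because --db depends on an external duckdb binary, a script may want to confirm it is reachable before starting a long run. A minimal sketch using the standard command -v lookup; the helper name have_cmd is hypothetical.

```shell
# Hypothetical helper (not part of finetype): succeeds only if the named
# command can be resolved on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report whether duckdb is available before attempting a --db run.
if have_cmd duckdb; then
  echo "duckdb found: $(command -v duckdb)"
else
  echo "duckdb not found on PATH; install it before using --db" >&2
fi
```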

Examples

Check-only

Validate a delivery against a saved schema without writing anything:

$ finetype validate contacts.csv schema.json
Validation Report
════════════════════════════════════════════════════════════
  Input:        contacts.csv
  Schema:       schema.json
  Mode:         check-only (no .db written)

  Total rows:            12
  Valid rows:             9
  Invalid rows:           3
  Rejects:                3
  Grade:             C
════════════════════════════════════════════════════════════

The report tallies total, valid, and invalid rows and assigns a letter Grade. Exit code 1 signals rejects — useful as a CI quality gate:

finetype validate contacts.csv schema.json \
  && echo "Quality gate passed" \
  || echo "Quality gate failed"
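
The one-liner above prints the same message for rejects (exit code 1) and for errors (exit code 2). Where a pipeline needs to tell them apart, the exit status can be captured and mapped explicitly. This is a sketch under the exit-code table on this page; the function name describe_exit is hypothetical, and the guard around the call is only there so the snippet is safe to paste on a machine without finetype or the sample files.

```shell
# Map a finetype validate exit status to a short label, following the
# documented exit codes. The function name is hypothetical.
describe_exit() {
  case "$1" in
    0) echo "passed" ;;
    1) echo "rejects" ;;
    2) echo "error" ;;
    *) echo "unexpected" ;;
  esac
}

# Run the gate only when finetype and both input files are present.
if command -v finetype >/dev/null 2>&1 && [ -f contacts.csv ] && [ -f schema.json ]; then
  rc=0
  finetype validate contacts.csv schema.json || rc=$?
  echo "Quality gate: $(describe_exit "$rc")"
fi
```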

Validate and materialise a typed table

Pass --db and --table to gate the data and cast it into a typed DuckDB table in one pass:

$ finetype validate contacts.csv schema.json --db out.db --table contacts
Validation Report
════════════════════════════════════════════════════════════
  Input:        contacts.csv
  Schema:       schema.json
  Output DB:    out.db
  Target table: contacts
  Scan ID:      1

  Total rows:            12
  Valid rows:             9
  Invalid rows:           3
  Rejects:                3
    SEMANTIC_TYPE:        3
    TRANSFORM_FAILED:     0
  Grade:             C
════════════════════════════════════════════════════════════

When a database is written, the report adds the output target, a Scan ID, and a breakdown of the reject kinds. This writes two things to out.db:

  • the contacts table — valid rows only, with each column's TRY-wrapped transform applied
  • a finetype_reject_errors sidecar — engine rejects (error_type='SEMANTIC_TYPE') and cells that passed validation but failed the typed cast (error_type='TRANSFORM_FAILED')

Inspect the rejects:

$ duckdb out.db -c "SELECT line, column_name, error_type, expected_type FROM finetype_reject_errors ORDER BY line;"
┌──────┬─────────────┬───────────────┬───────────────────────────┐
│ line │ column_name │  error_type   │       expected_type       │
├──────┼─────────────┼───────────────┼───────────────────────────┤
│ 6    │ email       │ SEMANTIC_TYPE │ identity.person.email     │
│ 9    │ amount      │ SEMANTIC_TYPE │ finance.currency.amount   │
│ 10   │ ip_address  │ SEMANTIC_TYPE │ technology.internet.ip_v4 │
└──────┴─────────────┴───────────────┴───────────────────────────┘
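
For larger files, a per-column tally is often easier to read than the row-level listing. A sketch that groups the sidecar by column_name and error_type, using only columns that appear in the output above; the helper name reject_summary_sql is hypothetical, and the query runs only if duckdb and out.db are both present.

```shell
# Hypothetical helper: print a SQL query that counts rejects per column and
# error type. Only columns shown in the sidecar output above are referenced.
reject_summary_sql() {
  printf '%s\n' "SELECT column_name, error_type, count(*) AS rejects FROM finetype_reject_errors GROUP BY column_name, error_type ORDER BY rejects DESC, column_name;"
}

# Run the query only when duckdb and the database file are available.
if command -v duckdb >/dev/null 2>&1 && [ -f out.db ]; then
  duckdb out.db -c "$(reject_summary_sql)"
fi
```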

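Append a later batch

Pass --append to add a later batch to an existing database and table, along with any prior reject sidecar: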

$ finetype validate batch-2.csv schema.json --db out.db --table contacts --append
