validate
Validate CSV or Parquet against a JSON Schema — check-only, or materialise a typed DuckDB table in one pass.
Validate a CSV or Parquet file against a JSON Schema. By default validate runs check-only and reports how many rows pass. Pass --db and --table and it also materialises the valid rows into a typed DuckDB table — applying each column's transform — alongside a finetype_reject_errors sidecar, all in a single pass.
Use validate as a quality gate before loading data into your warehouse, or as the one-shot CSV-to-typed-table step at the end of the pipeline.
Usage
finetype validate [OPTIONS] <DATA> <SCHEMA>
Arguments
| Argument | Description |
|---|---|
| DATA | Input CSV or Parquet file |
| SCHEMA | JSON Schema file to validate against (e.g. from profile -f data.csv -o json-schema) |
Options
| Flag | Type | Default | Description |
|---|---|---|---|
| --db | path | — | Output DuckDB database file (created if absent). When supplied, --table is also required. Omit for check-only mode. |
| --table | string | — | Table name to create for valid rows. Required only when --db is supplied. |
| --append | flag | — | Append to an existing database/table or a prior reject sidecar. Requires --db. |
| --lenient | flag | — | Force exit code 0 regardless of reject count (does not affect the error exit code 2). |
| -o, --output | string | plain | Output format for the summary report: plain, json |
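The pairing rule in the table (--db needs --table, and omitting both means check-only) can be enforced before the tool is invoked. The following is a minimal POSIX shell sketch, assuming a hypothetical wrapper named validate_or_load that is not part of finetype. It uses only the flags documented above, and its choice to return 2 on a missing table name simply mirrors the tool's error code.

```shell
# Hypothetical wrapper (not part of finetype): choose check-only or
# materialise mode from the arguments, using only the documented flags.
# Usage: validate_or_load DATA SCHEMA [DB TABLE]
validate_or_load() {
  data=$1 schema=$2 db=${3:-} table=${4:-}

  # No database given: check-only mode, nothing is written.
  if [ -z "$db" ]; then
    finetype validate "$data" "$schema"
    return
  fi

  # A database without a table name is rejected here, before finetype runs.
  if [ -z "$table" ]; then
    echo "validate_or_load: --db requires --table" >&2
    return 2
  fi

  # Both given: validate and materialise; finetype's exit code is passed through.
  finetype validate "$data" "$schema" --db "$db" --table "$table"
}
```

Calling `validate_or_load contacts.csv schema.json` would run a check-only pass, while adding `out.db contacts` would request the typed table as well.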
Exit codes
| Code | Meaning |
|---|---|
| 0 | No rejects — every row passed |
| 1 | Rejects present |
| 2 | Error |
When --db is used, duckdb must be available on your PATH.
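A CI job usually needs to tell "rejects present" (1) apart from "error" (2), and to confirm that duckdb is reachable before asking for a database. The sketch below shows one way to do both in POSIX shell. The function names describe_validate_exit, have_duckdb, and run_gate are hypothetical helpers, not part of finetype, and the messages are illustrative only.

```shell
# Hypothetical helpers (not part of finetype) for a CI job.

# Translate the documented exit codes into a short message.
describe_validate_exit() {
  case "$1" in
    0) echo "pass: no rejects" ;;
    1) echo "fail: rejects present" ;;
    2) echo "error: finetype reported an error" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Succeed only if the shell can resolve a duckdb command (needed with --db).
have_duckdb() {
  command -v duckdb >/dev/null 2>&1
}

# Run a check-only validation, print the verdict, and pass the code through.
run_gate() {
  status=0
  finetype validate "$1" "$2" || status=$?
  describe_validate_exit "$status"
  return "$status"
}
```

Because run_gate returns the original exit code, a caller can still branch on 1 versus 2, for example failing the build on 2 but only warning on 1.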
Examples
Check-only
Validate a delivery against a saved schema without writing anything:
$ finetype validate contacts.csv schema.json
Validation Report
════════════════════════════════════════════════════════════
Input: contacts.csv
Schema: schema.json
Mode: check-only (no .db written)
Total rows: 12
Valid rows: 9
Invalid rows: 3
Rejects: 3
Grade: C
════════════════════════════════════════════════════════════
The report tallies total, valid, and invalid rows and assigns a letter Grade. Exit code 1 signals rejects, which is useful as a CI quality gate:
finetype validate contacts.csv schema.json \
&& echo "Quality gate passed" \
|| echo "Quality gate failed"
Validate and materialise a typed table
Pass --db and --table to gate the data and cast it into a typed DuckDB table in one pass:
$ finetype validate contacts.csv schema.json --db out.db --table contacts
Validation Report
════════════════════════════════════════════════════════════
Input: contacts.csv
Schema: schema.json
Output DB: out.db
Target table: contacts
Scan ID: 1
Total rows: 12
Valid rows: 9
Invalid rows: 3
Rejects: 3
SEMANTIC_TYPE: 3
TRANSFORM_FAILED: 0
Grade: C
════════════════════════════════════════════════════════════
When a database is written, the report adds the output target, a Scan ID, and a breakdown of the reject kinds. The command writes two things to out.db:
- the contacts table: valid rows only, with each column's TRY-wrapped transform applied
- a finetype_reject_errors sidecar: engine rejects (error_type='SEMANTIC_TYPE') and cells that passed validation but failed the typed cast (error_type='TRANSFORM_FAILED')
Inspect the rejects:
duckdb out.db -c "SELECT line, column_name, error_type, expected_type FROM finetype_reject_errors ORDER BY line;"
┌──────┬─────────────┬───────────────┬───────────────────────────┐
│ line │ column_name │ error_type │ expected_type │
├──────┼─────────────┼───────────────┼───────────────────────────┤
│ 6 │ email │ SEMANTIC_TYPE │ identity.person.email │
│ 9 │ amount │ SEMANTIC_TYPE │ finance.currency.amount │
│ 10 │ ip_address │ SEMANTIC_TYPE │ technology.internet.ip_v4 │
└──────┴─────────────┴───────────────┴───────────────────────────┘
Append a later batch
$ finetype validate batch-2.csv schema.json --db out.db --table contacts --append
See also
- profile: detect column types and export the JSON Schema to validate against
- taxonomy: browse the semantic type taxonomy
- Validate a Data Delivery: a repeatable quality-gate recipe