# Validate a Data Delivery
Use FineType's `schema` and `validate` commands to build a repeatable quality gate for incoming data.
> **Planned feature**
>
> The `validate` command and file-level schema generation described here are planned but not yet shipped. This recipe documents the target workflow so you can plan your pipeline. Check the CLI Reference for current status.
**Goal:** Save a schema from a known-good data batch, then validate every subsequent delivery against it — a repeatable quality gate you can run manually or in CI.
## Prerequisites
| Tool | Purpose |
|---|---|
| FineType | Schema generation and validation |
| A known-good CSV file | The "golden" batch that defines the expected shape |
| New CSV deliveries | Incoming data to validate against the schema |
## The problem
You receive data from a partner or upstream system. The first batch looks fine — you build a pipeline around it. Then batch 17 arrives with a new column, dates in a different format, and nulls where there shouldn't be any. Your pipeline breaks at 2 AM.
FineType's `validate` workflow catches these issues at the gate, before the data enters your pipeline.
## Steps
### 1. Profile and save a schema from the known-good batch
Start with a batch you trust. Run `schema` to generate a JSON Schema that captures the expected structure:
```bash
finetype schema good-batch.csv > delivery-schema.json
```

The schema captures:

- Column names and their expected order
- Semantic type for each column (e.g., `datetime.date.iso_8601`, `identity.contact.email`)
- Nullability — which columns had null values in the good batch
- Value constraints derived from the detected type
Commit `delivery-schema.json` to your repository. This file is your contract.
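The exact format of the generated schema file is not yet finalized, so treat the structure below as purely illustrative. As a sketch of the idea — a saved contract that a new file's header can be checked against — something like this works (the `CONTRACT` shape and `check_header` helper are hypothetical, not FineType APIs):

```python
import csv
import io

# Hypothetical contract shape -- the real delivery-schema.json format
# is not finalized; this only illustrates the "saved contract" idea.
CONTRACT = {
    "columns": ["order_date", "amount", "customer_email"],
    "types": {
        "order_date": "datetime.date.iso_8601",
        "amount": "representation.numeric.decimal",
        "customer_email": "identity.contact.email",
    },
}

def check_header(csv_text: str, contract: dict) -> list[str]:
    """Return a list of header-level violations (column names and order)."""
    header = next(csv.reader(io.StringIO(csv_text)))
    problems = []
    if header != contract["columns"]:
        missing = set(contract["columns"]) - set(header)
        extra = set(header) - set(contract["columns"])
        if missing:
            problems.append(f"missing columns: {sorted(missing)}")
        if extra:
            problems.append(f"unexpected columns: {sorted(extra)}")
        if not missing and not extra:
            problems.append("columns out of order")
    return problems

good = "order_date,amount,customer_email\n2024-01-15,9.99,a@b.com\n"
bad = "order_date,amount,customer_email,coupon_code\n2024-01-15,9.99,a@b.com,X1\n"
print(check_header(good, CONTRACT))  # []
print(check_header(bad, CONTRACT))   # ["unexpected columns: ['coupon_code']"]
```

The value of committing the contract is exactly this: any structural drift in a delivery becomes a diff against a versioned file rather than a surprise at load time.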
### 2. Validate a new delivery
When a new batch arrives, run `validate` to check it against the saved schema:
```bash
finetype validate new-batch.csv delivery-schema.json
```

```text
Validating new-batch.csv against delivery-schema.json...

Rows:     10,000
Valid:    9,847 (98.5%)
Invalid:  153 (1.5%)

Output:
  new-batch.csv.valid.csv     9,847 rows
  new-batch.csv.invalid.csv   153 rows
  new-batch.csv.errors.jsonl  153 errors

Summary: FAIL (153 invalid rows)
```

FineType produces three sidecar files alongside the input:
| File | Contents |
|---|---|
| `new-batch.csv.valid.csv` | Rows that passed all schema checks |
| `new-batch.csv.invalid.csv` | Rows that failed one or more checks |
| `new-batch.csv.errors.jsonl` | One JSON object per invalid row with error details |
### 3. Inspect the errors
Each line in the errors file describes what went wrong:
```bash
head -3 new-batch.csv.errors.jsonl
```

```text
{"row": 42, "column": "order_date", "value": "15/01/2024", "expected_type": "datetime.date.iso_8601", "error": "value does not match pattern YYYY-MM-DD"}
{"row": 108, "column": "amount", "value": "N/A", "expected_type": "representation.numeric.decimal", "error": "value is not numeric"}
{"row": 153, "column": "customer_email", "value": "", "expected_type": "identity.contact.email", "error": "null value in non-nullable column"}
```

The errors tell you exactly which row, which column, what the value was, and why it failed. No guesswork.
### 4. Use summary-only mode for CI
In a CI pipeline, you don't need sidecar files — you just need a pass/fail signal. Use `--summary-only`:
```bash
finetype validate new-batch.csv delivery-schema.json --summary-only
```

```text
Rows: 10,000 | Valid: 9,847 (98.5%) | Invalid: 153 (1.5%) | FAIL
```

Exit codes:

- `0` — all rows valid, delivery passes
- `1` — one or more rows invalid, delivery fails
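Some CI setups also want the counts as data, for example to post them to a dashboard or chat message. Assuming the one-line summary format shown above stays stable (an assumption, since the command is not yet shipped), a small parser like this could extract them:

```python
import re

def parse_summary(line: str) -> dict:
    """Parse a summary line like:
    'Rows: 10,000 | Valid: 9,847 (98.5%) | Invalid: 153 (1.5%) | FAIL'
    Format is assumed from the docs; adjust if the shipped output differs."""
    m = re.match(
        r"Rows: ([\d,]+) \| Valid: ([\d,]+) \([\d.]+%\) "
        r"\| Invalid: ([\d,]+) \([\d.]+%\) \| (PASS|FAIL)",
        line,
    )
    if m is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    rows, valid, invalid, verdict = m.groups()
    return {
        "rows": int(rows.replace(",", "")),
        "valid": int(valid.replace(",", "")),
        "invalid": int(invalid.replace(",", "")),
        "passed": verdict == "PASS",
    }

summary = parse_summary(
    "Rows: 10,000 | Valid: 9,847 (98.5%) | Invalid: 153 (1.5%) | FAIL"
)
print(summary)  # {'rows': 10000, 'valid': 9847, 'invalid': 153, 'passed': False}
```

For plain gating, though, the exit code alone is enough; parsing is only worth it when you need the numbers downstream.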
Use this in a CI step to block pipeline runs on bad data:
```bash
finetype validate new-batch.csv delivery-schema.json --summary-only \
  && echo "Quality gate passed — loading data" \
  || echo "Quality gate failed — check errors"
```

### 5. Fix and re-validate
When validation fails, the workflow is:
- Inspect `new-batch.csv.errors.jsonl` to understand the issues
- Fix the source data (or update the schema if the change is intentional)
- Re-run validation:
```bash
finetype validate fixed-batch.csv delivery-schema.json
```

```text
Validating fixed-batch.csv against delivery-schema.json...

Rows:     10,000
Valid:    10,000 (100.0%)
Invalid:  0 (0.0%)

Summary: PASS
```

Once the delivery passes, proceed to loading with `finetype load`.
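The `order_date` failures from step 3 are a typical fixable case: dates arriving as DD/MM/YYYY instead of ISO 8601. A sketch of a fix pass over the source data before re-validation (column names follow the examples above; this is not a FineType feature, just ordinary preprocessing):

```python
import csv
import io
from datetime import datetime

def fix_order_dates(csv_text: str) -> str:
    """Rewrite DD/MM/YYYY order_date values as ISO 8601 (YYYY-MM-DD)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        try:
            parsed = datetime.strptime(row["order_date"], "%d/%m/%Y")
            row["order_date"] = parsed.strftime("%Y-%m-%d")
        except ValueError:
            pass  # not DD/MM/YYYY; leave it for validation to judge
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=rows[0].keys(), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

raw = "order_date,amount\n15/01/2024,9.99\n2024-02-01,4.50\n"
print(fix_order_dates(raw))
# order_date,amount
# 2024-01-15,9.99
# 2024-02-01,4.50
```

Prefer fixing the data at its source over silently rewriting it in the pipeline; a transform like this is a stopgap while the upstream format gets corrected.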
### 6. Update the schema when requirements change
When a legitimate schema change occurs (new column, different format from an upgraded source system), update the contract:
```bash
finetype schema updated-batch.csv > delivery-schema.json
```

Commit the updated schema. All future validations will use the new contract.
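Before committing, review what actually changed so an accidental upstream change doesn't get ratified as the new contract. A small sketch of a column-level comparison (the column lists here stand in for whatever the old and new schema files contain; the real schema format is not final):

```python
def diff_columns(old_columns: list[str], new_columns: list[str]) -> dict:
    """Summarize column-level differences between two schema contracts."""
    old, new = set(old_columns), set(new_columns)
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
        "reordered": old == new and old_columns != new_columns,
    }

# Column lists as they might be read from the old and updated schema files.
old_schema_columns = ["order_date", "amount", "customer_email"]
new_schema_columns = ["order_date", "amount", "customer_email", "coupon_code"]

print(diff_columns(old_schema_columns, new_schema_columns))
# {'added': ['coupon_code'], 'removed': [], 'reordered': False}
```

An added column is usually benign; a removed or reordered one deserves a conversation with the data provider before the new schema lands.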
## What you learned
- `finetype schema` captures the expected structure of a CSV as a JSON Schema file
- `finetype validate` checks incoming data against a saved schema and produces valid/invalid/error sidecar files
- `--summary-only` mode returns an exit code for CI integration — `0` for pass, `1` for fail
- The schema file is your contract: commit it, version it, and update it intentionally
## See also
- `schema` command reference — schema generation options
- `validate` command reference — flags, output formats, and exit codes
- Build a Typed DuckDB Pipeline — load validated data into DuckDB with proper types