Meridian

Validate a Data Delivery

Use FineType's schema and validate commands to build a repeatable quality gate for incoming data.

Planned feature

The validate command and file-level schema generation described here are planned but not yet shipped. This recipe documents the target workflow so you can plan your pipeline. Check the CLI Reference for current status.

Goal: Save a schema from a known-good data batch, then validate every subsequent delivery against it — a repeatable quality gate you can run manually or in CI.

Prerequisites

| Tool | Purpose |
|---|---|
| FineType | Schema generation and validation |
| A known-good CSV file | The "golden" batch that defines the expected shape |
| New CSV deliveries | Incoming data to validate against the schema |

The problem

You receive data from a partner or upstream system. The first batch looks fine — you build a pipeline around it. Then batch 17 arrives with a new column, dates in a different format, and nulls where there shouldn't be any. Your pipeline breaks at 2 AM.

FineType's validate workflow catches these issues at the gate, before the data enters your pipeline.

Steps

1. Profile and save a schema from the known-good batch

Start with a batch you trust. Run schema to generate a JSON Schema that captures the expected structure:

finetype schema good-batch.csv > delivery-schema.json

The schema captures:

  • Column names and their expected order
  • Semantic type for each column (e.g., datetime.date.iso_8601, identity.contact.email)
  • Nullability — which columns had null values in the good batch
  • Value constraints derived from the detected type

Commit delivery-schema.json to your repository. This is your contract.
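Before committing, it is worth checking that the generated file is a usable contract: non-empty and parseable as JSON. A plain > redirect also truncates the target before finetype starts, so a failed run can leave an empty file behind. The sketch below writes to a temporary file first and only moves it into place when the output looks valid. It assumes jq is installed and that finetype schema exits non-zero on failure (not confirmed for the planned command); save_contract is a hypothetical helper name.

```bash
# Hypothetical helper: generate a schema from a trusted batch and install it
# as the contract only if the output is non-empty, valid JSON.
# Assumes `finetype schema` exits non-zero on failure and jq is installed.
save_contract() {
  local golden="$1" contract="$2" tmp
  tmp="$(mktemp "${contract}.XXXXXX")" || return 1

  # Write to a temp file so a failed run never truncates an existing contract.
  if ! finetype schema "$golden" > "$tmp"; then
    echo "schema generation failed for $golden" >&2
    rm -f "$tmp"
    return 1
  fi

  # Reject empty output (jq accepts an empty file) and anything jq cannot parse.
  if [ ! -s "$tmp" ] || ! jq empty "$tmp" 2>/dev/null; then
    echo "generated schema is empty or not valid JSON" >&2
    rm -f "$tmp"
    return 1
  fi

  chmod 644 "$tmp"        # mktemp creates files as 0600
  mv "$tmp" "$contract"   # same directory, so this is a rename
}
```

Usage: save_contract good-batch.csv delivery-schema.json && git add delivery-schema.json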

2. Validate a new delivery

When a new batch arrives, run validate to check it against the saved schema:

finetype validate new-batch.csv delivery-schema.json
Validating new-batch.csv against delivery-schema.json...

Rows:      10,000
Valid:      9,847 (98.5%)
Invalid:      153 (1.5%)

Output:
  new-batch.csv.valid.csv     9,847 rows
  new-batch.csv.invalid.csv     153 rows
  new-batch.csv.errors.jsonl    153 errors

Summary: FAIL (153 invalid rows)

FineType produces three sidecar files alongside the input:

| File | Contents |
|---|---|
| new-batch.csv.valid.csv | Rows that passed all schema checks |
| new-batch.csv.invalid.csv | Rows that failed one or more checks |
| new-batch.csv.errors.jsonl | One JSON object per invalid row with error details |
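A cheap sanity check after each run is to confirm that the sidecars account for every input row. In the example above, 9,847 valid plus 153 invalid equals the 10,000 input rows, and the errors file holds one line per invalid row. The sketch below assumes every CSV (input and sidecars) has exactly one header line, no quoted fields containing newlines, and a trailing newline, so physical lines equal records. None of this is confirmed for the planned output format. check_sidecars and data_rows are hypothetical names.

```bash
# Hypothetical reconciliation check for validate's sidecar files.
# Assumes one header line per CSV, no embedded newlines in fields,
# and a trailing newline at the end of every file.
data_rows() {
  # Count lines, then drop the header; an empty file counts as 0.
  local n
  n=$(wc -l < "$1")
  [ "$n" -gt 0 ] && n=$((n - 1))
  echo "$n"
}

check_sidecars() {
  local input="$1" total valid invalid errors
  total=$(data_rows "$input")
  valid=$(data_rows "$input.valid.csv")
  invalid=$(data_rows "$input.invalid.csv")
  errors=$(wc -l < "$input.errors.jsonl")   # JSONL has no header line

  if [ $((valid + invalid)) -ne "$total" ]; then
    echo "row mismatch: $valid valid + $invalid invalid != $total input rows" >&2
    return 1
  fi
  if [ "$errors" -ne "$invalid" ]; then
    echo "expected one error line per invalid row, got $errors for $invalid" >&2
    return 1
  fi
  echo "sidecars reconcile: $total rows = $valid valid + $invalid invalid"
}
```

Usage: check_sidecars new-batch.csv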

3. Inspect the errors

Each line in the errors file describes what went wrong:

head -3 new-batch.csv.errors.jsonl
{"row": 42, "column": "order_date", "value": "15/01/2024", "expected_type": "datetime.date.iso_8601", "error": "value does not match pattern YYYY-MM-DD"}
{"row": 108, "column": "amount", "value": "N/A", "expected_type": "representation.numeric.decimal", "error": "value is not numeric"}
{"row": 153, "column": "customer_email", "value": "", "expected_type": "identity.contact.email", "error": "null value in non-nullable column"}

The errors tell you exactly which row, which column, what the value was, and why it failed. No guesswork.
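On a large delivery, reading errors one line at a time does not scale. Grouping them by column and message shows whether you are looking at one systemic problem (a whole column in the wrong date format) or scattered bad values. This sketch assumes jq is installed and uses only the column and error fields shown in the sample above; error_summary is a hypothetical name.

```bash
# Hypothetical helper: count errors per (column, error message) pair,
# most frequent first. Uses only the "column" and "error" fields.
error_summary() {
  jq -r '[.column, .error] | @tsv' "$1" \
    | sort \
    | uniq -c \
    | sort -rn
}
```

Usage: error_summary new-batch.csv.errors.jsonl | head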

4. Use summary-only mode for CI

In a CI pipeline, you don't need sidecar files — you just need a pass/fail signal. Use --summary-only:

finetype validate new-batch.csv delivery-schema.json --summary-only
Rows: 10,000 | Valid: 9,847 (98.5%) | Invalid: 153 (1.5%) | FAIL

Exit codes:

  • 0 — all rows valid, delivery passes
  • 1 — one or more rows invalid, delivery fails

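Use this in a CI step to block pipeline runs on bad data. Make sure the failure branch exits non-zero: chaining the command as && echo ... || echo ... is not enough, because the final echo succeeds and the step reports success even when validation fails.

if finetype validate new-batch.csv delivery-schema.json --summary-only; then
  echo "Quality gate passed: loading data"
else
  echo "Quality gate failed: check errors" >&2
  exit 1
fi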
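Deliveries often arrive as several files at once. The gate sketched below runs validate --summary-only on every CSV in a directory, keeps going after a failure so a single run reports every bad file, and returns 1 if any file failed. It treats any non-zero exit as a failure, since only 0 and 1 are documented. It also skips *.valid.csv and *.invalid.csv files, in case an earlier run without --summary-only left sidecars in the same directory. The incoming/ directory and gate_all function are hypothetical.

```bash
# Hypothetical CI gate: validate every CSV in a directory against one contract.
# Any non-zero exit from finetype (not just 1) counts as a failed delivery.
gate_all() {
  local dir="$1" schema="$2" failed=0 seen=0 f
  for f in "$dir"/*.csv; do
    [ -e "$f" ] || continue                        # glob matched nothing
    case "$f" in *.valid.csv|*.invalid.csv) continue ;; esac  # old sidecars
    seen=1
    if finetype validate "$f" "$schema" --summary-only; then
      echo "PASS $f"
    else
      echo "FAIL $f" >&2
      failed=1
    fi
  done
  if [ "$seen" -eq 0 ]; then
    echo "no CSV files found in $dir" >&2
    return 1
  fi
  return "$failed"
}
```

Usage in a CI step: gate_all incoming delivery-schema.json || exit 1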

5. Fix and re-validate

When validation fails, the workflow is:

  1. Inspect new-batch.csv.errors.jsonl to understand the issues
  2. Fix the source data (or update the schema if the change is intentional)
  3. Re-run validation:
finetype validate fixed-batch.csv delivery-schema.json
Validating fixed-batch.csv against delivery-schema.json...

Rows:      10,000
Valid:     10,000 (100.0%)
Invalid:        0 (0.0%)

Summary: PASS

Once the delivery passes, proceed to loading with finetype load.
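What "fix the source data" means depends on the error. For the date error in row 42 of the sample errors (15/01/2024 where the contract expects YYYY-MM-DD), a one-off repair could look like the awk sketch below. It assumes the values are day/month/year (15 cannot be a month, but a value like 03/04/2024 is ambiguous from the data alone), that the file is a simple CSV with a header row and no quoted commas, and that you pass the 1-based position of the date column. fix_dates is a hypothetical name.

```bash
# Hypothetical repair: rewrite DD/MM/YYYY values in one column as YYYY-MM-DD.
# Assumes a plain CSV (no quoted commas) whose first line is a header.
fix_dates() {
  local file="$1" col="$2"
  awk -v col="$col" 'BEGIN { FS = OFS = "," }
    NR > 1 && $col ~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9]$/ {
      split($col, p, "/")            # p[1]=day, p[2]=month, p[3]=year
      $col = p[3] "-" p[2] "-" p[1]
    }
    { print }' "$file"
}
```

Usage: fix_dates new-batch.csv 2 > fixed-batch.csv, then re-run validate on fixed-batch.csv as shown above.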

6. Update the schema when requirements change

When a legitimate schema change occurs (new column, different format from an upgraded source system), update the contract:

finetype schema updated-batch.csv > delivery-schema.json

Commit the updated schema. All future validations will use the new contract.
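Because this overwrites the contract, it helps to review what actually changed before committing. One approach is to generate into a candidate file and diff it against the current contract after normalizing both with jq -S, so reordered object keys do not show up as changes. Array order is left alone, which is intentional here since column order is part of the contract. This assumes jq and bash (for process substitution); schema_diff and the .candidate.json filename are hypothetical.

```bash
# Hypothetical review step: key-order-insensitive diff between the current
# contract and a freshly generated candidate. Exit status follows diff:
# 0 = identical, 1 = changed. Assumes jq is installed.
schema_diff() {
  local current="$1" candidate="$2"
  # jq -S sorts object keys, so pure reordering is not reported as a change.
  diff -u <(jq -S . "$current") <(jq -S . "$candidate")
}
```

Usage:

```bash
finetype schema updated-batch.csv > delivery-schema.candidate.json
schema_diff delivery-schema.json delivery-schema.candidate.json
# After reviewing the diff:
mv delivery-schema.candidate.json delivery-schema.json
```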

What you learned

  • finetype schema captures the expected structure of a CSV as a JSON Schema file
  • finetype validate checks incoming data against a saved schema and produces valid/invalid/error sidecar files
  • --summary-only mode returns an exit code for CI integration — 0 for pass, 1 for fail
  • The schema file is your contract: commit it, version it, and update it intentionally
