profile
Scan a CSV file and detect the semantic type of every column using column-mode inference. profile is the fastest way to understand what your data contains, so run it before writing any queries.
Usage
finetype profile [OPTIONS] --file <FILE>
Options
| Flag | Type | Default | Description |
|---|---|---|---|
| `-f, --file` | path | — | Input CSV file (single-file mode). Mutually exclusive with `--files`. |
| `--files` | path | — | File listing input paths, one per line (batch mode). Requires `--out-dir`. |
| `--out-dir` | path | — | Output directory for batch mode. One output per input is written as `<out_dir>/<stem>.<ext>`. |
| `-o, --output` | string | plain | Output format: plain, json, csv, markdown, arrow, json-schema |
| `--sample-size` | integer | 100 | Maximum values to sample per column |
| `--delimiter` | character | auto-detect | CSV delimiter character |
| `--no-header-hint` | flag | — | Disable column name header hints |
| `--enum-threshold` | integer | 32 | Cardinality threshold for ENUM columns (0 disables ENUM, shows VARCHAR) |
| `--stats` | flag | — | Attach observed-data constraints to JSON Schema output (minLength/maxLength, minimum/maximum, enum, x-finetype-null-rate, x-finetype-cardinality). Requires `-o json-schema`. |
| `-v, --verbose` | flag | — | Show additional detail and enable pipeline tracing |
Examples
Profile a CSV file
$ finetype profile -f contacts.csv
FineType Column Profile — "contacts.csv" (12 rows, 6 columns)
════════════════════════════════════════════════════════════════════════════════
COLUMN TYPE BROAD CONF
──────────────────────────────────────────────────────────────────────────────
id representation.identifier.increment BIGINT 97.6% [numeric_sequential_detection]
name identity.person.full_name VARCHAR 98.2%
email identity.person.email VARCHAR 100.0%
created_at datetime.timestamp.iso_8601 TIMESTAMP 99.1%
ip_address technology.internet.ip_v4 VARCHAR 100.0% [ipv4_detection]
amount finance.currency.amount DECIMAL 99.9% [header_hint_cross_domain:amount]
6/6 columns typed, 12 rows analyzed
The bracketed tokens are sense hints: the detection strategy that settled each column. numeric_sequential_detection recognised the running id, ipv4_detection matched the address pattern, and header_hint_cross_domain:amount used the column header to land on a currency amount.
Profile with JSON output
$ finetype profile -f contacts.csv -o json
JSON output is an object with a columns array. Each entry carries the semantic type, the broad_type (DuckDB storage type), the confidence, null counts, and the transform expression used to cast the column:
{
  "columns": [
    {
      "broad_type": "BIGINT",
      "column": "id",
      "confidence": 0.9756258726119995,
      "disambiguation_applied": true,
      "disambiguation_rule": "numeric_sequential_detection",
      "is_generic": true,
      "non_null": 12,
      "null": 0,
      "samples_used": 12,
      "transform": "CAST({col} AS BIGINT)",
      "type": "representation.identifier.increment"
    }
  ]
}
Pipe to jq to pull out just the types:
$ finetype profile -f contacts.csv -o json | jq '.columns[].type'
Export a JSON Schema for the whole file
$ finetype profile -f contacts.csv -o json-schema > schema.json
This emits a machine-readable JSON Schema describing every column, which is the contract you pass to validate. Add --stats to attach observed-data constraints (length/range bounds, enum values, null rate, cardinality):
$ finetype profile -f contacts.csv -o json-schema --stats > schema.json
How it works
- Sample: reads up to `--sample-size` values from each column (default: 100).
- Classify: runs column-mode `infer` on each sample, using column names as header hints (unless `--no-header-hint` is set).
- Report: outputs the detected type, broad DuckDB type, and confidence for every column.
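To make the Sample step concrete, here is a minimal Python sketch of per-column sampling. It is an illustration only, not FineType's implementation: the function name `sample_columns` is hypothetical, and it assumes sampling takes the first non-empty values in file order, which this page does not specify.

```python
import csv
import io


def sample_columns(csv_text, sample_size=100, delimiter=","):
    """Collect up to `sample_size` non-empty values per column.

    Assumption: values are taken in file order and empty cells are skipped.
    The real sampling strategy used by `finetype profile` may differ.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    # Start every header with an empty list so all-empty columns still appear.
    samples = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in samples:
            value = row.get(name)
            if value and len(samples[name]) < sample_size:
                samples[name].append(value)
    return samples
```

Each per-column list is what a classifier would then receive as its input, alongside the column name when header hints are enabled.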
The broad type column (BIGINT, VARCHAR, TIMESTAMP, DECIMAL) tells you which DuckDB type each column can safely be cast to. Export the schema with `-o json-schema`, then run `validate` on the data against it; passing `--db`/`--table` to `validate` materialises a typed DuckDB table in the same pass.
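As a rough illustration of how the broad type and `transform` fields can be used, the Python sketch below turns `-o json` output into a `SELECT` statement with one cast per column. The function name `build_cast_select` and the table name are hypothetical, and it assumes `{col}` in `transform` is a placeholder for the column identifier, as the JSON example on this page suggests.

```python
import json


def build_cast_select(profile_json, table):
    """Build a SELECT with typed casts from `finetype profile -o json` output.

    Assumption: each entry in "columns" has a "column" name and a "transform"
    template in which "{col}" stands for the column identifier.
    """
    profile = json.loads(profile_json)
    exprs = []
    for col in profile["columns"]:
        # Quote the identifier, doubling any embedded double quotes.
        quoted = '"' + col["column"].replace('"', '""') + '"'
        # Fall back to the bare column if no transform is reported.
        transform = col.get("transform") or "{col}"
        cast = transform.replace("{col}", quoted)
        exprs.append(cast + " AS " + quoted)
    return "SELECT " + ", ".join(exprs) + " FROM " + table
```

Feeding it the `id` entry from the JSON example yields `SELECT CAST("id" AS BIGINT) AS "id" FROM contacts` for a table named `contacts`. For real workloads, `validate` with `--db`/`--table` already covers this case; the sketch is only meant to show what the fields contain.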
See also
- validate: validate against the exported schema and optionally materialise a typed table
- infer: classify individual values
- taxonomy: browse the semantic type taxonomy
- Quick Start: full walkthrough from install to first profile