profile
Profile a CSV file — detect the semantic type of every column using column-mode inference.
profile is the fastest way to understand what your data contains — run it before writing any queries.
Usage
finetype profile [OPTIONS] --file <FILE>

Options
| Flag | Type | Default | Description |
|---|---|---|---|
| `-f, --file` | path | — | Input CSV file (required) |
| `-m, --model` | path | `models/default` | Model directory |
| `-o, --output` | string | `plain` | Output format: `plain`, `json`, `csv`, `markdown`, `arrow` |
| `--sample-size` | integer | 100 | Maximum values to sample per column |
| `--delimiter` | character | auto-detect | CSV delimiter character |
| `--no-header-hint` | flag | — | Disable column name header hints |
| `--model-type` | string | `char-cnn` | Model type: `transformer`, `char-cnn`, `tiered` |
| `--sharp-only` | flag | — | Disable Sense classifier (Sharpen-only pipeline) |
| `--validate` | flag | — | Run JSON Schema validation after classification |
| `--enum-threshold` | integer | 50 | Cardinality threshold for ENUM columns |
| `--verbose` | flag | — | Show additional detail (JSON output) |
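The `--enum-threshold` flag suggests a cardinality rule: a column whose distinct-value count is at or below the threshold can be treated as an enumeration. A minimal sketch of that check (the function name and exact rule are assumptions for illustration, not finetype's actual implementation):

```python
def is_enum_candidate(values, threshold=50):
    """Treat a column as an ENUM candidate when its distinct-value
    count (cardinality) is at or below the threshold."""
    distinct = {v for v in values if v != ""}  # ignore empty cells
    return 0 < len(distinct) <= threshold

# A status column repeating 4 distinct values qualifies; a column
# of 100 unique values does not.
statuses = ["active", "inactive", "active", "pending"] * 25
print(is_enum_candidate(statuses, threshold=50))   # → True
print(is_enum_candidate([str(i) for i in range(100)], threshold=50))  # → False
```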
Examples
Profile a CSV file
$ finetype profile -f contacts.csv
FineType Column Profile — "contacts.csv" (12 rows, 6 columns)
════════════════════════════════════════════════════════════════════════════════
COLUMN TYPE BROAD CONF
──────────────────────────────────────────────────────────────────────────────
id representation.identifier.increment BIGINT 50.0%
name identity.person.full_name VARCHAR 60.0%
email identity.person.email VARCHAR 93.3%
created_at datetime.timestamp.iso_8601 TIMESTAMP 93.3%
ip_address technology.internet.ip_v4 VARCHAR 100.0%
amount finance.currency.amount DECIMAL 60.0%
6/6 columns typed, 12 rows analyzed

Profile with JSON output
$ finetype profile -f contacts.csv -o json

JSON output includes the full type key, broad type, confidence, and sample values for each column. Pipe to jq for further processing:
$ finetype profile -f contacts.csv -o json | jq '.[].type_key'

Profile with validation
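The same filtering can be done in any language with a JSON parser. This sketch assumes the shape implied by the jq filter, an array of per-column objects with at least a `type_key` field; the `column` and `confidence` fields in the sample record are illustrative assumptions, not guaranteed output:

```python
import json

# Illustrative stand-in for the document produced by
# `finetype profile -f contacts.csv -o json`; only `type_key`
# is confirmed by the jq example above.
raw = """[
  {"column": "email", "type_key": "identity.person.email", "confidence": 0.933},
  {"column": "id", "type_key": "representation.identifier.increment", "confidence": 0.5}
]"""

columns = json.loads(raw)
# Equivalent of jq '.[].type_key': pull out every type key.
type_keys = [c["type_key"] for c in columns]
print(type_keys)
```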
$ finetype profile -f contacts.csv --validate

Adding `--validate` runs each detected column's values against the corresponding JSON Schema and reports the pass rate. This is a quick way to check data quality without a separate `validate` step.
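Conceptually, the reported pass rate is the fraction of sampled values accepted by the detected type's schema. A toy sketch, with a hand-written email check standing in for the real JSON Schema (the regex and function names here are assumptions, not what finetype actually applies):

```python
import re

def pass_rate(values, is_valid):
    """Fraction of values a validator accepts — the quantity
    --validate conceptually reports per column."""
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)

# Simplified stand-in for an email-format schema check.
email_ok = lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None

rate = pass_rate(["a@b.com", "c@d.org", "not-an-email"], email_ok)
print(f"{rate:.1%}")  # 2 of 3 values pass → 66.7%
```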
How it works
- Sample — reads up to `--sample-size` values from each column (default: 100).
- Classify — runs column-mode `infer` on each sample, using column names as header hints (unless `--no-header-hint` is set).
- Report — outputs the detected type, broad DuckDB type, and confidence for every column.
The broad type column (BIGINT, VARCHAR, TIMESTAMP, DECIMAL) tells you what DuckDB type each column can safely cast to. Use `load` to generate the full `CREATE TABLE AS` statement.
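To make the casting idea concrete, here is a toy renderer that turns profiled columns into a `CREATE TABLE AS` statement. The table name, column list, and SQL shape (including `read_csv_auto`, a real DuckDB function) are illustrative guesses at the kind of statement `load` produces, not its actual output:

```python
def create_table_sql(table, columns):
    """Render a CREATE TABLE AS that casts each column to its
    broad DuckDB type. `columns` is a list of (name, broad_type)."""
    casts = ",\n  ".join(
        f"CAST({name} AS {broad}) AS {name}" for name, broad in columns
    )
    return (
        f"CREATE TABLE {table} AS\nSELECT\n  {casts}\n"
        f"FROM read_csv_auto('{table}.csv');"
    )

print(create_table_sql("contacts", [("id", "BIGINT"), ("email", "VARCHAR")]))
```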
See also
- `load` — generate a DuckDB `CREATE TABLE AS` from the profile results
- `infer` — classify individual values
- `validate` — validate data quality against schemas
- Getting Started — full walkthrough from install to first profile