Meridian

profile

Profile a CSV file — detect the semantic type of every column using column-mode inference.

profile is the fastest way to understand what your data contains — run it before writing any queries.

Usage

finetype profile [OPTIONS] --file <FILE>

Options

  Flag                Type       Default         Description
  ──────────────────────────────────────────────────────────────────────────────
  -f, --file          path                       Input CSV file (required)
  -m, --model         path       models/default  Model directory
  -o, --output        string     plain           Output format: plain, json, csv, markdown, arrow
  --sample-size       integer    100             Maximum values to sample per column
  --delimiter         character  auto-detect     CSV delimiter character
  --no-header-hint    flag                       Disable column name header hints
  --model-type        string     char-cnn       Model type: transformer, char-cnn, tiered
  --sharp-only        flag                       Disable Sense classifier (Sharpen-only pipeline)
  --validate          flag                       Run JSON Schema validation after classification
  --enum-threshold    integer    50              Cardinality threshold for ENUM columns
  --verbose           flag                       Show additional detail (JSON output)
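Two of these options interact directly: --delimiter defaults to auto-detection, and --sample-size caps how many values are read per column. The sketch below illustrates that behavior using Python's csv.Sniffer as a stand-in; it is not finetype's actual implementation.

```python
import csv
import io

# Stand-in for --delimiter auto-detect: guess the separator from the header
# line. finetype's real detection logic is not shown in this doc.
raw = "id;name;email\n1;Ada;ada@example.com\n2;Lin;lin@example.com\n"
dialect = csv.Sniffer().sniff(raw.splitlines()[0], delimiters=",;\t|")
print(dialect.delimiter)  # ;

# Stand-in for --sample-size: collect at most N values per column.
SAMPLE_SIZE = 100
columns = {}
for row in csv.DictReader(io.StringIO(raw), dialect=dialect):
    for name, value in row.items():
        columns.setdefault(name, [])
        if len(columns[name]) < SAMPLE_SIZE:
            columns[name].append(value)

print(sorted(columns))  # ['email', 'id', 'name']
```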

Examples

Profile a CSV file

$ finetype profile -f contacts.csv
FineType Column Profile "contacts.csv" (12 rows, 6 columns)
════════════════════════════════════════════════════════════════════════════════

  COLUMN                    TYPE                                      BROAD   CONF
  ──────────────────────────────────────────────────────────────────────────────
  id                        representation.identifier.increment      BIGINT  50.0%
  name                      identity.person.full_name               VARCHAR  60.0%
  email                     identity.person.email                   VARCHAR  93.3%
  created_at                datetime.timestamp.iso_8601            TIMESTAMP  93.3%
  ip_address                technology.internet.ip_v4               VARCHAR 100.0%
  amount                    finance.currency.amount                 DECIMAL  60.0%

6/6 columns typed, 12 rows analyzed

Profile with JSON output

$ finetype profile -f contacts.csv -o json

JSON output includes the full type key, broad type, confidence, and sample values for each column. Pipe to jq for further processing:

$ finetype profile -f contacts.csv -o json | jq '.[].type_key'
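The JSON output can also be consumed directly from a script. The field names below (column, type_key, confidence) are assumptions based on the jq example and the description above; check them against your actual output.

```python
import json

# Assumed shape: a list of per-column objects. "type_key" appears in the jq
# example; "confidence" is described as part of the JSON output.
profile = json.loads("""
[
  {"column": "id",    "type_key": "representation.identifier.increment", "confidence": 0.50},
  {"column": "email", "type_key": "identity.person.email",               "confidence": 0.933}
]
""")

# Flag columns the model is unsure about (below 80% confidence).
low_conf = [c["column"] for c in profile if c["confidence"] < 0.8]
print(low_conf)  # ['id']
```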

Profile with validation

$ finetype profile -f contacts.csv --validate

Adding --validate runs each detected column's values against the corresponding JSON Schema and reports the pass rate. This is a quick way to check data quality without a separate validate step.
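Conceptually, the pass rate is the fraction of sampled values that satisfy the column's schema. The real check runs the full JSON Schema; this sketch substitutes a simple regex for a single email column just to show the arithmetic.

```python
import re

# Illustration only: finetype validates against full JSON Schemas, not this
# regex, which is a hypothetical stand-in for one column's format check.
email_pattern = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
values = ["ada@example.com", "lin@example.com", "not-an-email"]

passed = sum(1 for v in values if email_pattern.match(v))
rate = passed / len(values)
print(f"{passed}/{len(values)} passed ({rate:.1%})")  # 2/3 passed (66.7%)
```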

How it works

  1. Sample — reads up to --sample-size values from each column (default: 100).
  2. Classify — runs column-mode infer on each sample, using column names as header hints (unless --no-header-hint is set).
  3. Report — outputs the detected type, broad DuckDB type, and confidence for every column.
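The three steps can be sketched end to end. classify_column below is a hypothetical stand-in for finetype's model, not its real API; it only shows where the header hint and the sample cap fit in the pipeline.

```python
import csv
import io

def classify_column(name, values):
    # Hypothetical stand-in for column-mode classification. Using the column
    # name here mirrors the header hint that --no-header-hint disables.
    if "email" in name or all("@" in v for v in values):
        return ("identity.person.email", "VARCHAR")
    if all(v.isdigit() for v in values):
        return ("representation.identifier.increment", "BIGINT")
    return ("unknown", "VARCHAR")

SAMPLE_SIZE = 100  # mirrors --sample-size
raw = "id,email\n1,ada@example.com\n2,lin@example.com\n"

# 1. Sample: up to SAMPLE_SIZE values per column.
columns = {}
for row in csv.DictReader(io.StringIO(raw)):
    for name, value in row.items():
        columns.setdefault(name, [])
        if len(columns[name]) < SAMPLE_SIZE:
            columns[name].append(value)

# 2. Classify each sample, then 3. report type and broad type per column.
for name, values in columns.items():
    type_key, broad = classify_column(name, values)
    print(f"{name:10} {type_key:40} {broad}")
```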

The broad type column (BIGINT, VARCHAR, TIMESTAMP, DECIMAL) tells you what DuckDB type each column can safely cast to. Use load to generate the full CREATE TABLE AS statement.
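As a rough picture of what load automates, a CREATE TABLE statement can be assembled from the broad types shown in the example profile above. The statement load actually emits may differ in casting details.

```python
# Broad types taken from the example profile output above; column subset
# chosen for brevity.
broad_types = {
    "id": "BIGINT",
    "email": "VARCHAR",
    "created_at": "TIMESTAMP",
    "amount": "DECIMAL",
}

cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in broad_types.items())
ddl = f"CREATE TABLE contacts (\n  {cols}\n);"
print(ddl)
```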

See also

  • load — generate a DuckDB CREATE TABLE AS from the profile results
  • infer — classify individual values
  • validate — validate data quality against schemas
  • Getting Started — full walkthrough from install to first profile
