Build a Typed DuckDB Pipeline
Go from raw CSV to a fully typed DuckDB database — profile, generate SQL, load, and query with proper types.
Goal: Turn a raw CSV into a fully typed DuckDB database in three commands — profile, generate the load SQL, and run it.
Prerequisites
Sample data
Save this as orders.csv:
order_id,customer_email,order_date,ship_date,amount,currency,country
ORD-001,[email protected],2024-01-15,2024-01-18,149.99,USD,United States
ORD-002,[email protected],2024-02-20,2024-02-22,2500.00,GBP,United Kingdom
ORD-003,[email protected],03/08/2024,03/10/2024,89.50,USD,United States
ORD-004,[email protected],2024-04-12,2024-04-15,1200.00,EUR,Germany
ORD-005,[email protected],2024-05-01,2024-05-03,340.75,AUD,AustraliaNotice the date formats are inconsistent — rows 1, 2, 4, and 5 use ISO format (2024-01-15) while row 3 uses US slash format (03/08/2024). This is exactly the kind of problem FineType catches.
Steps
1. Profile the CSV
Run profile to discover the semantic type of every column:
finetype profile -f orders.csvColumn Type Confidence
────────────── ──────────────────────────────── ──────────
order_id representation.identifier.custom 0.99
customer_email identity.contact.email 0.98
order_date datetime.date.iso_8601 0.87
ship_date datetime.date.iso_8601 0.87
amount representation.numeric.decimal 0.99
currency finance.reference.currency_code 0.98
country geography.location.country 0.96FineType has classified all seven columns. The confidence on order_date and ship_date is lower because of the mixed date formats — FineType picks the majority format but flags the inconsistency through the confidence score.
2. Generate the load SQL
Run load to generate a CREATE TABLE AS statement with proper casts:
finetype load -f orders.csvCREATE TABLE orders AS
SELECT
order_id::VARCHAR AS order_id,
customer_email::VARCHAR AS customer_email,
strptime(order_date, '%Y-%m-%d')::TIMESTAMP AS order_date,
strptime(ship_date, '%Y-%m-%d')::TIMESTAMP AS ship_date,
amount::DECIMAL(18,2) AS amount,
currency::VARCHAR AS currency,
country::VARCHAR AS country
FROM read_csv('orders.csv', header=true, all_varchar=true);Every cast is derived from FineType's type detection. The all_varchar=true flag tells DuckDB to read everything as text first, so FineType's casts control the conversion.
3. Load into DuckDB
Pipe the generated SQL directly into DuckDB:
finetype load -f orders.csv | duckdb orders.duckdbThe database file orders.duckdb now contains a fully typed orders table.
4. See the difference
Without FineType, DuckDB reads CSV columns as VARCHAR by default. Compare:
duckdb -c "SELECT column_name, column_type FROM (DESCRIBE SELECT * FROM read_csv('orders.csv'))"┌─────────────────┬──────────────┐
│ column_name │ column_type │
│ varchar │ varchar │
├─────────────────┼──────────────┤
│ order_id │ VARCHAR │
│ customer_email │ VARCHAR │
│ order_date │ VARCHAR │
│ ship_date │ VARCHAR │
│ amount │ VARCHAR │
│ currency │ VARCHAR │
│ country │ VARCHAR │
└─────────────────┴──────────────┘Now check the typed table:
duckdb orders.duckdb -c "SELECT column_name, column_type FROM (DESCRIBE orders)"┌─────────────────┬───────────────┐
│ column_name │ column_type │
│ varchar │ varchar │
├─────────────────┼───────────────┤
│ order_id │ VARCHAR │
│ customer_email │ VARCHAR │
│ order_date │ TIMESTAMP │
│ ship_date │ TIMESTAMP │
│ amount │ DECIMAL(18,2) │
│ currency │ VARCHAR │
│ country │ VARCHAR │
└─────────────────┴───────────────┘Dates are TIMESTAMP, amounts are DECIMAL. Queries that filter by date or sum amounts will now work correctly.
5. Query with proper types
With proper types in place, date arithmetic and numeric aggregation just work:
-- Orders shipped within 2 days
SELECT order_id, customer_email, order_date, ship_date,
ship_date - order_date AS shipping_time
FROM orders
WHERE ship_date - order_date <= INTERVAL 2 DAY;┌──────────┬────────────────────┬─────────────────────┬─────────────────────┬───────────────┐
│ order_id │ customer_email │ order_date │ ship_date │ shipping_time │
├──────────┼────────────────────┼─────────────────────┼─────────────────────┼───────────────┤
│ ORD-002 │ [email protected] │ 2024-02-20 00:00:00 │ 2024-02-22 00:00:00 │ 2 days │
│ ORD-003 │ [email protected] │ 2024-03-08 00:00:00 │ 2024-03-10 00:00:00 │ 2 days │
│ ORD-005 │ [email protected] │ 2024-05-01 00:00:00 │ 2024-05-03 00:00:00 │ 2 days │
└──────────┴────────────────────┴─────────────────────┴─────────────────────┴───────────────┘-- Total revenue by currency
SELECT currency, sum(amount) AS total, count(*) AS orders
FROM orders
GROUP BY currency
ORDER BY total DESC;┌──────────┬─────────┬────────┐
│ currency │ total │ orders │
├──────────┼─────────┼────────┤
│ GBP │ 2500.00 │ 1 │
│ EUR │ 1200.00 │ 1 │
│ AUD │ 340.75 │ 1 │
│ USD │ 239.49 │ 2 │
└──────────┴─────────┴────────┘These queries would fail or produce wrong results on VARCHAR columns. With typed data, sum() returns a decimal and date subtraction returns an interval.
What you learned
finetype profilediscovers semantic types in a CSVfinetype loadgenerates a DuckDBCREATE TABLE ASstatement with proper casts- Piping
loadoutput directly into DuckDB creates a typed database in one step - Typed columns enable date arithmetic, numeric aggregation, and correct sorting
See also
profilecommand reference — output formats and filtering optionsloadcommand reference — customising the generated SQL- Validate a Data Delivery — add quality gates before loading