MeridianMERIDIAN

Use Anywhere

One ATTACH line exposes the whole catalog to DuckDB, Python, and R — the same file the browser reads.

The browser explorer is a convenience, not a walled garden. Every dataset is a plain Parquet file at a stable URL, and the whole catalog is a frozen DuckLake — so you can query it from a notebook, a script, or a scheduled job with one line.

One line to the whole catalog

ATTACH 'ducklake:https://openlake.meridian.online/catalog/open.ducklake' AS meridian (READ_ONLY);

That attaches the open catalog read-only and exposes every dataset as a table:

TableWhat it is
meridian.gleifGLEIF legal entities (LEI register)
meridian.edgarSEC EDGAR company tickers
meridian.naicsNAICS industry classification

The catalog is a single ~5 MB file that points at the underlying Parquet by reference — attaching it copies no data. Your engine reads only the columns and row groups each query needs, straight from R2.

The snippets

These are the exact snippets the explorer's Use elsewhere drawer emits — the query shown is the GLEIF country breakdown from the quick start, but any query you build in the grid comes out the same shape.

-- One ATTACH exposes every Meridian open dataset as a table.
INSTALL ducklake;
ATTACH 'ducklake:https://openlake.meridian.online/catalog/open.ducklake' AS meridian (READ_ONLY);
SELECT country, count(*) AS entities
FROM meridian.gleif
GROUP BY country
ORDER BY entities DESC
LIMIT 20;

-- Or read just this one Parquet directly:
INSTALL httpfs;
SELECT country, count(*) AS entities
FROM read_parquet('https://openlake.meridian.online/gleif.parquet')
GROUP BY country
ORDER BY entities DESC
LIMIT 20;
import duckdb

con = duckdb.connect()
con.execute("INSTALL ducklake; INSTALL httpfs;")
con.execute("ATTACH 'ducklake:https://openlake.meridian.online/catalog/open.ducklake' AS meridian (READ_ONLY)")
df = con.sql("""
SELECT country, count(*) AS entities
FROM meridian.gleif
GROUP BY country
ORDER BY entities DESC
LIMIT 20
""").df()

# Or read just this one Parquet directly:
# duckdb.sql("""
# SELECT country, count(*) AS entities
# FROM read_parquet('https://openlake.meridian.online/gleif.parquet')
# GROUP BY country
# ORDER BY entities DESC
# LIMIT 20
# """).df()

Requires the duckdb package: pip install duckdb.

library(duckdb)

con <- dbConnect(duckdb())
dbExecute(con, "INSTALL ducklake; INSTALL httpfs;")
dbExecute(con, "ATTACH 'ducklake:https://openlake.meridian.online/catalog/open.ducklake' AS meridian (READ_ONLY)")
dbGetQuery(con, "
SELECT country, count(*) AS entities
FROM meridian.gleif
GROUP BY country
ORDER BY entities DESC
LIMIT 20
")

# Or read just this one Parquet directly:
# dbGetQuery(con, "
# SELECT country, count(*) AS entities
# FROM read_parquet('https://openlake.meridian.online/gleif.parquet')
# GROUP BY country
# ORDER BY entities DESC
# LIMIT 20
# ")

Requires the duckdb package: install.packages("duckdb").

# DuckDB CLI — the whole commons via one ATTACH
duckdb -c "
  INSTALL ducklake;
  ATTACH 'ducklake:https://openlake.meridian.online/catalog/open.ducklake' AS meridian (READ_ONLY);
  SELECT country, count(*) AS entities FROM meridian.gleif GROUP BY country ORDER BY entities DESC LIMIT 20;"

# Or read just this one Parquet directly:
# duckdb -c "INSTALL httpfs; SELECT country, count(*) AS entities FROM read_parquet('https://openlake.meridian.online/gleif.parquet') GROUP BY country ORDER BY entities DESC LIMIT 20;"

Extension requirements

The engine needs two DuckDB extensions, both fetched on first use:

  • ducklake — resolves the ducklake: catalog. Install with INSTALL ducklake;. It pulls in HTTP support automatically, so the ATTACH line above works on its own.
  • httpfs — reads a Parquet file directly over HTTPS. Install with INSTALL httpfs;. You only need it for the read_parquet('https://…') fallback path.

Both live in the default extension repositories; a recent DuckDB (1.x) installs them without any extra configuration. INSTALL runs once and caches the extension locally.

ATTACH vs. read_parquet directly

Two ways in, and they answer different questions:

  • ATTACH the catalog when you want the whole commons behind one connection — stable table names (meridian.gleif, …), joins across datasets, and no per-file URLs to track.
  • read_parquet('https://openlake.meridian.online/<slug>.parquet') when you want exactly one dataset and nothing else. The <slug>.parquet URL is stable and always resolves to the current release.

Reproducible reads

The stable <slug>.parquet URL follows the latest release. If you need a fixed snapshot that never changes under you, use the content-addressed URL from the manifest instead — see Data & provenance.

Just download the file

No SQL engine handy? The Parquet is an ordinary object — curl it:

# Resolve the current immutable URL from the manifest, then download it:
curl -s https://openlake.meridian.online/manifest.json | jq -r '.datasets.gleif.url'

# Or grab the stable pointer directly (follows the latest release):
curl -O https://openlake.meridian.online/gleif.parquet

A note on the corpus column

Each table carries a hidden corpus column — a normalized blend of the text columns that powers the explorer's search box. SELECT * will include it (and it's large). When you want the clean, human columns, list them explicitly or exclude it:

SELECT * EXCLUDE (corpus) FROM meridian.gleif LIMIT 50;

On this page