Getting Started

This tutorial walks you from pip install to a validated, workload-specific C++ analytical engine. You give SynnoDB your SQL queries and schema; its LLM agents design the storage layout, write the C++, compile it, and verify the results against DuckDB. By the end you will have synthesized an engine, benchmarked it, and generated an interactive Storage Explorer for the run.

Prerequisites: Python 3.10 or newer and an LLM API key (e.g. OPENAI_API_KEY). A working C/C++ toolchain (clang/LLVM) is used to compile the synthesized engine.
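A quick preflight check can confirm these prerequisites before you install anything. The following is a minimal sketch using only the Python standard library. The function name check_prerequisites is hypothetical and not part of SynnoDB, and it assumes that finding either clang++ or clang on your PATH is a good enough signal that the toolchain is present.

```python
import os
import shutil
import sys


def check_prerequisites(env=None, which=shutil.which, version=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else version
    problems = []
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # Assumption: either clang++ or clang on PATH counts as a usable toolchain.
    if not (which("clang++") or which("clang")):
        problems.append("no clang/clang++ found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```

The env, which, and version parameters exist only so the check can be exercised without touching the real environment. If you route to another provider, swap the key name for that provider's variable.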

Installation

SynnoDB ships as a Python package. Install it from PyPI and export your model key. Everything else in this tutorial builds on these two lines.

bash · install

$ pip install synnodb
# clang/LLVM compiles the synthesized engine; bring your LLM key
$ export OPENAI_API_KEY="sk-..."

Confirm the install resolved and the CLI is on your PATH:

bash · verify

$ synnodb --version

Quickstart

The fastest path is a single command. Point SynnoDB at a file of SQL queries and a schema; it designs the storage, writes the C++, compiles it, and validates correctness against DuckDB before reporting the speedup.

synnodb · synthesize

$ synnodb synthesize --workload queries.txt --data data.parquet
  ✓ analyzed 22 queries · designed bespoke storage layout
  ✓ generated + compiled engine · validated against DuckDB (all correct)
  ✓ optimized hot paths over 3 iterations
→ 11.78× faster than DuckDB · engine written to ./engine/

Each step is verified before the next one runs, so a synthesized engine that does not match DuckDB row-for-row never ships. The compiled engine and its generated sources land in ./engine/.
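To make "row-for-row" concrete, here is a minimal sketch of one way two result sets could be compared. It is an illustration, not SynnoDB's actual validator: the helper names normalize and rows_match are hypothetical, and it assumes results arrive as lists of tuples, that row order does not matter, and that rounding floats to a fixed number of digits is an acceptable tolerance.

```python
from collections import Counter


def normalize(row, digits=6):
    """Round floats so tiny floating-point differences do not count as mismatches."""
    return tuple(round(v, digits) if isinstance(v, float) else v for v in row)


def rows_match(expected, actual, digits=6):
    """Order-insensitive, duplicate-aware comparison of two result sets."""
    # Counter keeps multiplicities, so a duplicated row must appear
    # the same number of times on both sides.
    return Counter(normalize(r, digits) for r in expected) == Counter(
        normalize(r, digits) for r in actual
    )
```

Using a multiset rather than a set matters for analytical queries, where duplicate rows are legitimate output. For queries with ORDER BY, an ordered comparison of the normalized rows would be the stricter check.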

Prefer to drive it from Python? The same end-to-end flow is one call. It returns a handle with the measured speedup and the path to the compiled engine.

python · synthesize

from synnodb import synthesize

engine = synthesize(workload="queries.txt", data="data.parquet")
print(engine.speedup)  # 11.78

Configure the model

SynnoDB uses an LLM agent to design and optimize the engine. The default model is the small, inexpensive gpt-5-mini, authenticated with OPENAI_API_KEY. To use any other provider, prefix the model name with litellm/; credentials for that provider are picked up from the environment.

synnodb · model

# default: OpenAI gpt-5-mini (uses OPENAI_API_KEY)
$ synnodb synthesize --workload queries.txt --data data.parquet

# route any provider through litellm (e.g. Anthropic via ANTHROPIC_API_KEY)
$ export ANTHROPIC_API_KEY="sk-ant-..."
$ synnodb synthesize --workload queries.txt --data data.parquet \
    --model litellm/claude-sonnet-4-6

Synthesize your workload

The quickstart used a single query file, but real workloads are sets of parameterized SQL templates over a known schema. Point SynnoDB at those templates and let it specialize the engine to them. The more representative your queries, the better the layout it can design.

Collect your SQL templates and schema

Gather the queries that matter for your workload into a .sql file and provide the table definitions in a schema file. Parameter placeholders are fine; SynnoDB treats them as the query shapes to optimize for.

Run the synthesis agent

The agent analyzes the workload, designs a bespoke storage layout, writes C++, compiles it, and revalidates against DuckDB after every optimization pass.

Inspect the output engine

The compiled engine and its generated C++ sources are written under ./engine/, ready to benchmark or ship.

synnodb · synthesize

$ synnodb synthesize \
    --workload workloads/analytics.txt \
    --data data/warehouse.parquet \
    --out ./engine
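If you launch synthesis from a script or a CI job, the same command can be assembled in Python. This is a sketch that only uses the flags shown in this tutorial (--workload, --data, --out, --model); the helper name build_synthesize_cmd is hypothetical and not part of SynnoDB.

```python
import shlex


def build_synthesize_cmd(workload, data, out=None, model=None):
    """Assemble the argv for `synnodb synthesize` from the flags used in this tutorial."""
    cmd = ["synnodb", "synthesize", "--workload", str(workload), "--data", str(data)]
    if out is not None:
        cmd += ["--out", str(out)]
    if model is not None:
        cmd += ["--model", str(model)]
    return cmd


cmd = build_synthesize_cmd(
    "workloads/analytics.txt", "data/warehouse.parquet", out="./engine"
)
# Print the shell-quoted command for review before running it.
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.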

What the output engine looks like

The result is a self-contained, compiled C++ engine specialized to your queries, with a stable layout you can read, version, and run.

bash · ./engine

$ ls engine/
# engine        compiled binary (run your queries)
# src/          generated C++ (storage + per-query kernels)
# layout.json   the bespoke storage layout that was designed
# report.json   per-query speedups + DuckDB validation results

Benchmark

Once the engine is built, measure it against a baseline on your own data. The benchmark runs each query on both systems and reports per-query latency alongside the geomean speedup.

synnodb · benchmark

$ synnodb benchmark --engine ./engine --data ./data --baseline duckdb
# per-query latency vs DuckDB, with the geomean speedup

Swap --baseline to compare against other systems you run, and repeat the synthesis with a stronger model if you want to push the speedup further.
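The geomean speedup is the geometric mean of the per-query speedups, where each speedup is the baseline latency divided by the engine latency. The sketch below shows that arithmetic on hypothetical latencies; the function name geomean_speedup and the dict-of-latencies input shape are assumptions for illustration, not the benchmark's actual output format.

```python
from statistics import geometric_mean


def geomean_speedup(baseline_ms, engine_ms):
    """Return per-query speedups and their geometric mean.

    Both arguments map a query name to its latency in milliseconds.
    """
    if baseline_ms.keys() != engine_ms.keys():
        raise ValueError("both systems must report the same queries")
    speedups = {q: baseline_ms[q] / engine_ms[q] for q in baseline_ms}
    return speedups, geometric_mean(list(speedups.values()))


# Hypothetical latencies, purely for illustration.
speedups, overall = geomean_speedup(
    {"q1": 40.0, "q2": 90.0},
    {"q1": 10.0, "q2": 10.0},
)
print(speedups, overall)
```

The geometric mean is the usual choice for averaging ratios: a single query with an outsized speedup moves it far less than it would move an arithmetic mean.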

Generate a Storage Explorer

The per-query Storage Explorer is its own pip-installable module. Point it at a run and it produces an interactive page showing speedups, the generated code per optimization stage, the DuckDB query plan, and an LLM analysis of the code changes between stages.

bash · install

$ pip install bespoke-explorer

Everything is wired through the constructor: your Weights & Biases run, the git repo holding the generated-code snapshots, the analysis model, and an output directory. Credentials are taken from the environment (OPENAI_API_KEY, WANDB_API_KEY).

python · bespoke_explorer

from bespoke_explorer import ExplorerConfig, ExplorerBuilder

cfg = ExplorerConfig(
    entity="acme", project="engines",        # your wandb
    cache_repo="git://git.acme/engines.git",  # your generated code
    model="gpt-5-mini", out_dir="web/data",
)
ExplorerBuilder(cfg).scaffold("web")   # drop the viewer in
ExplorerBuilder(cfg).build("a2tlnfrk") # wandb id -> page

Open web/index.html to browse the generated explorer. The static viewer is framework-free, so it can be deployed on any static host.

Alternatively, run it from the CLI with python -m bespoke_explorer.cli --wandb-id <id> --model gpt-5-mini, which serves the page and opens it in your browser automatically, logging each LLM call as it analyzes the design. Omit --model (or model= in the constructor) to skip the analysis: the page is still generated, with a placeholder asking you to supply a model. If the W&B run can't be fetched, the build stops and opens a clear error page instead of silently inventing data.

Next steps

You have installed SynnoDB, synthesized and benchmarked an engine, and generated a Storage Explorer. From here:

Try it live

Run queries against a synthesized engine in your browser.

Explore a demo run

Walk a finished Storage Explorer end to end.

View on GitHub

Read the research and the source.

SynnoDB is launching as an early-access Python package. Request access to get an engine built for your workload.