Getting Started

This tutorial walks you from pip install to a validated, workload-specific C++ analytical engine. You give SynnoDB your SQL queries and schema; its LLM agents design the storage layout, write the C++, compile it, and verify the results against DuckDB. By the end you will have synthesized an engine, benchmarked it, and generated an interactive Storage Explorer for the run.

Prerequisites: Python 3.10 or newer, an LLM API key (e.g. OPENAI_API_KEY), and a working C/C++ toolchain (clang/LLVM) to compile the synthesized engine.
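If you want to check these prerequisites before installing, a short stdlib-only script is enough. This is a minimal sketch and not part of SynnoDB. The helper name `check_prerequisites` is our own. It only confirms that a key is exported (not that it is valid) and that `clang` or `clang++` is on your PATH.

```python
from __future__ import annotations

import os
import shutil
import sys


def check_prerequisites(key_var: str = "OPENAI_API_KEY") -> dict[str, bool]:
    """Report which of this tutorial's prerequisites look present (hypothetical helper)."""
    has_clang = shutil.which("clang") is not None or shutil.which("clang++") is not None
    return {
        # Python 3.10 or newer
        "python>=3.10": sys.version_info >= (3, 10),
        # LLM key exported in the environment (its value is not validated)
        key_var: bool(os.environ.get(key_var)),
        # clang/LLVM on PATH to compile the synthesized engine
        "clang": has_clang,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok' if ok else 'MISSING':8} {name}")
```

Pass a different variable name (for example `check_prerequisites("ANTHROPIC_API_KEY")`) if you route a different provider.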

Installation

SynnoDB ships as a Python package. Install it from PyPI and export your model key. Everything else in this tutorial builds on these two lines.

bash · install
$ pip install synnodb
# clang/LLVM compiles the synthesized engine; bring your LLM key
$ export OPENAI_API_KEY="sk-..."

Confirm that the install succeeded and that the CLI is on your PATH:

bash · verify
$ synnodb --version

Quickstart

The fastest path is a single command. Point SynnoDB at a file of SQL queries and your data (here a Parquet file). It designs the storage layout, writes the C++, compiles it, and validates correctness against DuckDB before reporting the speedup.

synnodb · synthesize
$ synnodb synthesize --workload queries.txt --data data.parquet
  ✓ analyzed 22 queries · designed bespoke storage layout
  ✓ generated + compiled engine · validated against DuckDB (all correct)
  ✓ optimized hot paths over 3 iterations
 11.78× faster than DuckDB · engine written to ./engine/

Each step is verified before the next one runs, so a synthesized engine that does not match DuckDB row-for-row never ships. The compiled engine and its generated sources land in ./engine/.
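Here is what a "row-for-row" check could look like. This is a minimal sketch for illustration, not SynnoDB's validator. It assumes both results are materialized as lists of Python tuples. The name `results_match` and the float-rounding tolerance are our own choices.

```python
from collections import Counter


def _normalize(row, ndigits):
    # Round floats so tiny differences in aggregation order don't count as mismatches.
    return tuple(round(v, ndigits) if isinstance(v, float) else v for v in row)


def results_match(expected, actual, ordered=False, ndigits=6):
    """Compare two query results given as lists of row tuples (hypothetical helper).

    ordered=True demands identical row order (e.g. for queries with ORDER BY);
    otherwise rows are compared as a multiset, so duplicate rows still count.
    """
    exp = [_normalize(row, ndigits) for row in expected]
    act = [_normalize(row, ndigits) for row in actual]
    if ordered:
        return exp == act
    return Counter(exp) == Counter(act)
```

Comparing as a multiset rather than a set matters. A result that drops one of two identical rows is wrong, and a plain set comparison would not catch it.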

Prefer to drive it from Python? The same end-to-end flow is one call. It returns a handle with the measured speedup and the path to the compiled engine.

python · synthesize
from synnodb import synthesize

engine = synthesize(workload="queries.txt", data="data.parquet")
print(engine.speedup)  # e.g. 11.78; measured per run

Configure the model

SynnoDB uses an LLM agent to design and optimize the engine. The default model is the small, inexpensive gpt-5-mini, authenticated with OPENAI_API_KEY. To use any other provider, prefix the model name with litellm/. That provider's credentials (for example ANTHROPIC_API_KEY) are read from the environment.

synnodb · model
# default: OpenAI gpt-5-mini (uses OPENAI_API_KEY)
$ synnodb synthesize --workload queries.txt --data data.parquet

# route any provider through litellm (e.g. Anthropic via ANTHROPIC_API_KEY)
$ export ANTHROPIC_API_KEY="sk-ant-..."
$ synnodb synthesize --workload queries.txt --data data.parquet \
    --model litellm/claude-sonnet-4-6
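The naming rule can be sketched as a small parser. This is our reading of the convention, not SynnoDB's code, and `parse_model` is a hypothetical helper. One point is an assumption inferred from the default: a name without the litellm/ prefix is treated as an OpenAI model.

```python
DEFAULT_MODEL = "gpt-5-mini"
LITELLM_PREFIX = "litellm/"


def parse_model(spec=None):
    """Split a --model value into (route, model_name). Hypothetical helper.

    No value means the OpenAI default; a "litellm/" prefix means the rest of
    the string is handed to LiteLLM, which resolves the provider.
    """
    if not spec:
        return ("openai", DEFAULT_MODEL)
    if spec.startswith(LITELLM_PREFIX):
        return ("litellm", spec[len(LITELLM_PREFIX):])
    # Assumption: unprefixed names are OpenAI models.
    return ("openai", spec)
```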

Synthesize your workload

The quickstart used a single query file, but real workloads are sets of parameterized SQL templates over a known schema. Point SynnoDB at those templates and let it specialize the engine to them. The more representative your queries, the better the layout it can design.

1. Collect your SQL templates and schema

Gather the queries that matter for your workload into a single workload file and provide the table definitions in a schema file. Parameter placeholders are fine; SynnoDB treats them as the query shapes to optimize for.

2. Run the synthesis agent

The agent analyzes the workload, designs a bespoke storage layout, writes C++, compiles it, and revalidates against DuckDB after every optimization pass.

3. Inspect the output engine

The compiled engine and its generated C++ sources are written under ./engine/ (or the directory you pass to --out), ready to benchmark or ship.

synnodb · synthesize
$ synnodb synthesize \
    --workload workloads/analytics.txt \
    --data data/warehouse.parquet \
    --out ./engine
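Before a long run, you may want to confirm how many queries your workload file actually contains. The workload file format is not specified here, so this minimal sketch assumes statements are separated by semicolons and that full-line `--` comments can be dropped. `split_workload` is a hypothetical helper, not part of SynnoDB. Placeholders such as `?` pass through untouched.

```python
def split_workload(text):
    """Split a workload file into individual SQL statements (hypothetical helper).

    Assumes ';' separates statements and drops full-line '--' comments.
    Semicolons inside string literals are NOT handled.
    """
    # Drop full-line comments before splitting on ';'.
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("--")]
    statements = (stmt.strip() for stmt in "\n".join(lines).split(";"))
    return [stmt for stmt in statements if stmt]
```

For example, `len(split_workload(open("workloads/analytics.txt").read()))` gives the statement count under these assumptions.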

What the output engine looks like

The result is a self-contained, compiled C++ engine specialized to your queries, with a stable layout you can read, version, and run.

bash · ./engine
$ ls engine/
# engine        compiled binary (run your queries)
# src/          generated C++ (storage + per-query kernels)
# layout.json   the bespoke storage layout that was designed
# report.json   per-query speedups + DuckDB validation results
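If you script around synthesis (in CI, for instance), you can sanity-check that the output directory has the entries listed above. This minimal sketch checks only that each entry exists, not what it contains. `missing_outputs` is a hypothetical helper; the expected names come from the listing.

```python
from pathlib import Path

# Entries from the ./engine/ listing above; a trailing "/" marks a directory.
EXPECTED_ENTRIES = ("engine", "src/", "layout.json", "report.json")


def missing_outputs(engine_dir="engine"):
    """Return the expected entries that are absent from engine_dir."""
    root = Path(engine_dir)
    missing = []
    for name in EXPECTED_ENTRIES:
        path = root / name.rstrip("/")
        present = path.is_dir() if name.endswith("/") else path.is_file()
        if not present:
            missing.append(name)
    return missing
```

An empty list means every listed entry is present.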

Benchmark

Once the engine is built, measure it against a baseline on your own data. The benchmark runs each query on both systems and reports per-query latency alongside the geomean speedup.

synnodb · benchmark
$ synnodb benchmark --engine ./engine --data ./data --baseline duckdb
# per-query latency vs DuckDB, with the geomean speedup
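The headline figure is a geometric mean of per-query speedups. This minimal sketch shows that arithmetic, assuming you have per-query latencies for both systems as plain dicts. `geomean_speedup` is our own helper, not SynnoDB's reporting code.

```python
from statistics import geometric_mean


def geomean_speedup(baseline_ms, engine_ms):
    """Geometric mean of per-query speedups (baseline latency / engine latency).

    Both arguments map query name -> latency; only queries present in both
    are compared.
    """
    common = baseline_ms.keys() & engine_ms.keys()
    if not common:
        raise ValueError("no queries in common")
    speedups = [baseline_ms[q] / engine_ms[q] for q in sorted(common)]
    return geometric_mean(speedups)
```

The geometric mean is the usual way to average ratios. One query with a very large speedup cannot dominate the summary the way it would in an arithmetic mean, and inverting every ratio simply inverts the result.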

Swap --baseline to compare against other systems you run, and repeat the synthesis with a stronger model if you want to push the speedup further.

Generate a Storage Explorer

The per-query Storage Explorer is its own pip-installable module. Point it at a run and it produces an interactive page showing speedups, the generated code per optimization stage, the DuckDB query plan, and an LLM analysis of the code changes between stages.

bash · install
$ pip install bespoke-explorer

Everything is wired through the constructor: your Weights & Biases run, the git repo holding the generated-code snapshots, the analysis model, and an output directory. Credentials are taken from the environment (OPENAI_API_KEY, WANDB_API_KEY).

python · bespoke_explorer
from bespoke_explorer import ExplorerConfig, ExplorerBuilder

cfg = ExplorerConfig(
    entity="acme", project="engines",         # your W&B entity and project
    cache_repo="git://git.acme/engines.git",  # repo with generated-code snapshots
    model="gpt-5-mini",                       # analysis model; omit to skip analysis
    out_dir="web/data",                       # where the page data is written
)
builder = ExplorerBuilder(cfg)
builder.scaffold("web")    # drop the static viewer into web/
builder.build("a2tlnfrk")  # W&B run id -> explorer page

Open web/index.html to browse the generated explorer. The viewer is framework-free static HTML, so you can deploy it to any static host.

You can also run the explorer from the CLI with python -m bespoke_explorer.cli --wandb-id <id> --model gpt-5-mini. This serves the page, opens it in your browser automatically, and logs each LLM call as it analyzes the design. Omit --model (or model= in the config) to skip the analysis; the page is still generated, with a placeholder asking you to supply a model. If the W&B run cannot be fetched, the build stops and opens a clear error page instead of silently inventing data.
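To preview the viewer over HTTP, as a static host would serve it, Python's standard library is enough. This minimal sketch assumes the viewer was scaffolded into web/. `make_static_server` is our own helper, and running `python -m http.server --directory web` from the shell does the same thing.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_static_server(directory="web", host="127.0.0.1", port=8000):
    """Build an HTTP server that serves `directory` as static files."""
    # SimpleHTTPRequestHandler accepts a `directory` keyword (Python 3.7+).
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)
```

Call `make_static_server().serve_forever()`, then browse to http://127.0.0.1:8000/index.html.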

Next steps

You have installed SynnoDB, synthesized and benchmarked an engine, and generated a Storage Explorer.

SynnoDB is launching as an early-access Python package. Request access to get an engine built for your workload.