Get started
This tutorial walks you from pip install to a validated, workload-specific C++ analytical engine. You give SynnoDB your SQL queries and schema; its LLM agents design the storage layout, write the C++, compile it, and verify the results against DuckDB. By the end you will have synthesized an engine, benchmarked it, and generated an interactive Storage Explorer for the run.
You need an LLM API key (by default, OPENAI_API_KEY). A working C/C++ toolchain (clang/LLVM) is used to compile the synthesized engine.
Installation
SynnoDB ships as a Python package. Install it from PyPI and export your model key. Everything else in this tutorial builds on these two lines.
$ pip install synnodb
# clang/LLVM compiles the synthesized engine; bring your LLM key
$ export OPENAI_API_KEY="sk-..."
Confirm the install resolved and the CLI is on your path:
$ synnodb --version
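If the version check fails, it can help to confirm the prerequisites one by one. The following is a minimal preflight sketch using only the Python standard library. The helper name `missing_prerequisites` and the compiler binary name `clang++` are assumptions made for illustration; they are not part of SynnoDB.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # The default model reads its key from OPENAI_API_KEY.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # The synthesized engine is compiled with clang/LLVM. The binary name
    # "clang++" is an assumption; adjust it to match your toolchain.
    if which("clang++") is None:
        problems.append("clang++ not found on PATH")
    # The CLI entry point installed by `pip install synnodb`.
    if which("synnodb") is None:
        problems.append("synnodb CLI not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.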
Quickstart
The fastest path is a single command. Point SynnoDB at a file of SQL queries and a schema; it designs the storage, writes the C++, compiles it, and validates correctness against DuckDB before reporting the speedup.
$ synnodb synthesize --workload queries.txt --data data.parquet
✓ analyzed 22 queries · designed bespoke storage layout
✓ generated + compiled engine · validated against DuckDB (all correct)
✓ optimized hot paths over 3 iterations
→ 11.78× faster than DuckDB · engine written to ./engine/
Each step is verified before the next one runs, so a synthesized engine that does not match DuckDB row-for-row never ships. The compiled engine and its generated sources land in ./engine/.
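To make "row-for-row" concrete, here is a small sketch of what such a comparison could look like. It is not SynnoDB's validator: the function `rows_match`, the multiset comparison for unordered results, and the rounding tolerance for floats are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def _normalize(value, ndigits=6):
    """Round floats so tiny floating-point differences do not count as mismatches."""
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"  # NaN != NaN, so map it to a comparable marker
        return round(value, ndigits)
    return value


def rows_match(expected, actual, ordered=False):
    """Compare two result sets row-for-row.

    expected/actual are sequences of tuples. With ordered=False each result
    is treated as a multiset, so duplicate rows still have to line up.
    """
    exp = [tuple(_normalize(v) for v in row) for row in expected]
    act = [tuple(_normalize(v) for v in row) for row in actual]
    if ordered:
        return exp == act
    return Counter(exp) == Counter(act)
```

In practice `expected` would come from the reference system, for example `duckdb.sql(query).fetchall()` in the DuckDB Python client, and `actual` from the engine under test. Queries with an ORDER BY clause would use `ordered=True`.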
Prefer to drive it from Python? The same end-to-end flow is one call. It returns a handle with the measured speedup and the path to the compiled engine.
from synnodb import synthesize

engine = synthesize(workload="queries.txt", data="data.parquet")
print(engine.speedup)  # 11.78
Configure the model
SynnoDB uses an LLM agent to design and optimize the engine. The default is the small, inexpensive gpt-5-mini, read from OPENAI_API_KEY. To use any other provider, prefix the model name with litellm/; credentials are picked up from the environment for that provider.
# default: OpenAI gpt-5-mini (uses OPENAI_API_KEY)
$ synnodb synthesize --workload queries.txt --data data.parquet

# route any provider through litellm (e.g. Anthropic via ANTHROPIC_API_KEY)
$ export ANTHROPIC_API_KEY="sk-ant-..."
$ synnodb synthesize --workload queries.txt --data data.parquet \
    --model litellm/claude-sonnet-4-6
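The naming convention for --model can be read as a simple prefix rule. The sketch below mirrors that convention only. `split_model_name` is a hypothetical helper and does not reflect how SynnoDB parses the flag internally.

```python
def split_model_name(name):
    """Split a --model value into (route, model).

    A "litellm/" prefix selects the litellm route; anything else is treated
    as an OpenAI model name. Illustration of the convention only, not
    SynnoDB's actual parsing code.
    """
    prefix = "litellm/"
    if name.startswith(prefix):
        return "litellm", name[len(prefix):]
    return "openai", name
```

Under this reading, `gpt-5-mini` stays on the default OpenAI route, while `litellm/claude-sonnet-4-6` is handed to litellm with the prefix stripped.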
Synthesize your workload
The quickstart used a single query file, but real workloads are sets of parameterized SQL templates over a known schema. Point SynnoDB at those templates and let it specialize the engine to them. The more representative your queries, the better the layout it can design.
Collect your SQL templates and schema
Gather the queries that matter for your workload into a .sql file and provide the table definitions in a schema file. Parameter placeholders are fine; SynnoDB treats them as the query shapes to optimize for.
Run the synthesis agent
The agent analyzes the workload, designs a bespoke storage layout, writes C++, compiles it, and revalidates against DuckDB after every optimization pass.
Inspect the output engine
The compiled engine and its generated C++ sources are written under ./engine/, ready to benchmark or ship.
$ synnodb synthesize \
    --workload workloads/analytics.txt \
    --data data/warehouse.parquet \
    --out ./engine
What the output engine looks like
The result is a self-contained, compiled C++ engine specialized to your queries, with a stable layout you can read, version, and run.
$ ls engine/
# engine        compiled binary (run your queries)
# src/          generated C++ (storage + per-query kernels)
# layout.json   the bespoke storage layout that was designed
# report.json   per-query speedups + DuckDB validation results
Benchmark
Once the engine is built, measure it against a baseline on your own data. The benchmark runs each query on both systems and reports per-query latency alongside the geomean speedup.
$ synnodb benchmark --engine ./engine --data ./data --baseline duckdb
# per-query latency vs DuckDB, with the geomean speedup
Swap --baseline to compare against other systems you run, and repeat the synthesis with a stronger model if you want to push the speedup further.
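The geomean speedup can be reproduced by hand from per-query latencies. The sketch below assumes latencies are available as plain dictionaries keyed by query name. The function name and input shape are illustrative and are not the benchmark's actual output format.

```python
import math


def geomean_speedup(baseline_ms, engine_ms):
    """Geometric mean of per-query speedups (baseline latency / engine latency)."""
    if baseline_ms.keys() != engine_ms.keys():
        raise ValueError("both systems must report the same queries")
    if not baseline_ms:
        raise ValueError("need at least one query")
    # Average in log space: exp(mean(log(ratio))) is the geometric mean.
    log_sum = sum(math.log(baseline_ms[q] / engine_ms[q]) for q in baseline_ms)
    return math.exp(log_sum / len(baseline_ms))


if __name__ == "__main__":
    # Made-up latencies in milliseconds, for illustration only.
    baseline = {"q1": 100.0, "q2": 200.0}
    engine = {"q1": 50.0, "q2": 25.0}
    print(round(geomean_speedup(baseline, engine), 2))  # speedups 2x and 8x -> 4.0
```

A geometric mean is the usual choice for ratios because one unusually fast query cannot dominate the summary the way it would in an arithmetic mean.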
Generate a Storage Explorer
The per-query Storage Explorer is its own pip-installable module. Point it at a run and it produces an interactive page showing speedups, the generated code per optimization stage, the DuckDB query plan, and an LLM analysis of the code changes between stages.
$ pip install bespoke-explorer
Everything is wired through the constructor: your Weights & Biases run, the git repo holding the generated-code snapshots, the analysis model, and an output directory. Credentials are taken from the environment (OPENAI_API_KEY, WANDB_API_KEY).
from bespoke_explorer import ExplorerConfig, ExplorerBuilder

cfg = ExplorerConfig(
    entity="acme", project="engines",          # your wandb
    cache_repo="git://git.acme/engines.git",   # your generated code
    model="gpt-5-mini",
    out_dir="web/data",
)
ExplorerBuilder(cfg).scaffold("web")    # drop the viewer in
ExplorerBuilder(cfg).build("a2tlnfrk")  # wandb id -> page
Open web/index.html to browse the generated explorer. The static viewer is framework-free, so it deploys to any static host. Alternatively, run it from the CLI:
$ python -m bespoke_explorer.cli --wandb-id <id> --model gpt-5-mini
This serves the page and opens it in your browser automatically, logging each LLM call as it analyzes the design. Omit --model (or model=) to skip the analysis: the page is still generated, with a placeholder asking you to supply a model. If the W&B run can't be fetched, the build stops and opens a clear error page instead of silently inventing data.
Next steps
You have installed SynnoDB, synthesized and benchmarked an engine, and generated a Storage Explorer. From here: