Add GenDB by alexey-milovidov · Pull Request #932 · ClickHouse/ClickBench

alexey-milovidov · 2026-05-18T03:34:09Z

TLDR - it's not good, and kind of pointless. It generates low-quality code and sometimes produces incorrect answers (e.g., it misinterprets a SELECT * query). I hope students will defend their PhD.

Summary

Adds GenDB — an LLM-powered query engine that synthesizes a custom C++ binary per query — as a ClickBench system.
Per-query .cpp files were generated once, offline, by spawning Claude code-generator agents (mirroring GenDB's pipeline), then committed under gendb/generated/. Running the benchmark only compiles + executes them; no LLM access required at run time.
Storage layout is raw per-column binary files. gendb/ingest.py uses DuckDB to decode hits.parquet into that layout (the upstream GenDB storage-designer normally writes its own ingest, but a battle-tested parquet decoder is more reliable than a generated one).
Result file: gendb/results/20260518/c8g.24xlarge.json (data 37 GB, QPS 0.642, 43 queries).
gendb/generation_times.json records the wall-clock cost of each .cpp synthesis; gendb/make_result.py adds those into the cold tries when assembling the result JSON (warm tries are unchanged — they reuse the binary on disk).
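The cold-try adjustment described in the last item can be sketched as follows. This is a minimal illustration, not the actual gendb/make_result.py: it assumes the result is a list of `[cold, warm, warm]` triples in query order and that the generation times are a parallel list of seconds. Both shapes, and the function name `add_generation_cost`, are hypothetical.

```python
def add_generation_cost(result, generation_times):
    """Return a copy of `result` with synthesis time added to each cold try.

    Assumed (hypothetical) shapes:
      result           -- list of [cold, warm, warm] triples, one per query
      generation_times -- list of seconds, same order as `result`
    """
    adjusted = []
    for index, tries in enumerate(result):
        # A failed query is recorded as nulls; leave it untouched.
        if not tries or tries[0] is None:
            adjusted.append(list(tries))
            continue
        cold, *warm = tries
        # Only the first (cold) try pays for code generation;
        # warm tries reuse the binary already on disk.
        adjusted.append([round(cold + generation_times[index], 3), *warm])
    return adjusted
```

Charging the synthesis cost only to the cold try keeps the warm numbers comparable with other systems while still accounting for the work needed before a query can run at all.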

Test plan

All 43 .cpp files compile under g++ -std=c++17 -O3 -march=native -fopenmp.
./benchmark.sh runs end-to-end against a fresh checkout: install → bench_download → load → 43 × 3 query tries → concurrent QPS test → Data size: output.
generate-results.sh picks up gendb/results/20260518/c8g.24xlarge.json and produces a valid data.generated.js entry.
Re-run on a maintainer machine to confirm the timings reproduce.

GenDB (https://github.com/SolidLao/GenDB) is an LLM-powered query engine that synthesizes a custom C++ binary per SQL query. We ran its multi-agent code-generation pipeline once against the ClickBench schema + 43 queries and ship the resulting .cpp files in generated/ so the benchmark itself needs no LLM access; install just compiles them. Storage layout is a raw per-column binary store the agents picked; ingest.py converts hits.parquet into that layout (DuckDB does the parquet decode).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
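To make the "raw per-column binary store" idea concrete, here is a minimal sketch of writing and reading such a layout. It is not the layout the agents chose: the `<column>.bin` file naming, the 64-bit integer width, and the helper names `write_columns` and `read_column` are all assumptions for illustration.

```python
import os
from array import array


def write_columns(rows, column_names, out_dir):
    """Write each column of `rows` to its own file of raw 64-bit integers.

    Hypothetical layout: one `<column>.bin` file per column, values stored
    back to back in native byte order with no header.
    """
    for position, name in enumerate(column_names):
        values = array("q", (row[position] for row in rows))
        with open(os.path.join(out_dir, name + ".bin"), "wb") as handle:
            values.tofile(handle)


def read_column(out_dir, name, row_count):
    """Read `row_count` 64-bit integers back from one column file."""
    values = array("q")
    with open(os.path.join(out_dir, name + ".bin"), "rb") as handle:
        values.fromfile(handle, row_count)
    return list(values)
```

With a layout like this, a query that touches two columns only opens two files, which is the usual argument for columnar storage in scan-heavy benchmarks.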

- gendb/results/20260518/c8g.24xlarge.json: full 43-query cold/warm triple, load time, data size, concurrent QPS.
- gendb/generation_times.json: wall-clock cost of synthesizing each per-query .cpp file (sum ≈ 41 min across 43 queries). make_result.py adds these to each cold try (warm tries unchanged, since they reuse the binary on disk).
- gendb/query: replace the awk needle-match with a bash string compare so Q29's regex backslashes don't get reinterpreted as awk escapes. Q29's [null,null,null] from the first benchmark run was the only query the awk version mishandled; results JSON has the post-fix [cold, warm, warm] for it.
- gendb/utils/timing.h: print bare seconds so benchmark-common.sh's `^[0-9]+(\.[0-9]+)?$` last-line regex captures it.
- gendb/storage_layout.json: trim to the 25 columns the 43 queries reference; HitColor is a 1-char string in hits.parquet that the int8 cast couldn't handle and no query needs it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
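The timing.h change is easier to see with the regex applied to sample output. The sketch below reuses the pattern quoted above in Python; how benchmark-common.sh actually applies it is not shown in this PR, so the helper `parse_last_line` and the sample outputs are assumptions.

```python
import re

# Pattern quoted in the commit message: digits, optionally a fractional part.
LAST_LINE_SECONDS = re.compile(r"^[0-9]+(\.[0-9]+)?$")


def parse_last_line(output):
    """Return the last line of `output` as seconds, or None if it is not bare."""
    lines = output.strip().splitlines()
    if not lines:
        return None
    last = lines[-1]
    return float(last) if LAST_LINE_SECONDS.match(last) else None


# Bare seconds on the last line are captured.
print(parse_last_line("rows: 10\n0.123\n"))
# A label or unit on the last line makes the match fail.
print(parse_last_line("rows: 10\nElapsed: 0.123 s\n"))
```

Any label, unit suffix, or surrounding whitespace inside the line breaks the match, which is why printing a bare number is the safe choice.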

alexey-milovidov and others added 2 commits May 18, 2026 03:33

alexey-milovidov mentioned this pull request May 18, 2026

Add GenDB #808

Open

alexey-milovidov wants to merge 2 commits into main from add-gendb
