
Add Frigatebird #906

Open: alexey-milovidov wants to merge 2 commits into main from add-frigatebird


@alexey-milovidov (Member)

Summary

  • Adds a frigatebird/ ClickBench recipe for Frigatebird, an embedded columnar SQL database written in Rust (push-based Volcano execution, morsel parallelism, LZ4 + O_DIRECT storage).
  • ./load streams hits.parquet through a small pyarrow script (parquet_to_inserts.py) into the Frigatebird REPL as batched INSERT INTO hits VALUES (...) statements — Frigatebird has no COPY / Parquet / CSV ingest path.
  • create.sql collapses all integer widths to BIGINT and DATE to TIMESTAMP (Frigatebird's type system has no narrower forms), and uses the mandatory ORDER BY (CounterID, EventDate, UserID, EventTime, WatchID).
  • ./query measures query runtime with bash's built-in time, since the Frigatebird CLI has no timer of its own.
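The ./load step can be pictured as a generator that groups rows into multi-row INSERT statements and writes each statement on a single line for the REPL. This is a minimal sketch, not the contents of parquet_to_inserts.py: it assumes each row is already a sequence of rendered SQL literals, and the function name `batched_inserts`, the default batch size, and the trailing semicolon are hypothetical. In the real script the rows would come from pyarrow (for example via `pyarrow.parquet.ParquetFile.iter_batches`); that part is left out so the sketch runs with the standard library alone.

```python
from itertools import islice
from typing import Iterable, Iterator, Sequence


def batched_inserts(
    rows: Iterable[Sequence[str]],
    batch_size: int = 1000,  # hypothetical default, not taken from the PR
    table: str = "hits",
) -> Iterator[str]:
    """Yield one multi-row INSERT statement per batch of rows.

    Each row must already be a sequence of rendered SQL literals
    (e.g. "42", "'text'"). Every yielded statement is a single line,
    because the REPL reads one statement per line.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        values = ",".join("(" + ",".join(row) + ")" for row in batch)
        # Assumption: the REPL accepts a trailing semicolon.
        yield f"INSERT INTO {table} VALUES {values};"


if __name__ == "__main__":
    import sys

    demo_rows = [["1", "'a'"], ["2", "'b'"], ["3", "'c'"]]
    for stmt in batched_inserts(demo_rows, batch_size=2):
        sys.stdout.write(stmt + "\n")
```

With batch_size=2 the three demo rows become two statements: one carrying rows 1 and 2, and one carrying row 3. Larger batches mean fewer REPL round trips, at the cost of longer lines per statement.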

Notes

  • parquet_to_inserts.py emits negative integers as quoted strings to work around Frigatebird's INSERT planner rejecting UnaryOp { Minus, Number } literals; the column-type coercion path parses them back to i64.
  • Frigatebird's SQL surface doesn't include EXTRACT, REGEXP_REPLACE, LENGTH/STRLEN, CASE, etc., so several queries will fail at parse/plan time and land as null in the results JSON.
  • In smoke testing, Frigatebird's TEXT decompressor panics with "failed to decompress page payload: string is not valid utf8" on the non-UTF-8 bytes that the hits dataset's text columns contain. The recipe is wired up so that the upstream behaviour on the full dataset is reproducible; expect many or all queries to be null until upstream stabilises ingest and scan of non-UTF-8 strings.
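The negative-integer workaround amounts to rendering any value below zero as a quoted string literal, so the planner receives '-5' rather than the unary-minus expression -5. This is a minimal sketch; `render_int` is a hypothetical helper name and not necessarily how parquet_to_inserts.py is written. The signed 64-bit range check is an added assumption based on the note that the coercion path parses the values back to i64.

```python
# BIGINT columns are assumed to hold signed 64-bit values, since the
# coercion path parses quoted negatives back to i64.
I64_MIN, I64_MAX = -(2**63), 2**63 - 1


def render_int(value: int) -> str:
    """Render an integer as a SQL literal for an INSERT ... VALUES list."""
    # bool is a subclass of int in Python; reject it so True/False never
    # end up in a BIGINT column as the text "True"/"False".
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected int, got {type(value).__name__}")
    if not I64_MIN <= value <= I64_MAX:
        raise OverflowError(f"{value} does not fit in a signed 64-bit integer")
    if value < 0:
        # Quote negatives ('-5') so the planner sees a string literal, not a
        # unary-minus expression; column coercion turns it back into an i64.
        return f"'{value}'"
    return str(value)


print(render_int(42), render_int(0), render_int(-5))  # 42 0 '-5'
```

Non-negative values stay bare, so only rows containing negative numbers pay the cost of the string-to-integer coercion.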

Resolves #809

Test plan

  • ./install && ./benchmark.sh on a fresh Ubuntu 24.04 VM
  • Confirm that any queries that succeed have plausible timings and that the remaining queries surface as null

alexey-milovidov and others added 2 commits May 15, 2026 23:03
Frigatebird (https://github.com/Frigatebird-db/frigatebird) is an
embedded columnar SQL database in Rust. It ingests only via
INSERT ... VALUES, so ./load streams hits.parquet through
parquet_to_inserts.py (pyarrow) as batched INSERTs into the REPL.

Per the README, expect many queries to show up as null: Frigatebird's
SQL surface lacks EXTRACT/REGEXP_REPLACE/LENGTH/CASE, and its TEXT
decompressor panics on the non-UTF-8 bytes in the hits dataset.

Resolves #809

The frigatebird REPL reads one statement per line, so literal 0x0A /
0x0D bytes inside a quoted string value split the INSERT statement
across multiple lines. The parser then reports "Unterminated string
literal" on the first half and "Expected an SQL statement, found: <"
(or ПЕСНЮ, or ',0,119,28,...,') on the continuation lines, and
essentially no rows actually load. The hits dataset's string columns
(UserAgent, Referer, SearchPhrase, Title, ...) carry real newlines,
so replace them — and carriage returns — with spaces before quoting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
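The newline fix from the commit above can be sketched as a string-literal renderer that turns CR and LF into spaces before quoting, so every INSERT stays on one physical line. This is a minimal sketch under two assumptions the commit does not state: embedded single quotes are escaped by doubling them (standard SQL), and the input is already a decoded Python str. How parquet_to_inserts.py handles non-UTF-8 bytes is not described. `render_text` is a hypothetical name.

```python
# Map the two line-break characters that split REPL input to plain spaces.
_LINE_BREAKS_TO_SPACE = str.maketrans({"\n": " ", "\r": " "})


def render_text(value: str) -> str:
    """Render a string as a single-line SQL string literal."""
    # Replace 0x0A / 0x0D so the statement cannot be split across lines.
    flattened = value.translate(_LINE_BREAKS_TO_SPACE)
    # Assumption: standard SQL escaping, i.e. ' inside a literal is written ''.
    escaped = flattened.replace("'", "''")
    return f"'{escaped}'"


print(render_text("first line\nit's the second"))  # 'first line it''s the second'
```

Each character is replaced one for one, so "\r\n" becomes two spaces. This keeps the sketch simple, and the extra whitespace has no effect on whether the statement parses.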


Development: successfully merging this pull request may close the issue "Add FrigateBird".
