feat(rootfs): native armhf on aarch64 host via runtime-disable of qemu-arm #9769

Draft
iav wants to merge 2 commits into main from feat/native-armhf-on-arm64

Conversation

@iav
Contributor

@iav iav commented May 5, 2026

Summary

Continues #9284 (arm64-compat-vdso extension + custom_kernel_make_params hook).
That PR enabled COMPAT_VDSO in arm64 kernels; this PR uses the resulting
32-bit-capable kernels to run 32-bit ARM ELF directly instead of through
qemu-arm emulation in chroot work — typically ~12× faster than qemu-arm-static
on Cortex-A53/A55/A72/A73/A76.

How

Disable qemu-arm in /proc/sys/fs/binfmt_misc/ for the duration of the
build; kernel binfmt_elf then handles 32-bit ARM ELFs natively via
CONFIG_COMPAT. Activation is delayed until AFTER mmdebstrap, whose
cross-arch path needs a working qemu-arm registration to populate the
chroot before libc/ld-linux-armhf.so.3 exist.
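
A minimal sketch of the runtime toggle described above, assuming the standard binfmt_misc interface in which the first line of an entry file reads `enabled` or `disabled` and writing `0` or `1` toggles it. The helper names are hypothetical and are not taken from the PR; the entry path is passed as an argument so the sketch can be dry-run against a scratch file.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not the PR's functions.

# Print "enabled", "disabled", or "missing" for the binfmt_misc entry at $1.
observe_binfmt_state() {
	local entry="$1" first_line=""
	[[ -e "${entry}" ]] || { echo "missing"; return 0; }
	read -r first_line < "${entry}" || true
	case "${first_line}" in
		enabled | disabled) echo "${first_line}" ;;
		*) echo "missing" ;; # unrecognised content: treat the entry as unusable
	esac
}

# Writing 0 disables the entry (needs CAP_SYS_ADMIN on the real file).
disable_binfmt_entry() {
	echo 0 > "$1"
}
```

On a real host the argument would be /proc/sys/fs/binfmt_misc/qemu-arm, and the disable step would run only after mmdebstrap has finished.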

Killswitch: NATIVE_ARMHF_ON_ARM64=no (synonyms never, disabled).

Concurrency — kernel flock on the binfmt entry

Concurrent armbian builds on the same host kernel are coordinated purely
via kernel BSD flock(2) on /proc/sys/fs/binfmt_misc/qemu-arm itself.
No userspace state, no per-builder lockfiles, no /run/lock directory.

  • Each builder holds LOCK_SH on the binfmt entry via a long-lived fd.
    Kernel BSD-flock counter is the refcount; kernel releases the fd on
    process exit (crash-safe).
  • The first arrival idempotently writes 0 to disable qemu-arm; subsequent
    arrivals observe it already disabled and proceed without writing.
  • On exit, each builder releases LOCK_SH. The last one out detects its
    status by attempting LOCK_EX|LOCK_NB on a fresh fd, which succeeds iff
    no other LOCK_SH holders remain, and then re-enables qemu-arm.
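
The protocol above can be sketched as follows, assuming bash 4.1 or newer and flock(1) from util-linux. The helper names are hypothetical; the PR's own functions may be structured differently. The lock target is a parameter so the sketch can be exercised on a scratch file instead of the binfmt entry.

```bash
#!/usr/bin/env bash
# Sketch only; helper names are hypothetical.

# Open a fresh fd on $1, take LOCK_SH on it, store the fd number in the variable named by $2.
hold_shared_lock() {
	local _fd
	exec {_fd}< "$1" || return 1
	flock -s "${_fd}" || return 1
	printf -v "$2" '%s' "${_fd}"
}

# Close the fd, which drops this holder's LOCK_SH.
drop_shared_lock() {
	local _fd="$1"
	exec {_fd}<&-
}

# Return 0 iff no other holder remains: LOCK_EX|LOCK_NB on a fresh fd succeeds only then.
is_last_out() {
	local _probe _rc=1
	exec {_probe}< "$1" || return 1
	flock -x -n "${_probe}" && _rc=0
	exec {_probe}<&-
	return "${_rc}"
}
```

A builder would call `hold_shared_lock` once at setup, and at cleanup call `drop_shared_lock` followed by `is_last_out` to decide whether to re-enable qemu-arm.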

Trade-offs

  1. Prior qemu-arm state is not recorded across builds. Last-out
    unconditionally re-enables. An admin's pre-existing disabled
    policy is not preserved across the build window.
  2. No defense against an external agent toggling qemu-arm mid-build.
    Don't, or use NATIVE_ARMHF_ON_ARM64=no on builds that should not
    interfere.
  3. Cleanup ordering invariant — BSD flock is per-OFD; a forked subshell
    inheriting our SH-fd shares the same lock entry. We rely on
    add_cleanup_handler running registrations in order and the umount
    teardown handlers (registered first) killing the docker container
    before our restore handler runs. Documented in the trap handler's
    docstring; if a future refactor breaks ordering, POSIX F_SETLK or
    explicit descendant-kill are the documented escape hatches.
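
To illustrate the per-OFD point in item 3, here is a small demonstration on a scratch file, assuming bash 4.1 or newer and flock(1). It is a sketch of the general flock behaviour, not code from the PR: a child that inherits the locked fd keeps the lock alive after the parent closes its own copy.

```bash
#!/usr/bin/env bash
# Demonstration on a scratch file; the real target would be the binfmt_misc entry.
lock_file="$(mktemp)"

# Return 0 iff no other open file description holds a lock on $1.
can_lock_exclusively() {
	local probe rc=1
	exec {probe}< "$1"
	flock -x -n "${probe}" && rc=0
	exec {probe}<&-
	return "${rc}"
}

exec {sh_fd}< "${lock_file}"
flock -s "${sh_fd}"

sleep 2 &            # the child inherits sh_fd, i.e. the same open file description
child_pid=$!
exec {sh_fd}<&-      # the parent closes its copy

if can_lock_exclusively "${lock_file}"; then
	while_child_alive="free"
else
	while_child_alive="held"
fi

wait "${child_pid}"  # the lock goes away once every copy of the fd is closed
if can_lock_exclusively "${lock_file}"; then
	after_child_exit="free"
else
	after_child_exit="held"
fi

echo "while child alive: ${while_child_alive}; after child exit: ${after_child_exit}"
rm -f "${lock_file}"
```

This is why the restore handler must run only after descendants holding the inherited fd are gone: until then the last-out probe cannot succeed.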

The earlier revision of this work coordinated via a userspace
owner-flock directory (~250 lines, depended on /run/lock being
host-bind-mounted into build containers — which Armbian docker does
not do). Replaced with the kernel-flock approach above; available in
PR history if needed.

Empirical

./compile.sh build BOARD=helios4 BRANCH=edge BUILD_MINIMAL=yes RELEASE=noble
on droid (RK3328) and m1 (RK3568, Cortex-A55):

| Scenario | Real time | Speedup |
|---|---|---|
| baseline, rootfs cache-miss | 60:35 | |
| this PR, rootfs cache-miss | 19:27 | 3.12× |
| this PR, rootfs cache-hit | 6:38 | ~9× over expected baseline cache-hit |

mmdebstrap itself (~10 min of every cache-miss build) cannot be
accelerated — speedup applies to subsequent chroot operations.

Test plan

Phase 1: feature on bare-metal hardware (single builder):

  • helios4 cache-miss + cache-hit, full image
  • cleanup re-enables qemu-arm on exit
  • DOCKER_PRIVILEGED=no (CAP_SYS_ADMIN unconditional)
  • killswitch (NATIVE_ARMHF_ON_ARM64=no/never/disabled) — silent return
  • x86_64 host regression — silent return
  • aarch64 native (no-container) PREFER_DOCKER=no smoke

Phase 2 — concurrency, all on Hetzner CAX21 (kernel 6.8.0-90, binfmt_misc fs):

  • flock primitives on /proc/sys/fs/binfmt_misc/qemu-arm work as expected
  • single-tree build: setup + 5 phase markers + last-out restore
  • two independent worktrees concurrent: both enter native path, kernel
    SH-counter holds qemu-arm disabled, first-finished suppresses restore,
    last-finished restores
  • NATIVE_ARMHF_ON_ARM64=no opt-out with a concurrent native builder:
    the fail-fast guard refuses to proceed (avoids clobbering)
  • SIGINT mid-chroot — cleanup ordering invariant holds; last-out fires,
    qemu-arm restored, no stale state

Summary by CodeRabbit

  • New Features

    • Allow native execution of 32-bit ARM binaries on compatible 64-bit hosts to speed and simplify ARM image builds.
    • Skip copying emulation binaries into chroots when native ARM handling is active.
  • Bug Fixes

    • Improve coordination and recovery around host binary-format handling to avoid interference between concurrent builds and ignore non-fatal setup failures.

@iav iav requested a review from a team as a code owner May 5, 2026 18:15
@iav iav requested review from PanderMusubi and rpardini and removed request for a team May 5, 2026 18:15
@coderabbitai
Contributor

coderabbitai Bot commented May 5, 2026

📝 Walkthrough

Adds coordinated host binfmt_misc handling to prefer kernel-native 32-bit ARM ELF on aarch64 hosts, conditional qemu deploy/undeploy behavior, and invokes the setup during rootfs creation and image build; setup failures are ignored.

Changes

Native armhf via binfmt_elf Mode

| Layer / File(s) | Summary |
|---|---|
| Core Observation (lib/functions/rootfs/qemu-static.sh) | Adds _native_armhf_observe_qemu_arm_state() to report qemu-arm binfmt_misc presence and enabled/disabled state. |
| Core Setup & Coordination (lib/functions/rootfs/qemu-static.sh) | Adds _native_armhf_setup_binfmt_elf() that acquires a control flock, may disable qemu-arm, runs arch-test armhf, and exports ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes. |
| Cleanup & Owner Restore (lib/functions/rootfs/qemu-static.sh) | Adds trap_handler_native_armhf_restore_qemu_arm() that restores qemu-arm only when the last owner exits; otherwise it releases ownership. |
| Killswitch Release Handler (lib/functions/rootfs/qemu-static.sh) | Adds trap_handler_native_armhf_release_emul_lock() that releases the shared emulation lock fd without restoring binfmt state. |
| qemu Deploy/Undeploy Wiring (lib/functions/rootfs/qemu-static.sh) | deploy_qemu_binary_to_chroot() skips copying qemu when ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes; undeploy_qemu_binary_from_chroot() treats missing chroot qemu as a no-op only in native mode. |
| Prepare Guards & Verification (lib/functions/rootfs/qemu-static.sh) | Skip enabling qemu-arm for non-armhf targets on aarch64 hosts; attempt native-armhf setup early for aarch64→armhf; add preflight guard refusing to re-enable an observably-disabled registration; change verification to arch-test armhf and tweak heredoc formatting. |
| Rootfs Creation Integration (lib/functions/rootfs/rootfs-create.sh) | Calls `_native_armhf_setup_binfmt_elf |
| Image Build Integration (lib/functions/main/rootfs-image.sh) | Calls `_native_armhf_setup_binfmt_elf |

Sequence Diagram(s)

sequenceDiagram
    participant Build as Build Flow
    participant Setup as _native_armhf_setup_binfmt_elf()
    participant Binfmt as /proc/sys/fs/binfmt_misc
    participant Trap as Cleanup Handler
    participant Deploy as deploy_qemu_binary_to_chroot()

    Build->>Setup: invoke (post-rootfs extraction)
    Setup->>Binfmt: read qemu-arm entry/state
    alt qemu-arm enabled
        Setup->>Binfmt: write disable qemu-arm
        Setup->>Trap: register restore handler
        Setup-->>Build: export ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes
    else qemu-arm absent_or_disabled
        Setup-->>Build: export ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes
    end
    Build->>Deploy: proceed to deploy step
    Deploy->>Deploy: check ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF
    alt native mode set
        Deploy-->>Build: early return (skip qemu copy)
    else
        Deploy->>Build: copy qemu-arm-static into chroot
    end
    Build-->>Trap: on exit/signal
    Trap->>Binfmt: write restore qemu-arm state

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibbled at binfmt's quiet seam,
Turned qemu off so armhf could dream,
Flag raised, trap set neat,
Chroots wake on native feet,
Builds hop forward, light and trim.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 70.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: enabling native armhf execution on aarch64 hosts by disabling qemu-arm binfmt_misc, which is the primary feature introduced across all modified files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@github-actions github-actions Bot added the labels size/medium (PR with more than 50 and less than 250 lines), 05 (Milestone: Second quarter release), Needs review (Seeking for review), and Framework (Framework components) May 5, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 20-25: The early return in the qemu-static check (when
ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF == "yes") prevents the matching undeploy
from running later, leaking host qemu into cached rootfs; instead of returning
from the function, set a clear marker variable (e.g.,
ARMBIAN_QEMU_DEPLOY_SKIPPED=yes or ARMBIAN_QEMU_DEPLOYED=no) after calling
display_alert and allow the function to exit normally so the caller
(rootfs-create.sh) can always run the undeploy path and check that marker;
update the undeploy logic to look for that same marker
(ARMBIAN_QEMU_DEPLOY_SKIPPED or ARMBIAN_QEMU_DEPLOYED) so it only skips removal
when deploy truly never ran.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eba7fdae-710a-42e6-bcb9-0d41993767df

📥 Commits

Reviewing files that changed from the base of the PR and between 3ce7dec and 502287f.

📒 Files selected for processing (4)
  • lib/functions/main/rootfs-image.sh
  • lib/functions/rootfs/qemu-static.sh
  • lib/functions/rootfs/rootfs-create.sh
  • lib/library-functions.sh

Comment thread lib/functions/rootfs/qemu-static.sh
@iav iav force-pushed the feat/native-armhf-on-arm64 branch from 502287f to 6f7903b Compare May 5, 2026 18:36
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
lib/functions/rootfs/qemu-static.sh (1)

20-25: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Deploy/undeploy asymmetry can leak host qemu-arm-static into the rootfs cache.

The flag check is symmetric inside this file, but at the call sites (rootfs-create.sh, rootfs-image.sh per the summary), deploy_qemu_binary_to_chroot runs before _native_armhf_setup_binfmt_elf flips ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes, while undeploy_qemu_binary_from_chroot runs after. In that ordering the binary gets copied into the chroot, but the matching removal is suppressed by the new early-return at Line 56. The host qemu-arm-static then persists into the rootfs cache and into final armhf images on subsequent cache-hit builds.

Track per-(caller, chroot_target) deployment state and only skip undeploy when deploy actually skipped, rather than gating both on the flag alone.

🛠️ Suggested fix
+declare -gA armbian_qemu_deployed_to_chroot=()
+
 function deploy_qemu_binary_to_chroot() {
 	declare chroot_target="${1}" caller="${2}"
@@
 	if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
 		display_alert "Native armhf via binfmt_elf" "skipping qemu binary deployment during ${caller}" "info"
 		return 0
 	fi
@@
 	display_alert "Deploying qemu-user-static binary to chroot" "${QEMU_BINARY} during ${caller}" "info"
 	run_host_command_logged cp -pv "${src_host}" "${dst_target}"
+	armbian_qemu_deployed_to_chroot["${caller}:${chroot_target}"]=yes
 
 	return 0
 }
@@
 function undeploy_qemu_binary_from_chroot() {
@@
-	if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then
+	if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" \
+		&& "${armbian_qemu_deployed_to_chroot["${caller}:${chroot_target}"]:-no}" != "yes" ]]; then
 		display_alert "Native armhf via binfmt_elf" "no qemu binary to remove during ${caller}" "debug"
 		return 0
 	fi

Also applies to: 55-59

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/functions/rootfs/qemu-static.sh` around lines 20 - 25, The early-return
based solely on ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF in
deploy_qemu_binary_to_chroot/undeploy_qemu_binary_from_chroot causes asymmetry
across separate invocations; change the logic so that when
deploy_qemu_binary_to_chroot skips copying due to the flag it records that
decision with a per-(caller,chroot_target) marker (e.g., a temp file named with
${caller} and the chroot target inside the rootfs cache or a shared temp dir),
and then have undeploy_qemu_binary_from_chroot check that marker and only skip
removal when the deploy previously skipped for that same (caller,chroot_target);
ensure you still clean up the marker when an actual copy was performed and when
removal runs so state cannot leak into later builds (references: functions
deploy_qemu_binary_to_chroot, undeploy_qemu_binary_from_chroot and variable
ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 169-188: Move the add_cleanup_handler call so it runs before
writing "0" to /proc/sys/fs/binfmt_misc/qemu-arm (register the trap before the
destructive action) and update trap_handler_native_armhf_restore_qemu_arm to not
early-return based on ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF; instead key the
cleanup on the actual binfmt_misc state (check [[ -e
/proc/sys/fs/binfmt_misc/qemu-arm ]] and attempt echo 1 >
/proc/sys/fs/binfmt_misc/qemu-arm || true), so a partially completed setup or an
interrupt between the disable and ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF assignment
will still re-enable qemu-arm.
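
The ordering this comment asks for can be sketched as below. This is a hedged illustration only: a plain `trap ... EXIT` stands in for the project's add_cleanup_handler, and the function names and the `BINFMT_ENTRY` override are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical names; BINFMT_ENTRY defaults to the real binfmt_misc file.
BINFMT_ENTRY="${BINFMT_ENTRY:-/proc/sys/fs/binfmt_misc/qemu-arm}"

restore_qemu_arm() {
	# Key on the entry itself rather than on a flag that may not be set yet.
	[[ -e "${BINFMT_ENTRY}" ]] || return 0
	echo 1 > "${BINFMT_ENTRY}" || true
}

disable_qemu_arm_safely() {
	trap restore_qemu_arm EXIT    # 1. register the cleanup first
	echo 0 > "${BINFMT_ENTRY}"    # 2. only then perform the destructive write
}
```

With this order, an interrupt between the disable and any later flag assignment still reaches the restore handler.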


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e697b537-3deb-4c61-94a1-02ef9c76d299

📥 Commits

Reviewing files that changed from the base of the PR and between 502287f and 6f7903b.

📒 Files selected for processing (3)
  • lib/functions/main/rootfs-image.sh
  • lib/functions/rootfs/qemu-static.sh
  • lib/functions/rootfs/rootfs-create.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/functions/main/rootfs-image.sh
  • lib/functions/rootfs/rootfs-create.sh

Comment thread lib/functions/rootfs/qemu-static.sh Outdated
@iav iav force-pushed the feat/native-armhf-on-arm64 branch 2 times, most recently from eab2ab1 to d990278 Compare May 5, 2026 19:03
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
lib/functions/rootfs/qemu-static.sh (1)

131-134: 💤 Low value

Optional: case-insensitive killswitch matching.

NATIVE_ARMHF_ON_ARM64=NO (or Disabled, False, etc.) would slip past the killswitch and proceed to detection. This is a common ergonomic gotcha when users export env vars from CI configs. Low priority since the function self-aborts on the subsequent guards anyway, but a one-liner makes the contract explicit.

♻️ Optional refactor
-	# Killswitch — accept no/never/disabled as synonyms; bail before any detection runs.
-	case "${NATIVE_ARMHF_ON_ARM64:-auto}" in
-		no | never | disabled) return 1 ;;
-	esac
+	# Killswitch — accept no/never/disabled (any case) as synonyms; bail before any detection runs.
+	declare killswitch_value="${NATIVE_ARMHF_ON_ARM64:-auto}"
+	case "${killswitch_value,,}" in
+		no | never | disabled) return 1 ;;
+	esac
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/functions/rootfs/qemu-static.sh` around lines 131 - 134, The killswitch
case for NATIVE_ARMHF_ON_ARM64 is currently case-sensitive and will miss values
like "NO" or "Disabled"; update the case statement around the
NATIVE_ARMHF_ON_ARM64 check to perform case-insensitive matching (e.g., convert
the value to lowercase with parameter expansion like
"${NATIVE_ARMHF_ON_ARM64,,}" before the case or expand the patterns to bracketed
forms such as
[Nn][Oo]|[Nn][Ee][Vv][Ee][Rr]|[Dd][Ii][Ss][Aa][Bb][Ll][Ee][Dd]|[Ff][Aa][Ll][Ss][Ee]
to accept common synonyms) so that the return 1 kill path in that branch
reliably triggers for uppercase/mixed-case inputs.
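
A sketch of the normalisation this comment proposes, assuming bash 4 or newer for the `${var,,}` expansion. The helper name is hypothetical. Note that the default has to be applied before lowercasing: in `${VAR:-auto,,}` the trailing `,,` becomes part of the default string and nothing is lowercased.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when the killswitch is engaged, 1 otherwise.
native_armhf_killswitch_engaged() {
	# Step 1: apply the default. Step 2: lowercase the result for matching.
	local value="${NATIVE_ARMHF_ON_ARM64:-auto}"
	case "${value,,}" in
		no | never | disabled) return 0 ;;
		*) return 1 ;;
	esac
}
```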

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ff5855c0-1c00-4b4b-8c4e-dc619c12d9a0

📥 Commits

Reviewing files that changed from the base of the PR and between 6f7903b and d990278.

📒 Files selected for processing (3)
  • lib/functions/main/rootfs-image.sh
  • lib/functions/rootfs/qemu-static.sh
  • lib/functions/rootfs/rootfs-create.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/functions/main/rootfs-image.sh
  • lib/functions/rootfs/rootfs-create.sh

@iav
Contributor Author

iav commented May 5, 2026

Re: nitpick on qemu-static.sh:131-134 (case-insensitive killswitch matching) — declined. Project convention for boolean toggles is literal lowercase comparison ([[ "${VAR}" == "yes" ]], [[ "${VAR:-no}" == "yes" ]]). Examples: cli-artifact.sh:56 (ARTIFACT_USE_CACHE), utils-cli.sh:247 (PREFER_DOCKER:-yes), bsp/armbian-bsp-cli-deb.sh:239 (SHOW_DEBUG). No ${var,,} normalization on boolean flags anywhere in lib/functions/. Keeping the killswitch in line with that convention.

@iav iav force-pushed the feat/native-armhf-on-arm64 branch from d990278 to 15789ee Compare May 5, 2026 19:42
@iav
Contributor Author

iav commented May 5, 2026

If you use arm64 builders, please try it!

@igorpecovnik
Member

aarch64 host (CIX). Always fails:

./compile.sh build BOARD=helios4 BRANCH=edge BUILD_MINIMAL=yes KERNEL_CONFIGURE=no RELEASE=noble SHARE=yes

[🐳|🌱] git: Fetch from remote completed, rev-parsing... [ 'mmdebstrap-debian-devel' 'main' 'FETCH_HEAD' ]
[🐳|🌱] mmdebstrap version [ '1.5.7 (2025-04-08)' for /armbian/cache/sources/mmdebstrap-debian-devel/mmdebstrap ]
[🐳|🌱] Installing base system with 5 packages [ Stage 1/1 ]
[🐳|🌱] fetch_distro_keyring(noble) [ cache found, skipping ]
[🐳|🌱] Deploying qemu-user-static binary to chroot [ qemu-arm-static during rootfs ]
[🐳|🔨]   '/usr/bin/qemu-arm-static' -> '/armbian/.tmp/rootfs-09e5dde9-e782-4c83-a704-7d2314fd45c0/usr/bin/qemu-arm-static'
[🐳|🔨]   I: automatically chosen mode: root
[🐳|🔨]   I: armhf cannot be executed natively, but transparently using qemu-user binfmt emulation
[🐳|🔨]   I: automatically chosen format: directory
[🐳|🔨]   I: skipping check/empty as requested
[🐳|🔨]   I: running special hook: copy-in /armbian/cache/keyrings/ubuntu/usr /armbian/cache/keyrings/ubuntu/etc /
[🐳|🔨]   I: running --setup-hook in shell: sh -c 'mkdir -p /armbian/cache/aptcache/noble-armhf/archives "$1"/var/cache/apt/archives/' exec /armbian/.tmp/rootfs-09e5dde9-e782-4c83-a704-7d2314fd45c0
[🐳|🔨]   I: running special hook: sync-in /armbian/cache/aptcache/noble-armhf/archives /var/cache/apt/archives/
[🐳|🔨]   I: running apt-get update...
[🐳|🔨]   I: downloading packages with apt...
[🐳|🔨]   I: extracting archives...
[🐳|🔨]   I: installing essential packages...
[🐳|🔨]   I: installing remaining packages inside the chroot...
[🐳|🔨]   I: running special hook: sync-out /var/cache/apt/archives/ /armbian/cache/aptcache/noble-armhf/archives
[🐳|🔨]   I: cleaning package lists and apt cache...
[🐳|🔨]   I: success in 338.3367 seconds
[🐳|🌱] Native armhf via binfmt_elf [ kernel 6.18.8-current-arm64, aarch64 host with COMPAT_VDSO; qemu-arm disabled, kernel binfmt_elf takes over ]
[🐳|🌱] Cleaning up after mmdebstrap [ mmdebstrap cleanup ]
[🐳|🌱] Diverting [ initctl/start-stop-daemon ]
[🐳|🔨]   chroot: failed to run command '/usr/bin/env': Exec format error

@iav iav force-pushed the feat/native-armhf-on-arm64 branch from 15789ee to df7aaf9 Compare May 5, 2026 22:07
@iav
Contributor Author

iav commented May 5, 2026

@igorpecovnik thanks for the report.

Root cause: arch-test armhf returns 0 on your CIX host because qemu-arm in binfmt_misc interprets its tiny test stub — false positive on hosts whose kernel was built without COMPAT_VDSO. After my code disabled qemu-arm, the kernel tried to execute armhf ELF natively and failed with Exec format error because the host kernel doesn't have the compat-vdso layer.

Fix: arch-test moved to AFTER the binfmt_misc disable, so it reflects what the chroot exec will actually face. On detection failure we restore qemu-arm immediately and fall back to qemu-static emulation transparently.

After this fix lands on your builder, your build will work as before — slowly, via qemu-arm-static, no regression. To get the speedup, build your builder host's kernel with the arm64-compat-vdso extension enabled (extensions/arm64-compat-vdso, PR #9284); then native armhf via binfmt_elf will activate automatically.
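
The reordered detection can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the function name is hypothetical, the entry path is a parameter, and the probe command is passed in by the caller (in the PR that role is played by `arch-test armhf`).

```bash
#!/usr/bin/env bash
# Hypothetical sketch: $1 is the binfmt entry path, the remaining args are the probe command.
try_native_armhf() {
	local entry="$1"
	shift
	echo 0 > "${entry}" || return 1    # disable qemu-arm first
	if "$@"; then
		return 0                        # probe passed without emulation: native exec works
	fi
	echo 1 > "${entry}" || true        # probe failed: restore emulation immediately
	return 1                            # caller falls back to qemu-arm-static
}
```

Because the probe runs after the disable, it can no longer be satisfied by qemu-arm interpreting the test stub.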

iav added a commit that referenced this pull request May 7, 2026
…lel builds

PR review by codex (gpt-5.5) flagged the missing synchronisation around the
shared /proc/sys/fs/binfmt_misc/qemu-arm flag as the main blocker for shipping
NATIVE_ARMHF_ON_ARM64 default-on. The previous refcount-less code disabled
qemu-arm globally in setup and unconditionally re-enabled it in cleanup —
two parallel builds on the same host could:

  * have build A disable qemu-arm and register restore;
  * have build B observe 'already disabled', set
    ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes and skip copying
    qemu-arm-static into its own chroot;
  * have build A finish, cleanup blindly write 1 → qemu-arm enabled;
  * have build B continue with no qemu binary in chroot but
    qemu-arm globally registered — runtime failure.

Replace with a per-host owner-flock pattern (codex-recommended; refcount/
pidlist breaks under PID namespaces and PID reuse, kernel fd-release on
process exit gives crash safety for free):

  * /run/lock/armbian-native-armhf/owners/<ARMBIAN_BUILD_UUID>.lock —
    one file per builder, long-lived flock held for the whole native-mode
    window. Closing the fd (process exit, normal or crash) releases the
    lock; another builder probing it with non-blocking flock detects the
    death and prunes the file.
  * control.lock — short-lived flock around prune+count+state-write so
    first/last-owner decisions stay consistent under concurrency.
  * Live owner count = number of owner files we can NOT acquire
    non-blocking. Refcount is not used as source of truth.
  * First owner records prior qemu-arm state (1 / 0 / missing) into
    prior-qemu-arm-state and only writes 0 if prior was 1. Subsequent
    owners join without state mutation.
  * Last owner restores from the saved prior — never blindly writes 1.
    If prior was already disabled or the handler was missing, leave it
    alone. State divergence (admin or another tool changed qemu-arm
    during the active window) emits a warning, not a fight.
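
The live-owner rule from this commit message can be sketched as below, assuming bash 4.1 or newer and flock(1). The function name is hypothetical, and the surrounding control.lock that the commit describes is omitted for brevity, so this shows only the prune-and-count step.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "live owners = lock files we can NOT acquire non-blocking".
count_live_owners() {
	local dir="$1" file fd live=0
	for file in "${dir}"/*.lock; do
		[[ -e "${file}" ]] || continue    # the glob matched nothing
		exec {fd}< "${file}"
		if flock -x -n "${fd}"; then
			rm -f "${file}"                # owner is gone: prune the stale file
		else
			live=$((live + 1))             # another process still holds it
		fi
		exec {fd}<&-
	done
	echo "${live}"
}
```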

Per-host scope by design — binfmt_misc is per-kernel state, so cross-host
NFS coordination would only serialise independent kernels for no benefit.
The lockdir lives under /run/lock so it is local-to-host and host-shared
(the Armbian build container bind-mounts /run from the host, satisfying
the requirement). Documented in the function header comment.

Other review items addressed:

  * The 'caller sets ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes' comment was
    inaccurate (the function sets it itself) — fixed.
  * Function header rewritten to spell out the concurrency model and
    Docker requirement explicitly.

Reported-by: codex (PR #9769 review-helper)
Assisted-by: Claude:claude-opus-4.7
Signed-off-by: Igor Velkov <325961+iav@users.noreply.github.com>
iav added a commit that referenced this pull request May 7, 2026
…9f27

Codex review of the previous lock-pattern commit (5359f27) found three
real holes:

1. trap_handler released the owner lock BEFORE taking the control lock.
   Between fd close and flock(control) a fresh builder could enter setup,
   prune+count and see live_count == 1 (only itself), then observe the
   now-disabled qemu-arm and write '0' as its prior_state. Our subsequent
   restore would read that overwritten state and never re-enable
   qemu-arm — turning a coordinated handoff into permanent disablement.
   Fix: take control.lock first, release owner UNDER control, then count
   and restore. The whole transition is now atomic.

2. The post-disable arch-test failure path released the owner lock
   manually and then called the trap handler. The handler's first
   guard ([[ -n ${_native_armhf_owner_fd:-} ]] || return 0) saw the
   already-cleared global and exited without restore — leaving qemu-arm
   disabled and the state file leaking. Fix: drop control.lock and let
   the handler do the release+count+restore under its own control lock,
   matching the normal cleanup path.

3. Stale active window after process kill (SIGKILL / OOM / kill -9
   between echo 0 and trap firing): kernel releases the fd so the
   owner lock is gone, but the state file remains. The next first-owner
   would observe the leftover-disabled qemu-arm and record disabled as
   its new prior_state — clean shutdown then never re-enables. Fix: in
   the first-owner branch, before saving prior_state, check for an
   existing state file. If found and prior was '1', reconcile qemu-arm
   back to enabled and remove the stale file before observing our own
   prior_state. Crash recovery is now self-healing on the next build.
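
The crash-recovery step in item 3 can be sketched as follows. The function name and argument layout are hypothetical; the commit's actual implementation runs in the first-owner branch under the control lock, which is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: $1 = binfmt entry, $2 = prior-state file left by a killed builder.
reconcile_stale_state() {
	local entry="$1" state_file="$2" prior
	[[ -e "${state_file}" ]] || return 0    # no crash residue, nothing to do
	prior="$(< "${state_file}")"
	if [[ "${prior}" == "1" ]]; then
		echo 1 > "${entry}" || true          # the dead builder had disabled an enabled entry
	fi
	rm -f "${state_file}"                    # drop the stale file before recording a new prior
}
```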

The same edit additionally cleans up the disable-failure case: when
'echo 0 > qemu-arm' itself fails (no CAP_SYS_ADMIN), no state actually
changed, so leave state file and owner lock alone — the cleanup handler
will observe current==prior==1 and no-op the restore. The previous code
released owner outside control and rm-ed the state file, both of which
introduced the same race the trap_handler change closes elsewhere.

Reported-by: codex (PR #9769 lock-impl review-helper)
Assisted-by: Claude:claude-opus-4.7
Signed-off-by: Igor Velkov <325961+iav@users.noreply.github.com>
@github-actions github-actions Bot added the label size/large (PR with 250 lines or more) and removed the label size/medium (PR with more than 50 and less than 250 lines) May 7, 2026
iav added a commit that referenced this pull request May 7, 2026
…3d4c8cd

Codex re-review of fix2 (commit 3d4c8cd) confirmed the original three
races are properly closed but identified two new ones, both producing
the same class of runtime failure: a parallel builder ends up running
chroot exec against globally-enabled qemu-arm with no qemu binary
inside its chroot.

1. Disable-failure path left a live owner. When 'echo 0 > qemu-arm'
   itself fails (no CAP_SYS_ADMIN), fix2 chose to leave the owner lock
   and the prior-state file in place, on the theory that the cleanup
   handler would no-op the restore at exit. But the owner stays
   counted as live for the entire build duration, so a concurrent
   builder enters its own setup, sees live_count > 1, takes the join
   branch, skips qemu binary deployment, and fails at chroot exec
   time. Fix: a failed disable now performs a full abort under the
   held control lock — rm state file, release owner, drop control,
   return 1. The aborted setup leaves no observable native window for
   anyone to join.

2. Join path did not verify the actual kernel state. fix2 trusted that
   live_count > 1 implies qemu-arm is disabled. But an external agent
   (admin running 'update-binfmts --enable qemu-arm', a stray service)
   can flip it back on during the active window. Our pre-flight
   arch-test then passes as a false positive — qemu interprets the
   ARM stub — and we skip qemu binary deployment, hitting the same
   runtime failure as case 1. Fix: in the join branch, observe the
   current qemu-arm state explicitly. If it is enabled, this is not
   a normal join — abort to qemu fallback with a warning naming the
   external mutator. This is a one-shot invariant check at join time,
   not a periodic re-verify (which codex previously argued against).
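
The join-time check in item 2 can be sketched roughly as follows. This is an illustration only, not the code from qemu-static.sh: the helper names `binfmt_entry_state` and `can_join_native_window` are hypothetical. It relies on the fact that a binfmt_misc entry reports `enabled` or `disabled` on its first line, and maps that to the 1/0 values used in the text.

```shell
# Hypothetical helper: print 1 if a binfmt_misc entry is enabled, 0 if disabled.
# Returns non-zero when the entry is missing or its first line is unexpected.
binfmt_entry_state() {
	local entry="${1:-/proc/sys/fs/binfmt_misc/qemu-arm}" first_line=""
	[ -r "$entry" ] || return 2
	IFS= read -r first_line < "$entry" || [ -n "$first_line" ] || return 2
	case "$first_line" in
		enabled) echo 1 ;;
		disabled) echo 0 ;;
		*) return 3 ;;
	esac
}

# Hypothetical join-time check: join only when the entry is really disabled.
can_join_native_window() {
	[ "$(binfmt_entry_state "$1")" = "0" ]
}
```

A caller in the join branch would fall back to qemu deployment whenever `can_join_native_window /proc/sys/fs/binfmt_misc/qemu-arm` fails, including the case where the entry cannot be read at all.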

Reported-by: codex (PR #9769 lock-impl re-review)
Assisted-by: Claude:claude-opus-4.7
Signed-off-by: Igor Velkov <325961+iav@users.noreply.github.com>
@iav iav force-pushed the feat/native-armhf-on-arm64 branch from 3d4c8cd to 4998f52 Compare May 7, 2026 17:09

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 324-331: The else branch that runs when live_count > 1 must not
assume qemu-arm is already disabled; before registering the join behavior
(trap_handler_native_armhf_restore_qemu_arm, display_alert and setting
ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF), read the actual qemu-arm state (the same
state/sysfs/binfmt path used when writing "0"/"1") and verify it is "0"
(disabled); if it is not "0", fail fast (error out) rather than joining, so the
subsequent arch-test armhf and ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF logic do not
incorrectly skip deploying qemu-arm-static. Ensure you reference the same state
file/variable used by the disable code path and keep the cleanup/restore handler
only for true joiners.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6b9577ae-7b84-40f0-972d-8e8b8d70ef1c

📥 Commits

Reviewing files that changed from the base of the PR and between df7aaf9 and 3d4c8cd.

📒 Files selected for processing (1)
  • lib/functions/rootfs/qemu-static.sh

Comment thread lib/functions/rootfs/qemu-static.sh Outdated
@iav iav force-pushed the feat/native-armhf-on-arm64 branch from 4998f52 to 152d4f6 Compare May 7, 2026 17:53

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
lib/functions/main/rootfs-image.sh (1)

19-19: ⚡ Quick win

Make the non-fatal fallback explicit instead of silently swallowing setup failure.

At line 19, `|| true` preserves behavior, but it hides when native armhf activation fails. Please wrap this in an explicit `if ! ...; then display_alert ...; fi` so operators can see when the build falls back to qemu-static.

Suggested change
-	_native_armhf_setup_binfmt_elf || true
+	if ! _native_armhf_setup_binfmt_elf; then
+		display_alert "Native armhf via binfmt_elf unavailable" "Falling back to qemu-static path" "warn"
+	fi

Based on learnings: In the Armbian build framework, scripts run with set -e, so masked failures should be explicitly handled and not silently ignored.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/functions/main/rootfs-image.sh` at line 19, The call to
_native_armhf_setup_binfmt_elf currently uses "|| true" which silently swallows
failures; replace that pattern with an explicit conditional: invoke
_native_armhf_setup_binfmt_elf and if it returns non-zero run a visible fallback
notification (e.g. if ! _native_armhf_setup_binfmt_elf; then display_alert
"native armhf binfmt activation failed; falling back to qemu-static"; fi) so
operators see when the script falls back to qemu-static; keep the existing
non-fatal behavior but surface the failure via display_alert (or the project’s
alert/logging helper).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c0143a21-0438-46b0-9277-ecb11ae8e453

📥 Commits

Reviewing files that changed from the base of the PR and between 4998f52 and 152d4f6.

📒 Files selected for processing (3)
  • lib/functions/main/rootfs-image.sh
  • lib/functions/rootfs/qemu-static.sh
  • lib/functions/rootfs/rootfs-create.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/functions/rootfs/rootfs-create.sh
  • lib/functions/rootfs/qemu-static.sh

@iav iav marked this pull request as ready for review May 10, 2026 00:57
@iav iav marked this pull request as draft May 10, 2026 01:18
@iav iav force-pushed the feat/native-armhf-on-arm64 branch from 42770de to 7b782ce Compare May 10, 2026 01:45
@github-actions github-actions Bot added size/medium PR with more then 50 and less then 250 lines and removed size/large PR with 250 lines or more labels May 10, 2026
@iav iav marked this pull request as ready for review May 10, 2026 01:54
@iav

iav commented May 10, 2026

@coderabbitai full review please.

Branch force-pushed and squashed into a single commit (7b782ceac). The userspace owner-flock coordination layer (~250 lines) is replaced with kernel flock(2) directly on /proc/sys/fs/binfmt_misc/qemu-arm. Earlier review threads on prior revisions are obsolete; please look at the current revision fresh.

@coderabbitai

coderabbitai Bot commented May 10, 2026

@iav, understood — I'll do a fresh full review of the current squashed commit (7b782ceac), treating the kernel flock(2)-on-/proc/sys/fs/binfmt_misc/qemu-arm approach as the canonical implementation and ignoring all prior revision threads.

✅ Actions performed

Full review triggered.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 329-341: The early-continue when calling
_native_armhf_setup_binfmt_elf lets the "killswitch" path
(NATIVE_ARMHF_ON_ARM64=no) bypass the later disabled-state guard and skip
prepare_host_binfmt_qemu_cross_arm64_host_armhf_target; change the logic so the
caller only continues when _native_armhf_setup_binfmt_elf reports an actual
active-native success (explicit success exit code), and when the function
returns due to the killswitch (use a distinct exit code or status), fall through
instead of continuing so the later
prepare_host_binfmt_qemu_cross_arm64_host_armhf_target and the disabled-state
guard still run; apply the same fix to the analogous block around
prepare_host_binfmt_qemu_cross_arm64_host_armhf_target (lines 356-371) and
ensure _native_armhf_setup_binfmt_elf and its callers agree on exit codes
semantics.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6912ee78-6640-4866-9e0d-f7c2e50ad718

📥 Commits

Reviewing files that changed from the base of the PR and between 38c583f and 7b782ce.

📒 Files selected for processing (3)
  • lib/functions/main/rootfs-image.sh
  • lib/functions/rootfs/qemu-static.sh
  • lib/functions/rootfs/rootfs-create.sh

Comment thread lib/functions/rootfs/qemu-static.sh
…u-arm

When building an armhf image on an aarch64 host that supports 32-bit ARM
execution (kernel CONFIG_COMPAT, CPU has 32-bit user mode), disable the
qemu-arm handler in /proc/sys/fs/binfmt_misc/ for the duration of the
build. The kernel's own binfmt_elf then handles 32-bit ARM ELFs natively
via CONFIG_COMPAT — typically ~12x faster than qemu-arm-static emulation
on Cortex-A53/A55/A72/A73/A76.

Continues #9284 (arm64-compat-vdso extension + custom_kernel_make_params
hook), which enabled COMPAT_VDSO in arm64 kernels.

Activation point is delayed to AFTER mmdebstrap. Its cross-arch path
extracts base packages and runs their postinst hooks, which require a
working qemu-arm registration to populate the chroot before
libc/ld-linux-armhf.so.3 exist for binfmt_elf to fall through to.
The speedup applies to all subsequent chroot operations
(install_distribution_specific, customize_image, update-initramfs).

Killswitch: NATIVE_ARMHF_ON_ARM64=no (synonyms: never, disabled).
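
As a rough illustration of the killswitch values listed above, the check could look like the sketch below. The function name `native_armhf_killswitch_active` is hypothetical, and exact lowercase matching is an assumption; the real parsing may be more lenient.

```shell
# Hypothetical helper: succeeds when the given value selects emulation only.
# Assumption: only the exact lowercase spellings from the text are accepted.
native_armhf_killswitch_active() {
	case "${1:-}" in
		no | never | disabled) return 0 ;;
		*) return 1 ;;
	esac
}
```

A caller would pass the variable explicitly, for example `native_armhf_killswitch_active "${NATIVE_ARMHF_ON_ARM64:-}"`, so that an unset variable leaves the native path available.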

Concurrent armbian builds on the same host kernel are coordinated purely
via kernel BSD flock on the binfmt entry itself — no userspace state, no
per-builder lockfiles, no /run/lock directory tree:

- Each builder holds LOCK_SH on /proc/sys/fs/binfmt_misc/qemu-arm via a
  long-lived fd. Kernel BSD-flock counter is the refcount; the kernel
  releases the fd on process exit (crash-safe).
- The first arrival runs an idempotent `echo 0` to disable qemu-arm.
  Subsequent arrivals observe 0 and proceed without writing.
- On exit, release LOCK_SH; last-out detects via LOCK_EX-LOCK_NB on a
  fresh fd, succeeds iff zero other LOCK_SH holders. Last-out re-enables
  qemu-arm.
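
The refcount protocol in the list above can be sketched with flock(1) from util-linux. This is a minimal illustration under assumptions, not the PR's implementation: the function names and the fd numbers 7 and 8 are hypothetical, and `"$1"` stands in for /proc/sys/fs/binfmt_misc/qemu-arm (any regular file works for experimenting).

```shell
# Take LOCK_SH on fd 8 and keep it for the lifetime of this shell.
hold_shared() {
	exec 8< "$1" || return 1
	flock -s -n 8
}

# Drop our LOCK_SH by closing fd 8.
release_shared() {
	exec 8<&-
}

# Last-out probe: LOCK_EX|LOCK_NB on a fresh fd succeeds only when no other
# open file description holds a lock. The probe lock goes away with the
# subshell, so nothing is left held afterwards.
is_last_out() {
	(
		exec 7< "$1" || exit 1
		flock -x -n 7
	)
}
```

A builder would call `hold_shared` on entry and, in its cleanup path, `release_shared` followed by `is_last_out`; only when the probe succeeds would it write 1 back to the entry. The probe must run after the release, because the fresh fd conflicts with the builder's own shared lock.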

Trade-offs (documented):
- Prior qemu-arm state is not recorded across independent builds. Last-
  out unconditionally re-enables. Admin's pre-existing `disabled` policy
  is not preserved.
- No defense against an external agent toggling qemu-arm mid-build.
  Documented; either don't, or set NATIVE_ARMHF_ON_ARM64=no on builds
  that should not interfere.
- Cleanup ordering invariant: BSD flock is per-OFD, so a forked subshell
  inheriting our SH-fd shares the same lock entry. add_cleanup_handler
  runs in registration order; we register after the umount/SDCARD/MOUNT
  teardown handlers, so by the time we run, the docker container is
  killed and its child-tree (with our inherited fd) gone. Documented in
  the trap handler's docstring; if a future refactor breaks ordering,
  POSIX F_SETLK or explicit descendant-kill are the documented escape
  hatches.

Empirical (helios4 mvebu/armhf, BUILD_MINIMAL=yes BRANCH=edge RELEASE=noble):
- baseline (rootfs cache-miss): 60:35
- this PR (rootfs cache-miss): 19:27 (3.12x)
- this PR (rootfs cache-hit): 6:38 (~9x over expected baseline cache-hit)

Concurrency verified on Hetzner CAX21 (kernel 6.8.0-90-generic): two
independent armbian worktrees run concurrently on the same host, both enter native
path, kernel SH-counter holds qemu-arm disabled across both, first-
finished suppresses restore (LOCK_EX-NB blocked by other's SH), last-
finished restores qemu-arm to enabled. SIGINT mid-chroot also cleans up
correctly.

Reviewed iteratively with codex (gpt-5.5) and coderabbitai. The earlier
revision of this work coordinated via a userspace owner-flock directory
(~250 lines, depended on /run/lock being host-bind-mounted into build
containers, which Armbian docker does not do); replaced with the kernel-
flock-on-binfmt-entry approach above.

Assisted-by: Claude:claude-opus-4.7
@iav iav force-pushed the feat/native-armhf-on-arm64 branch from 7b782ce to 9d8b848 Compare May 10, 2026 03:00
…ative builder probes EX-NB before disabling qemu-arm

K-builder (NATIVE_ARMHF_ON_ARM64=no) now takes a SH-lock on
/proc/sys/fs/binfmt_misc/qemu-arm for its lifetime instead of returning
immediately. N-builder (default) probes EX-NB on a fresh fd before
acquiring its own SH; if probe fails AND qemu-arm is observably enabled
(state==1), a K-builder holds the emulation anchor and switching to
native mid-flight would corrupt their qemu-arm-static routing — refuse
with exit_with_error.

Symmetric closure of the K↔N race:
- N→K: K sees qemu-arm=0 → fail-fast via the existing prepare_host
  guard (addressed in 9d8b848).
- K→N: N sees EX-NB fail + state=1 → fail-fast with 'concurrent build
  with NATIVE_ARMHF_ON_ARM64=no holds emulation lock' (this commit).

Verified end-to-end on Hetzner CAX21 (4-core Ampere Altra, Ubuntu 6.8):
- N first, K second: K exits after 8s with 'qemu-arm globally disabled by
  another concurrent build'.
- K first, N second: N exits after 12s with 'cannot enable native armhf:
  concurrent build with NATIVE_ARMHF_ON_ARM64=no holds emulation lock'.
- Cleanup of both K (release SH) and N (last-out restore echo 1) leaves
  qemu-arm in 'enabled' state.

Assisted-by: Claude:claude-opus-4.7
@github-actions github-actions Bot added size/large PR with 250 lines or more and removed size/medium PR with more then 50 and less then 250 lines labels May 10, 2026
@iav iav marked this pull request as draft May 13, 2026 00:08
iav added a commit to iav/armbian that referenced this pull request May 13, 2026
…r verifies state=1 after SH-NB

Addresses Codex P2 review feedback on PR #113 / armbian#9769:

- Race in N-builder entry (line:239): EX-NB → release → re-take SH split
  let a K-builder slip in between observation and the state-check write,
  silently disabling qemu-arm while K relied on it. Restructure to hold
  EX during state check + echo 0, then atomically downgrade EX→SH on the
  same fd via Linux flock(2)'s flc_lock transition. SH waiters are
  granted only after our SH is in place and qemu-arm is already 0.

- K-builder symmetric check: after acquiring the emulation-mode SH-NB,
  verify state=1; if 0, concurrent N-builders have disabled qemu-arm
  and the killswitch cannot honor its contract — bail with a clear
  error rather than running a broken anchor.

- Cleanup-ordering invariant comment was inverted. add_cleanup_handler
  PREPENDS, so later registrations run FIRST. The invariant survives
  because setup is called before mount_chroot in both call sites; the
  comment now states the actual mechanism and documents the failure
  mode if a future call site inverts the order.
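
The ordering point in the last bullet can be illustrated with a toy registry that prepends handlers. This is not Armbian's add_cleanup_handler; `register_cleanup`, `run_cleanups` and the two handler bodies are hypothetical stand-ins that only show why "registered earlier" means "runs later".

```shell
cleanup_handlers=""

# Prepend: the newest handler goes to the front of the list.
register_cleanup() {
	cleanup_handlers="$1${cleanup_handlers:+ $cleanup_handlers}"
}

# Run handlers front to back, that is, newest registration first.
run_cleanups() {
	local handler
	for handler in $cleanup_handlers; do
		"$handler"
	done
}

# Stand-in handlers that only report what they would do.
restore_qemu_arm() { echo "restore qemu-arm"; }
unmount_chroot() { echo "unmount chroot"; }
```

Registering `restore_qemu_arm` first and `unmount_chroot` second, then calling `run_cleanups`, runs the unmount before the restore. That mirrors the stated invariant: setup registers before mount_chroot, so the restore handler runs after the mount teardown.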

Assisted-by: Claude:claude-opus-4.7
@iav

iav commented May 13, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 13, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

iav added a commit to iav/armbian that referenced this pull request May 13, 2026
… + clarify ordering invariant

Addresses codex review feedback on PR #113 / armbian#9769.

N-builder entry race: EX-NB → release → re-take SH split let a K-builder
slip in between observation and the state-check write, silently disabling
qemu-arm while K relied on it. Hold EX during state check + echo 0, then
atomically downgrade EX→SH on the same fd via Linux flock(2)'s flc_lock
transition. SH waiters are granted only after our SH is in place and
qemu-arm is already 0.
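
A minimal sketch of the hold-EX, write, then convert-to-SH sequence, using flock(1) on a single fd. The function name and fd 9 are hypothetical, and `"$1"` stands in for the binfmt entry. One caveat worth keeping in mind: the flock(2) manual page states that converting a lock is not guaranteed to be atomic, so this sketch re-reads the state after the conversion instead of relying on it.

```shell
# Hypothetical sketch: disable the entry under LOCK_EX, then convert to
# LOCK_SH on the same fd. "$1" stands in for the qemu-arm binfmt entry.
disable_under_ex_then_share() {
	local entry="$1" state=""
	exec 9< "$entry" || return 1
	# LOCK_EX|LOCK_NB: fail fast if any other builder holds a lock.
	if ! flock -x -n 9; then
		exec 9<&-
		return 1
	fi
	echo 0 > "$entry"   # disable while nobody else can hold LOCK_SH
	flock -s 9          # convert EX to SH on the same fd
	# Re-read the state after the conversion. A plain test file holds "0",
	# the real entry reports "disabled" on its first line.
	IFS= read -r state < "$entry"
	[ "$state" = "0" ] || [ "$state" = "disabled" ]
}
```

On success the shell keeps fd 9 open with a shared lock, so later arrivals can add their own shared locks while an exclusive probe fails until every holder has gone.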

K-builder symmetric state check: after acquiring the emulation-mode
SH-NB, verify state=1; if 0, concurrent N-builders have disabled qemu-arm
and the killswitch cannot honor its contract — bail with a clear error
rather than running a broken anchor.

Joiner state-recheck: in the state=0 / EX-NB-fail path, between observing
state=0 and acquiring SH the last live N-builder may release SH and run
the restorer (echo 1). Recheck state after SH and bail back to emulation
if state has flipped — otherwise we'd skip qemu deploy while binfmt
actually routes to qemu, and chroot exec fails because the qemu binary
is not in the chroot.

Killswitch gated by ARCH/host: NATIVE_ARMHF_ON_ARM64=no only meaningful
for armhf builds on aarch64 hosts. Moved idempotent + ARCH/host gates
above killswitch evaluation so unrelated builds (amd64/arm64 targets,
x86 hosts) skip qemu-arm entirely.

Early call site removed: prepare_host_binfmt_qemu_cross previously called
_native_armhf_setup_binfmt_elf BEFORE mmdebstrap, leaving the bootstrap
phase without qemu registration on a chroot that has no libc/ld-linux
yet — armhf maintainer-script exec fails. Native activation stays where
it belongs: post-mmdebstrap call sites in rootfs-create.sh and
rootfs-image.sh.

Cleanup-ordering invariant comment was inverted. add_cleanup_handler
PREPENDS, so later registrations run FIRST. The invariant survives
because setup is called before mount_chroot in both call sites; the
comment now states the actual mechanism and documents the failure mode
if a future call site inverts the order.

Assisted-by: Claude:claude-opus-4.7
@iav iav marked this pull request as ready for review May 13, 2026 02:25
iav added a commit to iav/armbian that referenced this pull request May 13, 2026
… + clarify ordering invariant

Addresses codex review feedback on PR #113 / armbian#9769.

N-builder entry race: EX-NB → release → re-take SH split let a K-builder
slip in between observation and the state-check write, silently disabling
qemu-arm while K relied on it. Hold EX during state check + echo 0, then
atomically downgrade EX→SH on the same fd via Linux flock(2)'s flc_lock
transition. SH waiters are granted only after our SH is in place and
qemu-arm is already 0.

K-builder symmetric state check (success path): after acquiring the
emulation-mode SH-NB, verify state=1; if 0, concurrent N-builders have
disabled qemu-arm and the killswitch cannot honor its contract — bail
with a clear error rather than running a broken anchor.

K-builder SH-NB → blocking SH-with-timeout (failure path): the old
non-blocking flock -s -n would fail when an N-builder briefly held EX
(probe / disable / EX→SH downgrade window). Falling through to qemu
emulation without holding the SH anchor let the peer N complete its
transition with qemu-arm=0, breaking K's chroot exec routing. Now block
up to 30s; if SH lands, re-check state — anchor on state=1, exit_with_error
on state=0. Sub-millisecond peer transitions resolve well within the
window; persistent contention falls back without anchor (preserves old
behavior for degenerate cases).
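
The blocking-with-timeout behavior described above might look roughly like this, using the `-w` option of util-linux flock(1). The function name, the `anchor_result` variable and fd 9 are hypothetical. The result is stored in a variable on purpose: calling the function inside `$(...)` would open the fd in a subshell and drop the lock as soon as it returned.

```shell
# Hypothetical sketch: wait up to $2 seconds for LOCK_SH on fd 9, then check
# the state. Sets anchor_result to:
#   anchor    - SH held and emulation is enabled
#   conflict  - SH held but the entry is disabled (caller should error out)
#   no-anchor - lock wait timed out (caller proceeds without the anchor)
take_emulation_anchor() {
	local entry="$1" timeout="${2:-30}" state=""
	exec 9< "$entry" || return 1
	if ! flock -s -w "$timeout" 9; then
		exec 9<&-
		anchor_result="no-anchor"
		return 0
	fi
	IFS= read -r state < "$entry"
	case "$state" in
		1 | enabled) anchor_result="anchor" ;;
		*) anchor_result="conflict" ;;
	esac
}
```

In the `conflict` case the real code is described as calling exit_with_error; the sketch only reports the outcome and leaves that decision to the caller.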

Joiner state-recheck: in the state=0 / EX-NB-fail path, between observing
state=0 and acquiring SH the last live N-builder may release SH and run
the restorer (echo 1). Recheck state after SH and bail back to emulation
if state has flipped — otherwise we'd skip qemu deploy while binfmt
actually routes to qemu, and chroot exec fails because the qemu binary
is not in the chroot.

Killswitch gated by ARCH/host: NATIVE_ARMHF_ON_ARM64=no only meaningful
for armhf builds on aarch64 hosts. Moved idempotent + ARCH/host gates
above killswitch evaluation so unrelated builds (amd64/arm64 targets,
x86 hosts) skip qemu-arm entirely.

Early call site removed: prepare_host_binfmt_qemu_cross previously called
_native_armhf_setup_binfmt_elf BEFORE mmdebstrap, leaving the bootstrap
phase without qemu registration on a chroot that has no libc/ld-linux
yet — armhf maintainer-script exec fails. Native activation stays where
it belongs: post-mmdebstrap call sites in rootfs-create.sh and
rootfs-image.sh.

Cleanup-ordering invariant comment was inverted. add_cleanup_handler
PREPENDS, so later registrations run FIRST. The invariant survives
because setup is called before mount_chroot in both call sites; the
comment now states the actual mechanism and documents the failure mode
if a future call site inverts the order.

Assisted-by: Claude:claude-opus-4.7
@armbian armbian deleted a comment from coderabbitai Bot May 13, 2026
@iav

iav commented May 13, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 13, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

iav added a commit to iav/armbian that referenced this pull request May 13, 2026
…ble of qemu-arm

Opt-in extension that disables qemu-arm in binfmt_misc for the duration
of an armhf build on an aarch64 host. The kernel's binfmt_elf path then
runs 32-bit ARM ELF binaries natively via CONFIG_COMPAT, removing the
~10× qemu-user-static overhead from mmdebstrap, chroot apt-get, dpkg
--configure, and customize_image stages.

Activate with ENABLE_EXTENSIONS=native-armhf-on-arm64.

Scope:
  - host_pre_docker_launch hook: self-gates on ARCH=armhf and aarch64
    host arch; refuses to run on hosts where qemu-arm is unregistered;
    probes arch-test armhf POST-disable with automatic rollback if the
    kernel cannot run armhf natively (no CONFIG_COMPAT);
    add_cleanup_handler on success so build exit restores qemu-arm.
  - No core changes. No global-state coordination claims.
  - Restrictions documented in the file header: NO CONCURRENT ARMBIAN
    BUILDS on the same host while the extension is active.

Why an extension (and not a core feature):
  - The acceleration only applies to armhf-on-aarch64; no benefit for
    other ARCH/host combos.
  - Concurrent-safe coordination in core would require kernel-level
    flock dance on /proc/sys/fs/binfmt_misc/qemu-arm (see PR armbian#9769),
    which is large, hard to review and to test; extension scope makes
    the operator the source of truth for host serialization and lets
    the implementation stay ~80 lines.
  - Cleanup is best-effort: a SIGKILL'd build leaves qemu-arm disabled.
    Documented in the header with manual restore instructions.

Validated:
  - aarch64 Ubuntu Noble kernel 6.8.x (Hetzner CAX), no COMPAT_VDSO
    needed in stock builds.

Assisted-by: Claude:claude-opus-4.7
iav added a commit to iav/armbian that referenced this pull request May 13, 2026
…e hook

Opt-in extension that disables qemu-arm in binfmt_misc for the duration
of an armhf build on an aarch64 host. The kernel's binfmt_elf path then
runs 32-bit ARM ELF binaries natively via CONFIG_COMPAT, removing the
~10× qemu-user-static overhead from mmdebstrap, chroot apt-get, dpkg
--configure, and customize_image stages.

Activate with ENABLE_EXTENSIONS=native-armhf-on-arm64.

Core change (lib/functions/host/prepare-host.sh):
  Add a new extension hook `host_binfmt_ready`, fired from
  `prepare_host_noninteractive` immediately after
  `prepare_host_binfmt_qemu`. At this point the binfmt_misc registrations
  for the target architecture are in place, which is the right window
  for any extension that wants to mutate them. Five lines, no behavioral
  change for builds that don't use the hook.

Extension (extensions/native-armhf-on-arm64.sh):
  - Gate: ARCH=armhf and host arch aarch64 — silent no-op otherwise.
  - Verify qemu-arm is registered and enabled (it always is after
    prepare_host_binfmt_qemu — otherwise abort with a clear message).
  - Disable qemu-arm via /proc/sys/fs/binfmt_misc/qemu-arm.
  - Probe-with-rollback: exec /usr/arm-linux-gnueabihf/lib/ld-linux-armhf.so.3
    --help to verify the kernel can still run armhf natively via
    CONFIG_COMPAT. If it can't (no COMPAT), re-enable qemu-arm and
    abort with a clear message. We use a direct ld-linux exec rather
    than `arch-test armhf` because the latter is unreliable on some
    aarch64 hosts (e.g. Hetzner Ampere CAX, Ubuntu Noble 6.8) — it
    returns failure even when CONFIG_COMPAT is fully functional.
  - add_cleanup_handler on success: re-enable qemu-arm on build exit
    (best-effort; if the build is SIGKILL'd, qemu-arm stays disabled;
    documented manual restore in the header).
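
The probe-with-rollback step in the list above can be sketched as a small function. This is an illustration rather than the extension's code: `disable_with_probe` is a hypothetical name, `"$1"` stands in for /proc/sys/fs/binfmt_misc/qemu-arm, and the probe command is passed in by the caller.

```shell
# Hypothetical sketch: disable the entry, run a probe, re-enable on failure.
# "$1" is the binfmt entry; the remaining arguments are the probe command.
disable_with_probe() {
	local entry="$1"
	shift
	echo 0 > "$entry" || return 1
	if "$@" > /dev/null 2>&1; then
		return 0   # native execution works, leave the entry disabled
	fi
	echo 1 > "$entry"   # rollback: restore emulation
	return 1
}
```

With the probe named in the text, a call would read `disable_with_probe /proc/sys/fs/binfmt_misc/qemu-arm /usr/arm-linux-gnueabihf/lib/ld-linux-armhf.so.3 --help`. Writing to the real entry requires root.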

Restrictions (header):
  - NO CONCURRENT ARMBIAN BUILDS on the same host while this extension
    is active. The disable mutates global kernel state. The operator
    owns the host while this extension runs — no flock dance, no
    concurrency claims. Matches the safety level of baseline armbian
    (which also doesn't synchronize binfmt_misc mutations across
    concurrent builds).
  - The in-core flock-based variant at PR armbian#9769 covers concurrent-safe
    use; this extension is the opt-in lightweight alternative.

Validated on Hetzner CAX21 (Ampere Altra, Ubuntu Noble 6.8.0-90,
CONFIG_COMPAT=y, no COMPAT_VDSO): ld-linux-armhf exec succeeds with
qemu-arm disabled.

Assisted-by: Claude:claude-opus-4.7
@iav iav marked this pull request as draft May 14, 2026 00:14
Labels

05 Milestone: Second quarter release Framework Framework components Needs review Seeking for review size/large PR with 250 lines or more

2 participants