feat(rootfs): native armhf on aarch64 host via runtime-disable of qemu-arm#9769
feat(rootfs): native armhf on aarch64 host via runtime-disable of qemu-arm#9769iav wants to merge 2 commits into
Conversation
📝 WalkthroughWalkthroughAdds coordinated host binfmt_misc handling to prefer kernel-native 32-bit ARM ELF on aarch64 hosts, conditional qemu deploy/undeploy behavior, and invokes the setup during rootfs creation and image build; setup failures are ignored. ChangesNative armhf via binfmt_elf Mode
Sequence Diagram(s)sequenceDiagram
participant Build as Build Flow
participant Setup as _native_armhf_setup_binfmt_elf()
participant Binfmt as /proc/sys/fs/binfmt_misc
participant Trap as Cleanup Handler
participant Deploy as deploy_qemu_binary_to_chroot()
Build->>Setup: invoke (post-rootfs extraction)
Setup->>Binfmt: read qemu-arm entry/state
alt qemu-arm enabled
Setup->>Binfmt: write disable qemu-arm
Setup->>Trap: register restore handler
Setup-->>Build: export ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes
else qemu-arm absent_or_disabled
Setup-->>Build: export ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes
end
Build->>Deploy: proceed to deploy step
Deploy->>Deploy: check ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF
alt native mode set
Deploy-->>Build: early return (skip qemu copy)
else
Deploy->>Build: copy qemu-arm-static into chroot
end
Build-->>Trap: on exit/signal
Trap->>Binfmt: write restore qemu-arm state
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 20-25: The early return in the qemu-static check (when
ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF == "yes") prevents the matching undeploy
from running later, leaking host qemu into cached rootfs; instead of returning
from the function, set a clear marker variable (e.g.,
ARMBIAN_QEMU_DEPLOY_SKIPPED=yes or ARMBIAN_QEMU_DEPLOYED=no) after calling
display_alert and allow the function to exit normally so the caller
(rootfs-create.sh) can always run the undeploy path and check that marker;
update the undeploy logic to look for that same marker
(ARMBIAN_QEMU_DEPLOY_SKIPPED or ARMBIAN_QEMU_DEPLOYED) so it only skips removal
when deploy truly never ran.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: eba7fdae-710a-42e6-bcb9-0d41993767df
📒 Files selected for processing (4)
lib/functions/main/rootfs-image.shlib/functions/rootfs/qemu-static.shlib/functions/rootfs/rootfs-create.shlib/library-functions.sh
502287f to
6f7903b
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
lib/functions/rootfs/qemu-static.sh (1)
20-25:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDeploy/undeploy asymmetry can leak host
qemu-arm-staticinto the rootfs cache.The flag check is symmetric inside this file, but at the call sites (
rootfs-create.sh,rootfs-image.shper the summary),deploy_qemu_binary_to_chrootruns before_native_armhf_setup_binfmt_elfflipsARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes, whileundeploy_qemu_binary_from_chrootruns after. In that ordering the binary gets copied into the chroot, but the matching removal is suppressed by the new early-return at Line 56. The hostqemu-arm-staticthen persists into the rootfs cache and into final armhf images on subsequent cache-hit builds.Track per-(caller, chroot_target) deployment state and only skip undeploy when deploy actually skipped, rather than gating both on the flag alone.
🛠️ Suggested fix
+declare -gA armbian_qemu_deployed_to_chroot=() + function deploy_qemu_binary_to_chroot() { declare chroot_target="${1}" caller="${2}" @@ if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then display_alert "Native armhf via binfmt_elf" "skipping qemu binary deployment during ${caller}" "info" return 0 fi @@ display_alert "Deploying qemu-user-static binary to chroot" "${QEMU_BINARY} during ${caller}" "info" run_host_command_logged cp -pv "${src_host}" "${dst_target}" + armbian_qemu_deployed_to_chroot["${caller}:${chroot_target}"]=yes return 0 } @@ function undeploy_qemu_binary_from_chroot() { @@ - if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" ]]; then + if [[ "${ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF:-no}" == "yes" \ + && "${armbian_qemu_deployed_to_chroot["${caller}:${chroot_target}"]:-no}" != "yes" ]]; then display_alert "Native armhf via binfmt_elf" "no qemu binary to remove during ${caller}" "debug" return 0 fiAlso applies to: 55-59
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/functions/rootfs/qemu-static.sh` around lines 20 - 25, The early-return based solely on ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF in deploy_qemu_binary_to_chroot/undeploy_qemu_binary_from_chroot causes asymmetry across separate invocations; change the logic so that when deploy_qemu_binary_to_chroot skips copying due to the flag it records that decision with a per-(caller,chroot_target) marker (e.g., a temp file named with ${caller} and the chroot target inside the rootfs cache or a shared temp dir), and then have undeploy_qemu_binary_from_chroot check that marker and only skip removal when the deploy previously skipped for that same (caller,chroot_target); ensure you still clean up the marker when an actual copy was performed and when removal runs so state cannot leak into later builds (references: functions deploy_qemu_binary_to_chroot, undeploy_qemu_binary_from_chroot and variable ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 169-188: Move the add_cleanup_handler call so it runs before
writing "0" to /proc/sys/fs/binfmt_misc/qemu-arm (register the trap before the
destructive action) and update trap_handler_native_armhf_restore_qemu_arm to not
early-return based on ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF; instead key the
cleanup on the actual binfmt_misc state (check [[ -e
/proc/sys/fs/binfmt_misc/qemu-arm ]] and attempt echo 1 >
/proc/sys/fs/binfmt_misc/qemu-arm || true), so a partially completed setup or an
interrupt between the disable and ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF assignment
will still re-enable qemu-arm.
---
Duplicate comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 20-25: The early-return based solely on
ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF in
deploy_qemu_binary_to_chroot/undeploy_qemu_binary_from_chroot causes asymmetry
across separate invocations; change the logic so that when
deploy_qemu_binary_to_chroot skips copying due to the flag it records that
decision with a per-(caller,chroot_target) marker (e.g., a temp file named with
${caller} and the chroot target inside the rootfs cache or a shared temp dir),
and then have undeploy_qemu_binary_from_chroot check that marker and only skip
removal when the deploy previously skipped for that same (caller,chroot_target);
ensure you still clean up the marker when an actual copy was performed and when
removal runs so state cannot leak into later builds (references: functions
deploy_qemu_binary_to_chroot, undeploy_qemu_binary_from_chroot and variable
ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e697b537-3deb-4c61-94a1-02ef9c76d299
📒 Files selected for processing (3)
lib/functions/main/rootfs-image.shlib/functions/rootfs/qemu-static.shlib/functions/rootfs/rootfs-create.sh
🚧 Files skipped from review as they are similar to previous changes (2)
- lib/functions/main/rootfs-image.sh
- lib/functions/rootfs/rootfs-create.sh
eab2ab1 to
d990278
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
lib/functions/rootfs/qemu-static.sh (1)
131-134: 💤 Low valueOptional: case-insensitive killswitch matching.
NATIVE_ARMHF_ON_ARM64=NO(orDisabled,False, etc.) would slip past the killswitch and proceed to detection. This is a common ergonomic gotcha when users export env vars from CI configs. Low priority since the function self-aborts on the subsequent guards anyway, but a one-liner makes the contract explicit.♻️ Optional refactor
- # Killswitch — accept no/never/disabled as synonyms; bail before any detection runs. - case "${NATIVE_ARMHF_ON_ARM64:-auto}" in - no | never | disabled) return 1 ;; - esac + # Killswitch — accept no/never/disabled (any case) as synonyms; bail before any detection runs. + case "${NATIVE_ARMHF_ON_ARM64:-auto,,}" in + no | never | disabled) return 1 ;; + esac🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/functions/rootfs/qemu-static.sh` around lines 131 - 134, The killswitch case for NATIVE_ARMHF_ON_ARM64 is currently case-sensitive and will miss values like "NO" or "Disabled"; update the case statement around the NATIVE_ARMHF_ON_ARM64 check to perform case-insensitive matching (e.g., convert the value to lowercase with parameter expansion like "${NATIVE_ARMHF_ON_ARM64,,}" before the case or expand the patterns to bracketed forms such as [Nn][Oo]|[Nn][Ee][Vv][Ee][Rr]|[Dd][Ii][Ss][Aa][Bb][Ll][Ee][Dd]|[Ff][Aa][Ll][Ss][Ee] to accept common synonyms) so that the return 1 kill path in that branch reliably triggers for uppercase/mixed-case inputs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 131-134: The killswitch case for NATIVE_ARMHF_ON_ARM64 is
currently case-sensitive and will miss values like "NO" or "Disabled"; update
the case statement around the NATIVE_ARMHF_ON_ARM64 check to perform
case-insensitive matching (e.g., convert the value to lowercase with parameter
expansion like "${NATIVE_ARMHF_ON_ARM64,,}" before the case or expand the
patterns to bracketed forms such as
[Nn][Oo]|[Nn][Ee][Vv][Ee][Rr]|[Dd][Ii][Ss][Aa][Bb][Ll][Ee][Dd]|[Ff][Aa][Ll][Ss][Ee]
to accept common synonyms) so that the return 1 kill path in that branch
reliably triggers for uppercase/mixed-case inputs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ff5855c0-1c00-4b4b-8c4e-dc619c12d9a0
📒 Files selected for processing (3)
lib/functions/main/rootfs-image.shlib/functions/rootfs/qemu-static.shlib/functions/rootfs/rootfs-create.sh
🚧 Files skipped from review as they are similar to previous changes (2)
- lib/functions/main/rootfs-image.sh
- lib/functions/rootfs/rootfs-create.sh
|
Re: nitpick on |
d990278 to
15789ee
Compare
|
Who use arm64 builders — try it! |
|
aarch64 host (CIX). Always fails:
|
15789ee to
df7aaf9
Compare
|
@igorpecovnik thanks for the report. Root cause: Fix: arch-test moved to AFTER the binfmt_misc disable, so it reflects what the chroot exec will actually face. On detection failure we restore qemu-arm immediately and fall back to qemu-static emulation transparently. After this fix lands on your builder, your build will work as before — slowly, via qemu-arm-static, no regression. To get the speedup, build your builder host's kernel with the |
…lel builds
PR review by codex (gpt-5.5) flagged the missing synchronisation around the
shared /proc/sys/fs/binfmt_misc/qemu-arm flag as the main blocker for shipping
NATIVE_ARMHF_ON_ARM64 default-on. The previous refcount-less code disabled
qemu-arm globally in setup and unconditionally re-enabled it in cleanup —
two parallel builds on the same host could:
* have build A disable qemu-arm and register restore;
* have build B observe 'already disabled', set
ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes and skip copying
qemu-arm-static into its own chroot;
* have build A finish, cleanup blindly write 1 → qemu-arm enabled;
* have build B continue with no qemu binary in chroot but
qemu-arm globally registered — runtime failure.
Replace with a per-host owner-flock pattern (codex-recommended; refcount/
pidlist breaks under PID namespaces and PID reuse, kernel fd-release on
process exit gives crash safety for free):
* /run/lock/armbian-native-armhf/owners/<ARMBIAN_BUILD_UUID>.lock —
one file per builder, long-lived flock held for the whole native-mode
window. Closing the fd (process exit, normal or crash) releases the
lock; another builder probing it with non-blocking flock detects the
death and prunes the file.
* control.lock — short-lived flock around prune+count+state-write so
first/last-owner decisions stay consistent under concurrency.
* Live owner count = number of owner files we can NOT acquire
non-blocking. Refcount is not used as source of truth.
* First owner records prior qemu-arm state (1 / 0 / missing) into
prior-qemu-arm-state and only writes 0 if prior was 1. Subsequent
owners join without state mutation.
* Last owner restores from the saved prior — never blindly writes 1.
If prior was already disabled or the handler was missing, leave it
alone. State divergence (admin or another tool changed qemu-arm
during the active window) emits a warning, not a fight.
Per-host scope by design — binfmt_misc is per-kernel state, so cross-host
NFS coordination would only serialise independent kernels for no benefit.
The lockdir lives under /run/lock so it is local-to-host and host-shared
(the Armbian build container bind-mounts /run from the host, satisfying
the requirement). Documented in the function header comment.
Other review items addressed:
* The 'caller sets ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF=yes' comment was
inaccurate (the function sets it itself) — fixed.
* Function header rewritten to spell out the concurrency model and
Docker requirement explicitly.
Reported-by: codex (PR #9769 review-helper)
Assisted-by: Claude:claude-opus-4.7
Signed-off-by: Igor Velkov <325961+iav@users.noreply.github.com>
…9f27 Codex review of the previous lock-pattern commit (5359f27) found three real holes: 1. trap_handler released the owner lock BEFORE taking the control lock. Between fd close and flock(control) a fresh builder could enter setup, prune+count and see live_count == 1 (only itself), then observe the now-disabled qemu-arm and write '0' as its prior_state. Our subsequent restore would read that overwritten state and never re-enable qemu-arm — turning a coordinated handoff into permanent disablement. Fix: take control.lock first, release owner UNDER control, then count and restore. The whole transition is now atomic. 2. The post-disable arch-test failure path released the owner lock manually and then called the trap handler. The handler's first guard ([[ -n ${_native_armhf_owner_fd:-} ]] || return 0) saw the already-cleared global and exited without restore — leaving qemu-arm disabled and the state file leaking. Fix: drop control.lock and let the handler do the release+count+restore under its own control lock, matching the normal cleanup path. 3. Stale active window after process kill (SIGKILL / OOM / kill -9 between echo 0 and trap firing): kernel releases the fd so the owner lock is gone, but the state file remains. The next first-owner would observe the leftover-disabled qemu-arm and record disabled as its new prior_state — clean shutdown then never re-enables. Fix: in the first-owner branch, before saving prior_state, check for an existing state file. If found and prior was '1', reconcile qemu-arm back to enabled and remove the stale file before observing our own prior_state. Crash recovery is now self-healing on the next build. Same Edit additionally cleans up the disable-failure case: when 'echo 0 > qemu-arm' itself fails (no CAP_SYS_ADMIN), no state actually changed, so leave state file and owner lock alone — the cleanup handler will observe current==prior==1 and no-op the restore. The previous code released owner outside control and rm-ed the state file, both of which introduced the same race the trap_handler change closes elsewhere. Reported-by: codex (PR #9769 lock-impl review-helper) Assisted-by: Claude:claude-opus-4.7 Signed-off-by: Igor Velkov <325961+iav@users.noreply.github.com>
…3d4c8cd Codex re-review of fix2 (commit 3d4c8cd) confirmed the original three races are properly closed but identified two new ones, both producing the same class of runtime failure: a parallel builder ends up running chroot exec against globally-enabled qemu-arm with no qemu binary inside its chroot. 1. Disable-failure path left a live owner. When 'echo 0 > qemu-arm' itself fails (no CAP_SYS_ADMIN), fix2 chose to leave the owner lock and the prior-state file in place, on the theory that the cleanup handler would no-op the restore at exit. But the owner stays counted as live for the entire build duration, so a concurrent builder enters its own setup, sees live_count > 1, takes the join branch, skips qemu binary deployment, and fails at chroot exec time. Fix: a failed disable now performs a full abort under the held control lock — rm state file, release owner, drop control, return 1. The aborted setup leaves no observable native window for anyone to join. 2. Join path did not verify the actual kernel state. fix2 trusted that live_count > 1 implies qemu-arm is disabled. But an external agent (admin running 'update-binfmts --enable qemu-arm', a stray service) can flip it back on during the active window. Our pre-flight arch-test then passes as a false positive — qemu interprets the ARM stub — and we skip qemu binary deployment, hitting the same runtime failure as case 1. Fix: in the join branch, observe the current qemu-arm state explicitly. If it is enabled, this is not a normal join — abort to qemu fallback with a warning naming the external mutator. This is a one-shot invariant check at join time, not a periodic re-verify (which codex previously argued against). Reported-by: codex (PR #9769 lock-impl re-review) Assisted-by: Claude:claude-opus-4.7 Signed-off-by: Igor Velkov <325961+iav@users.noreply.github.com>
3d4c8cd to
4998f52
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 324-331: The else branch that runs when live_count > 1 must not
assume qemu-arm is already disabled; before registering the join behavior
(trap_handler_native_armhf_restore_qemu_arm, display_alert and setting
ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF), read the actual qemu-arm state (the same
state/sysfs/binfmt path used when writing "0"/"1") and verify it is "0"
(disabled); if it is not "0", fail fast (error out) rather than joining, so the
subsequent arch-test armhf and ARMBIAN_NATIVE_ARMHF_VIA_BINFMT_ELF logic do not
incorrectly skip deploying qemu-arm-static. Ensure you reference the same state
file/variable used by the disable code path and keep the cleanup/restore handler
only for true joiners.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6b9577ae-7b84-40f0-972d-8e8b8d70ef1c
📒 Files selected for processing (1)
lib/functions/rootfs/qemu-static.sh
4998f52 to
152d4f6
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
lib/functions/main/rootfs-image.sh (1)
19-19: ⚡ Quick winMake the non-fatal fallback explicit instead of silently swallowing setup failure.
At Line 19,
|| truepreserves behavior, but it hides when native armhf activation fails. Please wrap this in an explicitif ! ...; then display_alert ...; fiso operators can see when the build falls back to qemu-static.Suggested change
- _native_armhf_setup_binfmt_elf || true + if ! _native_armhf_setup_binfmt_elf; then + display_alert "Native armhf via binfmt_elf unavailable" "Falling back to qemu-static path" "warn" + fiBased on learnings: In the Armbian build framework, scripts run with
set -e, so masked failures should be explicitly handled and not silently ignored.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/functions/main/rootfs-image.sh` at line 19, The call to _native_armhf_setup_binfmt_elf currently uses "|| true" which silently swallows failures; replace that pattern with an explicit conditional: invoke _native_armhf_setup_binfmt_elf and if it returns non-zero run a visible fallback notification (e.g. if ! _native_armhf_setup_binfmt_elf; then display_alert "native armhf binfmt activation failed; falling back to qemu-static"; fi) so operators see when the script falls back to qemu-static; keep the existing non-fatal behavior but surface the failure via display_alert (or the project’s alert/logging helper).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@lib/functions/main/rootfs-image.sh`:
- Line 19: The call to _native_armhf_setup_binfmt_elf currently uses "|| true"
which silently swallows failures; replace that pattern with an explicit
conditional: invoke _native_armhf_setup_binfmt_elf and if it returns non-zero
run a visible fallback notification (e.g. if ! _native_armhf_setup_binfmt_elf;
then display_alert "native armhf binfmt activation failed; falling back to
qemu-static"; fi) so operators see when the script falls back to qemu-static;
keep the existing non-fatal behavior but surface the failure via display_alert
(or the project’s alert/logging helper).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c0143a21-0438-46b0-9277-ecb11ae8e453
📒 Files selected for processing (3)
lib/functions/main/rootfs-image.shlib/functions/rootfs/qemu-static.shlib/functions/rootfs/rootfs-create.sh
🚧 Files skipped from review as they are similar to previous changes (2)
- lib/functions/rootfs/rootfs-create.sh
- lib/functions/rootfs/qemu-static.sh
42770de to
7b782ce
Compare
|
@coderabbitai full review please. Branch force-pushed and squashed into a single commit ( |
|
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/functions/rootfs/qemu-static.sh`:
- Around line 329-341: The early-continue when calling
_native_armhf_setup_binfmt_elf lets the "killswitch" path
(NATIVE_ARMHF_ON_ARM64=no) bypass the later disabled-state guard and skip
prepare_host_binfmt_qemu_cross_arm64_host_armhf_target; change the logic so the
caller only continues when _native_armhf_setup_binfmt_elf reports an actual
active-native success (explicit success exit code), and when the function
returns due to the killswitch (use a distinct exit code or status), fall through
instead of continuing so the later
prepare_host_binfmt_qemu_cross_arm64_host_armhf_target and the disabled-state
guard still run; apply the same fix to the analogous block around
prepare_host_binfmt_qemu_cross_arm64_host_armhf_target (lines 356-371) and
ensure _native_armhf_setup_binfmt_elf and its callers agree on exit codes
semantics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6912ee78-6640-4866-9e0d-f7c2e50ad718
📒 Files selected for processing (3)
lib/functions/main/rootfs-image.shlib/functions/rootfs/qemu-static.shlib/functions/rootfs/rootfs-create.sh
…u-arm When building an armhf image on an aarch64 host that supports 32-bit ARM execution (kernel CONFIG_COMPAT, CPU has 32-bit user mode), disable the qemu-arm handler in /proc/sys/fs/binfmt_misc/ for the duration of the build. The kernel's own binfmt_elf then handles 32-bit ARM ELFs natively via CONFIG_COMPAT — typically ~12x faster than qemu-arm-static emulation on Cortex-A53/A55/A72/A73/A76. Continues #9284 (arm64-compat-vdso extension + custom_kernel_make_params hook), which enabled COMPAT_VDSO in arm64 kernels. Activation point is delayed to AFTER mmdebstrap. Its cross-arch path extracts base packages and runs their postinst hooks, which require a working qemu-arm registration to populate the chroot before libc/ld-linux-armhf.so.3 exist for binfmt_elf to fall through to. The speedup applies to all subsequent chroot operations (install_distribution_specific, customize_image, update-initramfs). Killswitch: NATIVE_ARMHF_ON_ARM64=no (synonyms: never, disabled). Concurrent armbian builds on the same host kernel are coordinated purely via kernel BSD flock on the binfmt entry itself — no userspace state, no per-builder lockfiles, no /run/lock directory tree: - Each builder holds LOCK_SH on /proc/sys/fs/binfmt_misc/qemu-arm via a long-lived fd. Kernel BSD-flock counter is the refcount; the kernel releases the fd on process exit (crash-safe). - First-arrival idempotently `echo 0` to disable. Subsequent arrivals observe 0 and proceed without writing. - On exit, release LOCK_SH; last-out detects via LOCK_EX-LOCK_NB on a fresh fd, succeeds iff zero other LOCK_SH holders. Last-out re-enables qemu-arm. Trade-offs (documented): - Prior qemu-arm state is not recorded across independent builds. Last- out unconditionally re-enables. Admin's pre-existing `disabled` policy is not preserved. - No defense against an external agent toggling qemu-arm mid-build. Documented; either don't, or set NATIVE_ARMHF_ON_ARM64=no on builds that should not interfere. - Cleanup ordering invariant: BSD flock is per-OFD, so a forked subshell inheriting our SH-fd shares the same lock entry. add_cleanup_handler runs in registration order; we register after the umount/SDCARD/MOUNT teardown handlers, so by the time we run, the docker container is killed and its child-tree (with our inherited fd) gone. Documented in the trap handler's docstring; if a future refactor breaks ordering, POSIX F_SETLK or explicit descendant-kill are the documented escape hatches. Empirical (helios4 mvebu/armhf, BUILD_MINIMAL=yes BRANCH=edge RELEASE=noble): - baseline (rootfs cache-miss): 60:35 - this PR (rootfs cache-miss): 19:27 (3.12x) - this PR (rootfs cache-hit): 6:38 (~9x over expected baseline cache-hit) Concurrency verified on Hetzner CAX21 (kernel 6.8.0-90-generic): two independent armbian worktrees concurrent on same host, both enter native path, kernel SH-counter holds qemu-arm disabled across both, first- finished suppresses restore (LOCK_EX-NB blocked by other's SH), last- finished restores qemu-arm to enabled. SIGINT mid-chroot also cleans up correctly. Reviewed iteratively with codex (gpt-5.5) and coderabbitai. The earlier revision of this work coordinated via a userspace owner-flock directory (~250 lines, depended on /run/lock being host-bind-mounted into build containers, which Armbian docker does not do); replaced with the kernel- flock-on-binfmt-entry approach above. Assisted-by: Claude:claude-opus-4.7
7b782ce to
9d8b848
Compare
…ative builder probes EX-NB before disabling qemu-arm K-builder (NATIVE_ARMHF_ON_ARM64=no) now takes a SH-lock on /proc/sys/fs/binfmt_misc/qemu-arm for its lifetime instead of returning immediately. N-builder (default) probes EX-NB on a fresh fd before acquiring its own SH; if probe fails AND qemu-arm is observably enabled (state==1), a K-builder holds the emulation anchor and switching to native mid-flight would corrupt their qemu-arm-static routing — refuse with exit_with_error. Symmetric closure of the K↔N race: - N→K: K sees qemu-arm=0 → fail-fast via the existing prepare_host guard (addressed in 9d8b848). - K→N: N sees EX-NB fail + state=1 → fail-fast with 'concurrent build with NATIVE_ARMHF_ON_ARM64=no holds emulation lock' (this commit). Verified end-to-end on Hetzner CAX21 (4-core Ampere Altra, Ubuntu 6.8): - N first, K second: K exits 8s with 'qemu-arm globally disabled by another concurrent build'. - K first, N second: N exits 12s with 'cannot enable native armhf: concurrent build with NATIVE_ARMHF_ON_ARM64=no holds emulation lock'. - Cleanup of both K (release SH) and N (last-out restore echo 1) leaves qemu-arm in 'enabled' state. Assisted-by: Claude:claude-opus-4.7
…r verifies state=1 after SH-NB Addresses Codex P2 review feedback on PR #113 / armbian#9769: - Race in N-builder entry (line:239): EX-NB → release → re-take SH split let a K-builder slip in between observation and the state-check write, silently disabling qemu-arm while K relied on it. Restructure to hold EX during state check + echo 0, then atomically downgrade EX→SH on the same fd via Linux flock(2)'s flc_lock transition. SH waiters are granted only after our SH is in place and qemu-arm is already 0. - K-builder symmetric check: after acquiring the emulation-mode SH-NB, verify state=1; if 0, concurrent N-builders have disabled qemu-arm and the killswitch cannot honor its contract — bail with a clear error rather than running a broken anchor. - Cleanup-ordering invariant comment was inverted. add_cleanup_handler PREPENDS, so later registrations run FIRST. The invariant survives because setup is called before mount_chroot in both call sites; the comment now states the actual mechanism and documents the failure mode if a future call site inverts the order. Assisted-by: Claude:claude-opus-4.7
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
… + clarify ordering invariant Addresses codex review feedback on PR #113 / armbian#9769. N-builder entry race: EX-NB → release → re-take SH split let a K-builder slip in between observation and the state-check write, silently disabling qemu-arm while K relied on it. Hold EX during state check + echo 0, then atomically downgrade EX→SH on the same fd via Linux flock(2)'s flc_lock transition. SH waiters are granted only after our SH is in place and qemu-arm is already 0. K-builder symmetric state check: after acquiring the emulation-mode SH-NB, verify state=1; if 0, concurrent N-builders have disabled qemu-arm and the killswitch cannot honor its contract — bail with a clear error rather than running a broken anchor. Joiner state-recheck: in the state=0 / EX-NB-fail path, between observing state=0 and acquiring SH the last live N-builder may release SH and run the restorer (echo 1). Recheck state after SH and bail back to emulation if state has flipped — otherwise we'd skip qemu deploy while binfmt actually routes to qemu, and chroot exec fails because the qemu binary is not in the chroot. Killswitch gated by ARCH/host: NATIVE_ARMHF_ON_ARM64=no only meaningful for armhf builds on aarch64 hosts. Moved idempotent + ARCH/host gates above killswitch evaluation so unrelated builds (amd64/arm64 targets, x86 hosts) skip qemu-arm entirely. Early call site removed: prepare_host_binfmt_qemu_cross previously called _native_armhf_setup_binfmt_elf BEFORE mmdebstrap, leaving the bootstrap phase without qemu registration on a chroot that has no libc/ld-linux yet — armhf maintainer-script exec fails. Native activation stays where it belongs: post-mmdebstrap call sites in rootfs-create.sh and rootfs-image.sh. Cleanup-ordering invariant comment was inverted. add_cleanup_handler PREPENDS, so later registrations run FIRST. The invariant survives because setup is called before mount_chroot in both call sites; the comment now states the actual mechanism and documents the failure mode if a future call site inverts the order. Assisted-by: Claude:claude-opus-4.7
… + clarify ordering invariant Addresses codex review feedback on PR #113 / armbian#9769. N-builder entry race: EX-NB → release → re-take SH split let a K-builder slip in between observation and the state-check write, silently disabling qemu-arm while K relied on it. Hold EX during state check + echo 0, then atomically downgrade EX→SH on the same fd via Linux flock(2)'s flc_lock transition. SH waiters are granted only after our SH is in place and qemu-arm is already 0. K-builder symmetric state check (success path): after acquiring the emulation-mode SH-NB, verify state=1; if 0, concurrent N-builders have disabled qemu-arm and the killswitch cannot honor its contract — bail with a clear error rather than running a broken anchor. K-builder SH-NB → blocking SH-with-timeout (failure path): the old non-blocking flock -s -n would fail when an N-builder briefly held EX (probe / disable / EX→SH downgrade window). Falling through to qemu emulation without holding the SH anchor let the peer N complete its transition with qemu-arm=0, breaking K's chroot exec routing. Now block up to 30s; if SH lands, re-check state — anchor on state=1, exit_with_error on state=0. Sub-millisecond peer transitions resolve well within the window; persistent contention falls back without anchor (preserves old behavior for degenerate cases). Joiner state-recheck: in the state=0 / EX-NB-fail path, between observing state=0 and acquiring SH the last live N-builder may release SH and run the restorer (echo 1). Recheck state after SH and bail back to emulation if state has flipped — otherwise we'd skip qemu deploy while binfmt actually routes to qemu, and chroot exec fails because the qemu binary is not in the chroot. Killswitch gated by ARCH/host: NATIVE_ARMHF_ON_ARM64=no only meaningful for armhf builds on aarch64 hosts. Moved idempotent + ARCH/host gates above killswitch evaluation so unrelated builds (amd64/arm64 targets, x86 hosts) skip qemu-arm entirely. Early call site removed: prepare_host_binfmt_qemu_cross previously called _native_armhf_setup_binfmt_elf BEFORE mmdebstrap, leaving the bootstrap phase without qemu registration on a chroot that has no libc/ld-linux yet — armhf maintainer-script exec fails. Native activation stays where it belongs: post-mmdebstrap call sites in rootfs-create.sh and rootfs-image.sh. Cleanup-ordering invariant comment was inverted. add_cleanup_handler PREPENDS, so later registrations run FIRST. The invariant survives because setup is called before mount_chroot in both call sites; the comment now states the actual mechanism and documents the failure mode if a future call site inverts the order. Assisted-by: Claude:claude-opus-4.7
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
…ble of qemu-arm
Opt-in extension that disables qemu-arm in binfmt_misc for the duration
of an armhf build on an aarch64 host. The kernel's binfmt_elf path then
runs 32-bit ARM ELF binaries natively via CONFIG_COMPAT, removing the
~10× qemu-user-static overhead from mmdebstrap, chroot apt-get, dpkg
--configure, and customize_image stages.
Activate with ENABLE_EXTENSIONS=native-armhf-on-arm64.
Scope:
- host_pre_docker_launch hook: self-gates on ARCH=armhf and aarch64
host arch; refuses to run on hosts where qemu-arm is unregistered;
probes arch-test armhf POST-disable with automatic rollback if the
kernel cannot run armhf natively (no CONFIG_COMPAT);
add_cleanup_handler on success so build exit restores qemu-arm.
- No core changes. No global-state coordination claims.
- Restrictions documented in the file header: NO CONCURRENT ARMBIAN
BUILDS on the same host while the extension is active.
Why an extension (and not a core feature):
- The acceleration only applies to armhf-on-aarch64; no benefit for
other ARCH/host combos.
- Concurrent-safe coordination in core would require kernel-level
flock dance on /proc/sys/fs/binfmt_misc/qemu-arm (see PR armbian#9769),
which is large, hard to review and to test; extension scope makes
the operator the source of truth for host serialization and lets
the implementation stay ~80 lines.
- Cleanup is best-effort: a SIGKILL'd build leaves qemu-arm disabled.
Documented in the header with manual restore instructions.
Validated:
- aarch64 Ubuntu Noble kernel 6.8.x (Hetzner CAX), no COMPAT_VDSO
needed in stock builds.
Assisted-by: Claude:claude-opus-4.7
…e hook
Opt-in extension that disables qemu-arm in binfmt_misc for the duration
of an armhf build on an aarch64 host. The kernel's binfmt_elf path then
runs 32-bit ARM ELF binaries natively via CONFIG_COMPAT, removing the
~10× qemu-user-static overhead from mmdebstrap, chroot apt-get, dpkg
--configure, and customize_image stages.
Activate with ENABLE_EXTENSIONS=native-armhf-on-arm64.
Core change (lib/functions/host/prepare-host.sh):
Add a new extension hook `host_binfmt_ready`, fired from
`prepare_host_noninteractive` immediately after
`prepare_host_binfmt_qemu`. At this point the binfmt_misc registrations
for the target architecture are in place, which is the right window
for any extension that wants to mutate them. Five lines, no behavioral
change for builds that don't use the hook.
Extension (extensions/native-armhf-on-arm64.sh):
- Gate: ARCH=armhf and host arch aarch64 — silent no-op otherwise.
- Verify qemu-arm is registered and enabled (it always is after
prepare_host_binfmt_qemu — otherwise abort with a clear message).
- Disable qemu-arm via /proc/sys/fs/binfmt_misc/qemu-arm.
- Probe-with-rollback: exec /usr/arm-linux-gnueabihf/lib/ld-linux-armhf.so.3
--help to verify the kernel can still run armhf natively via
CONFIG_COMPAT. If it can't (no COMPAT), re-enable qemu-arm and
abort with a clear message. We use a direct ld-linux exec rather
than `arch-test armhf` because the latter is unreliable on some
aarch64 hosts (e.g. Hetzner Ampere CAX, Ubuntu Noble 6.8) — it
returns failure even when CONFIG_COMPAT is fully functional.
- add_cleanup_handler on success: re-enable qemu-arm on build exit
(best-effort; if the build is SIGKILL'd, qemu-arm stays disabled;
documented manual restore in the header).
Restrictions (header):
- NO CONCURRENT ARMBIAN BUILDS on the same host while this extension
is active. The disable mutates global kernel state. The operator
owns the host while this extension runs — no flock dance, no
concurrency claims. Matches the safety level of baseline armbian
(which also doesn't synchronize binfmt_misc mutations across
concurrent builds).
- The in-core flock-based variant at PR armbian#9769 covers concurrent-safe
use; this extension is the opt-in lightweight alternative.
Validated on Hetzner CAX21 (Ampere Altra, Ubuntu Noble 6.8.0-90,
CONFIG_COMPAT=y, no COMPAT_VDSO): ld-linux-armhf exec succeeds with
qemu-arm disabled.
Assisted-by: Claude:claude-opus-4.7
…e hook
Opt-in extension that disables qemu-arm in binfmt_misc for the duration
of an armhf build on an aarch64 host. The kernel's binfmt_elf path then
runs 32-bit ARM ELF binaries natively via CONFIG_COMPAT, removing the
~10× qemu-user-static overhead from mmdebstrap, chroot apt-get, dpkg
--configure, and customize_image stages.
Activate with ENABLE_EXTENSIONS=native-armhf-on-arm64.
Core change (lib/functions/host/prepare-host.sh):
Add a new extension hook `host_binfmt_ready`, fired from
`prepare_host_noninteractive` immediately after
`prepare_host_binfmt_qemu`. At this point the binfmt_misc registrations
for the target architecture are in place, which is the right window
for any extension that wants to mutate them. Five lines, no behavioral
change for builds that don't use the hook.
Extension (extensions/native-armhf-on-arm64.sh):
- Gate: ARCH=armhf and host arch aarch64 — silent no-op otherwise.
- Verify qemu-arm is registered and enabled (it always is after
prepare_host_binfmt_qemu — otherwise abort with a clear message).
- Disable qemu-arm via /proc/sys/fs/binfmt_misc/qemu-arm.
- Probe-with-rollback: exec /usr/arm-linux-gnueabihf/lib/ld-linux-armhf.so.3
--help to verify the kernel can still run armhf natively via
CONFIG_COMPAT. If it can't (no COMPAT), re-enable qemu-arm and
abort with a clear message. We use a direct ld-linux exec rather
than `arch-test armhf` because the latter is unreliable on some
aarch64 hosts (e.g. Hetzner Ampere CAX, Ubuntu Noble 6.8) — it
returns failure even when CONFIG_COMPAT is fully functional.
- add_cleanup_handler on success: re-enable qemu-arm on build exit
(best-effort; if the build is SIGKILL'd, qemu-arm stays disabled;
documented manual restore in the header).
Restrictions (header):
- NO CONCURRENT ARMBIAN BUILDS on the same host while this extension
is active. The disable mutates global kernel state. The operator
owns the host while this extension runs — no flock dance, no
concurrency claims. Matches the safety level of baseline armbian
(which also doesn't synchronize binfmt_misc mutations across
concurrent builds).
- The in-core flock-based variant at PR armbian#9769 covers concurrent-safe
use; this extension is the opt-in lightweight alternative.
Validated on Hetzner CAX21 (Ampere Altra, Ubuntu Noble 6.8.0-90,
CONFIG_COMPAT=y, no COMPAT_VDSO): ld-linux-armhf exec succeeds with
qemu-arm disabled.
Assisted-by: Claude:claude-opus-4.7
Summary
Continues #9284 (
arm64-compat-vdsoextension +custom_kernel_make_paramshook).That PR enabled
COMPAT_VDSOin arm64 kernels; this PR uses the resulting32-bit-capable kernels to run 32-bit ARM ELF directly instead of through
qemu-armemulation in chroot work — typically ~12× faster thanqemu-arm-staticon Cortex-A53/A55/A72/A73/A76.
How
Disable
qemu-armin/proc/sys/fs/binfmt_misc/for the duration of thebuild; kernel
binfmt_elfthen handles 32-bit ARM ELFs natively viaCONFIG_COMPAT. Activation is delayed until AFTERmmdebstrap, whosecross-arch path needs a working
qemu-armregistration to populate thechroot before
libc/ld-linux-armhf.so.3exist.Killswitch:
NATIVE_ARMHF_ON_ARM64=no(synonymsnever,disabled).Concurrency — kernel
flockon the binfmt entryConcurrent armbian builds on the same host kernel are coordinated purely
via kernel BSD
flock(2)on/proc/sys/fs/binfmt_misc/qemu-armitself.No userspace state, no per-builder lockfiles, no
/run/lockdirectory.LOCK_SHon the binfmt entry via a long-lived fd.Kernel BSD-flock counter is the refcount; kernel releases the fd on
process exit (crash-safe).
echo 0to disable; subsequent arrivalsobserve 0 and proceed without writing.
LOCK_SH; last-out detects viaLOCK_EX-LOCK_NBona fresh fd, succeeds iff zero other
LOCK_SHholders. Last-outre-enables
qemu-arm.Trade-offs
qemu-armstate is not recorded across builds. Last-outunconditionally re-enables. An admin's pre-existing
disabledpolicy is not preserved across the build window.
qemu-armmid-build.Don't, or use
NATIVE_ARMHF_ON_ARM64=noon builds that should notinterfere.
inheriting our SH-fd shares the same lock entry. We rely on
add_cleanup_handlerrunning registrations in order and the umountteardown handlers (registered first) killing the docker container
before our restore handler runs. Documented in the trap handler's
docstring; if a future refactor breaks ordering, POSIX
F_SETLKorexplicit descendant-kill are the documented escape hatches.
The earlier revision of this work coordinated via a userspace
owner-flock directory (~250 lines, depended on
/run/lockbeinghost-bind-mounted into build containers — which Armbian docker does
not do). Replaced with the kernel-flock approach above; available in
PR history if needed.
Empirical
./compile.sh build BOARD=helios4 BRANCH=edge BUILD_MINIMAL=yes RELEASE=nobleon droid (RK3328) and m1 (RK3568, Cortex-A55):
mmdebstrapitself (~10 min of every cache-miss build) cannot beaccelerated — speedup applies to subsequent chroot operations.
Test plan
Phase 1 — feature on iron (single builder):
qemu-armon exitDOCKER_PRIVILEGED=no(CAP_SYS_ADMIN unconditional)NATIVE_ARMHF_ON_ARM64=no/never/disabled) — silent returnPREFER_DOCKER=nosmokePhase 2 — concurrency, all on Hetzner CAX21 (kernel 6.8.0-90, binfmt_misc fs):
/proc/sys/fs/binfmt_misc/qemu-armwork as expectedSH-counter holds qemu-arm disabled, first-finished suppresses restore,
last-finished restores
NATIVE_ARMHF_ON_ARM64=noopt-out with concurrent native builder —refused-fast guard rejects (avoids clobbering)
qemu-arm restored, no stale state
Summary by CodeRabbit
New Features
Bug Fixes