Run more submodule tests on Cygwin (fix flaky xfails)#2154
Conversation
Many tests rely on git accepting their fixture directories: the GitPython repository itself, and the gitdb and smmap submodules used by the test suite as direct and nested submodule fixtures. When git rejects one for "dubious ownership" (typically because a CI workflow's `safe.directory` list is missing an entry), three submodule-related tests fail in opaque ways. The exact failure type depends on which side of a race wins in the flush to the cached `git cat-file --batch-check` subprocess: - When Python wins, `ValueError` reaches the test directly. - When git wins, `BrokenPipeError` is caught internally as `IOError` and `iter_items` returns early. Each test then fails on the empty-or-short result in its own way: - `test_docs::Tutorials::test_submodules` indexes the empty `sm.children()` list with `[0]` and raises `IndexError`. - `test_repo::TestRepo::test_submodules` and `test_submodule::TestSubmodule::test_root_module` see a length-1 (not length-2) submodule list from their recursive traversals and fail their length assertion with `AssertionError`. This commit adds `test/test_fixture_health.py`, which runs `git rev-parse --show-toplevel` in each fixture directory and asserts both that git is willing to operate there and that it reports the directory as its own toplevel. The failure message names `safe.directory` and ownership as the likely causes, so a misconfigured environment is recognizable directly from the test output rather than needing to be pieced together from a cascade of failing tests not conceptually related to safe directory protections. This adds the test only. Further replication and a fix are forthcoming. On the Cygwin CI workflow now, the test fails for `[gitdb]` and passes for `[repo_root]` and `[smmap]`. The asymmetry between the two submodules relates to NTFS ownership inherited from `actions/checkout`'s clone of the main tree. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This adds the new test, `test_required_submodule_is_initialized`, to `test_fixture_health.py`. For the gitdb and smmap submodules, the test asserts that the working tree directory exists and contains a `.git` marker (file or directory). The failure message reminds the developer to run `git submodule update --init --recursive`. This is a regression test for a different contract than the preexisting `test_fixture_dir_is_trusted_by_git` in `test_fixture_health.py`. That test verifies git's willingness to operate in each fixture directory (the `safe.directory` / dubious-ownership contract). This one verifies that the directories are populated at all. When the gitdb and smmap submodules aren't populated, the rest of the suite fails in ways that don't name the cause: a cascade of failures across `test_docs.py`, `test_repo.py`, and `test_submodule.py` -- tests that need submodule content but don't set out to check for it. That cascade was the failure mode of gitpython-developers#1713 on the Arch Linux build, where `init-tests-after-clone.sh` had stopped running `git submodule update` on CI. Wherever the submodules are actually missing, this test reports the missing path directly instead. The test skips when the source tree is not a git clone (no `.git` at `REPO_ROOT`). This is to accommodate setups that run pytest from an extracted release tarball and prepare submodules in a separately-pointed tree via `GIT_PYTHON_TEST_GIT_REPO_BASE` (e.g. OpenIndiana's package build). In such setups, `git submodule update` cannot operate on the source tree, and the submodule paths checked here would never be populated there, but the test suite still works because `rorepo` points elsewhere. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove the Cygwin xfail decorations from `test_submodules` in `test_docs.py` and `test_repo.py`, and from `test_root_module` in `test_submodule.py`, so the tests surface the underlying failure directly. Also temporarily add 256 `reproduce-safe-dir` matrix jobs to `.github/workflows/cygwin-test.yml`, each running these three tests under the current `safe.directory` configuration. All or most of these `reproduce-safe-dir` jobs shall be removed before the associated bugfix is integrated. The existing test job's `env`, `defaults`, and setup steps gain YAML anchors so the new job can reference them without duplication. The anchors will be removed when the `reproduce-safe-dir` jobs are. Root cause ---------- The CI workflow adds the main repo and its `.git` directory to `safe.directory` but not the gitdb or smmap submodule working trees. Git matches `safe.directory` by specific paths (or wildcards, but we are not using wildcards). When GitPython opens a submodule as a `Repo` and runs `git cat-file --batch-check` against it, git rejects the repository for dubious ownership and exits. The race -------- GitPython caches the cat-file process. When git has exited, the next call to `__get_object_header` hits one of two outcomes depending on whether Python's `cmd.stdin.flush()` runs before or after the kernel marks the read end of the pipe as closed. (This is the same mechanism as the long-standing gitpython-developers#427, in which SIGINT kills the cat-file process and the next `flush()` raises `BrokenPipeError`.) 1. If Python wins the race, `flush()` succeeds (data goes into the kernel buffer); the subsequent `stdout.readline()` returns `b""` (EOF); `_parse_object_header` raises a `ValueError` whose message names the rejected directory. 2. If git wins, `flush()` raises `BrokenPipeError`, a subclass of `OSError` (a.k.a. `IOError`). In both paths, the exception travels from `Object.new_from_sha` (outside `name_to_object`'s `except ValueError`, which covers only the `dereference_recursive` call), up through `repo.commit("HEAD")`, into `iter_items` in `git/objects/submodule/base.py`. That function's `except (IOError, BadName)` clause catches `BrokenPipeError` but not `ValueError`. So path (1) propagates `ValueError` all the way to the test, while path (2) ends with `iter_items` returning early and the test seeing an empty submodule list. Per-test outcomes ----------------- What the test sees in path (2) is determined by what the test does with the empty list: - `test_docs::test_submodules` does `sm.children()[0].name`, so the `[0]` on the empty list raises `IndexError`. - `test_repo::test_submodules` does `assertGreaterEqual(len(list(self.rorepo.iter_submodules())), 2)`. The recursive traversal yields gitdb (its `iter_items` on the *main* repo succeeds, because the main repo IS in `safe.directory`) but not smmap, so length 1 fails the assertion. - `test_submodule::test_root_module` does `assert len(rsmsp) >= 2` on a similar traversal result, failing. The race itself is non-deterministic, but the mapping from race outcome to exception type is deterministic per test. So each test has exactly two possible failure types, one per side of the race: - `test_docs`: `ValueError` -> Python wins, OR `IndexError` -> git wins. - `test_repo`: `ValueError` -> Python wins, OR `AssertionError` -> git wins. - `test_submodule`: `ValueError` -> Python wins, OR `AssertionError` -> git wins. In particular, `test_docs` never produces `AssertionError`, and `test_repo` and `test_submodule` never produce `IndexError`. Empirical confirmation ---------------------- In 1024 `reproduce-safe-dir` jobs and 4 buggy-config "test (fast)" job runs, 100% of 3084 target-test failures match the per-test prediction. `ValueError` accounts for ~99.0%, while the race-win exception types from the list above account for the others. `reproduce-safe-dir` runs: https://github.com/EliahKagan/GitPython/actions/runs/25836741324 https://github.com/EliahKagan/GitPython/actions/runs/25836329241 https://github.com/EliahKagan/GitPython/actions/runs/25836334196 https://github.com/EliahKagan/GitPython/actions/runs/25836339345 Run on the fix commit (256 jobs, 768 test outcomes, all PASSED): https://github.com/EliahKagan/GitPython/actions/runs/25836344886 The race condition, from a "test (fast)" job in a PR gitpython-developers#2143 CI run: https://github.com/gitpython-developers/GitPython/actions/runs/25440735020/attempts/1 Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Temporarily add a `diag-token` job to the `cygwin-test` workflow that creates a directory through four code paths and reports its NTFS Owner. The idea is to empirically establish the cause of the Owner asymmetry: a Cygwin or Cygwin-derived runtime (`cygwin1.dll` for Cygwin git; `msys-2.0.dll` loaded by Git for Windows's bundled MSYS2 `sh.exe`) rewrites the process token's `TokenOwner` field at DLL initialization, and `CreateProcessW` propagates that mutation to every descendant. The four tests are: | Test | Sequence | Predicted Owner | | ---- | ----------------------------------------------- | ------------------------ | | A | PowerShell `New-Item` | `BUILTIN\Administrators` | | B | Cygwin `mkdir` | `runneradmin` (197108) | | C | Cygwin bash -> Git for Windows `git init` | `runneradmin` (197108) | | D | PowerShell -> Git Bash -> PowerShell `New-Item` | `runneradmin` (197108) | Test A is the Win32 baseline. The kernel-default `TokenOwner` for `runneradmin`'s full administrative token is `BUILTIN\Administrators`, because `runneradmin` is the built-in local Administrator (RID 500) and `FilterAdministratorToken=0` exempts it from UAC token filtering. Test B is the Cygwin baseline. Cygwin's `cygheap_user::init` calls `NtSetInformationToken(hProcToken, TokenOwner, &effec_cygsid, ...)` at DLL init, and the resulting Owner reflects the rewritten token. Test C is the load-bearing case. The child is a native Windows program that loads no Cygwin runtime of its own. It produces a user-owned `.git`, showing that the rewrite performed by the parent Cygwin bash propagates through `CreateProcessW`'s token duplication to a native Windows descendant. Test D strengthens the case: the first and last links are native Windows programs (a fresh PowerShell using its built-in `New-Item` cmdlet). Only the middle link is a Cygwin-family runtime (Git for Windows's MSYS2 `bash.exe`, linked against `msys-2.0.dll`). The directory is still user-owned, showing that the mechanism does not depend on git, nor on shell dispatch via `git submodule`, nor on any property of the final-link binary, but only on whether *any* process in the ancestry has loaded a Cygwin-family runtime. This `diag-token` job, like the `reproduce-safe-dir` matrix, is temporary and should be removed before this work is integrated. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In each workflow that runs the test suite, add a step to the `test` job that prints the ownership (NTFS Owner / POSIX uid+gid) of key paths: the workspace, its `.git`, gitdb and smmap submodule worktrees and gitfiles, and the runner `HOME` and `~/.gitconfig`. Also print the full list of `safe.directory` entries at that point. GitPython's tests are intentionally coupled to the layout and state of the GitPython repository they run against. The ownership and trust config that gate whether git will operate on a path are part of that state. Surfacing them in every test job gives diagnostics to read alongside any failure that turns out to be a CI setup problem. The step is added to: - `cygwin-test.yml`'s `test` job, with a YAML anchor (`&ownership-display`) so the temporary `reproduce-safe-dir` matrix job can reuse it via `*ownership-display`. - `pythonpackage.yml`'s `test` job (Linux / macOS / native Windows). - `alpine-test.yml`'s `test` job. It runs after `init-tests-after-clone.sh` has populated the submodules and after `safe.directory` has been configured (in workflows that configure it), so the values reported are what the tests will see. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cleanups on the ownership-display step from the previous commit:
- Split into per-shell steps (bash POSIX `ls -ld`, pwsh NTFS
`Get-Acl`, bash safe.directory `git config`), removing the
bash-driven PowerShell subprocess with `cygpath -w` and
quote-escaping. Use `pwsh`, not `powershell.exe`. Gate
pythonpackage's two views by `matrix.os-type` (Git Bash's `ls -ld`
on Windows reports a uniform `runneradmin 197121` for every path,
ignoring NTFS Owner -- MSYS2's SID-to-uid mapping doesn't have
Cygwin's fidelity).
- Trim the decorative `$HOME` entry from POSIX path lists: it isn't
part of any git trust decision -- only `~/.gitconfig` is. Use
`${HOME:?HOME is not set}/.gitconfig` for the remaining entry so an
unset HOME fails loudly.
- Move the pwsh path list into a `$paths = @(...)` variable. Unix
shells keep the inline `for p in WORD WORD ...` form: alpine's
`sh` (busybox ash) doesn't support arrays, and the others
shouldn't differ from it unnecessarily.
- Drop `2>&1` from the safe.directory step. Drop the `|| echo
"(none)"` fallback on cygwin (entries are explicitly configured;
bare command fails on regression). Keep it on pythonpackage and
alpine, where `actions/checkout`'s safe.directory add lives under
a throwaway HOME override and doesn't persist, so there
legitimately are no entries to display.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `submodules: recursive` to the `actions/checkout` step in every workflow that runs the test suite (`cygwin-test.yml`, `pythonpackage.yml`, `alpine-test.yml`). This is *temporary* and will be reverted after the bugfix, with the explicit intent of demonstrating that the bugfix works regardless of which mechanism populates the submodules. The standing decision is to NOT use `submodules: recursive` in CI. `init-tests-after-clone.sh` is the documented setup mechanism that downstream packagers (Arch Linux and others) rely on, and keeping it as the sole submodule source on upstream CI is meant to catch regressions like gitpython-developers#1713 before they reach distros. See gitpython-developers#1715 for the full rationale. The CI run on this commit is expected to show: - Cygwin (`test-cygwin`): the `safe.directory` bug still triggers, with the same `ValueError`/`IndexError`/`AssertionError` pattern as the previous commits. The bug is independent of which process clones the submodules; the gitdb worktree directory itself is created Admin-owned by the outer `git clone`'s checkout phase before any submodule init runs, regardless of which mechanism populates the submodule contents afterward. - Linux / macOS / native Windows (`Python package`) and Alpine Linux (`test-alpine`): tests pass as before. These platforms are unaffected by the bug. Each workflow's checkout step carries an inline comment pointing at gitpython-developers#1715 so the temporary nature of the change is legible at a glance. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add the gitdb and smmap submodule working tree paths to the `safe.directory` config in the Cygwin CI workflow. Without this, when GitPython opens a submodule as a `Repo` and runs `git cat-file --batch-check` against it, git rejects the repository for dubious ownership. The user-visible failure modes (`ValueError`, `IndexError`, `AssertionError`) all trace back to this rejection. Why `[gitdb]` failed and `[smmap]` passed ----------------------------------------- The trust check in `test_fixture_health.py` failed for `[gitdb]` but passed for `[smmap]`, even though neither submodule's working tree was in the workflow's `safe.directory` list before this commit. The asymmetry comes down to which Owner SID NTFS records on each path, and which paths git's ownership check requires to be owned. There are six paths git's check might consider for the two submodules: gitdb's `.git` gitfile, worktree `git/ext/gitdb`, and gitdir `<repo>/.git/modules/gitdb`; smmap's `.git` gitfile, worktree `git/ext/gitdb/gitdb/ext/smmap`, and gitdir `<repo>/.git/modules/gitdb/modules/smmap`. Of these, only one had NTFS Owner = `BUILTIN\Administrators` (Cygwin uid 544): gitdb's worktree. The other five had Owner = `runneradmin`, whose Cygwin uid (197108) was the value `geteuid()` returned in the test process. The same Owner pattern held both with and without `submodules: recursive` in `actions/checkout`. The single `Administrators`-owned path was gitdb's worktree. All other paths were `runneradmin`-owned, including the ones that Git for Windows's recursive submodule clone had just produced when `submodules: recursive` was set. The differentiator is not which git binary clones the submodules, but that `git/ext/gitdb` is created by the outer `git clone`'s checkout phase. When `git checkout` materializes a tree entry of mode 160000, it calls `mkdir(path, 0777)` to create an empty submodule directory (see `entry.c::write_entry`, case `S_IFGITLINK` [1] [2]). On Windows GHA runners, jobs run as `runneradmin`. This is the built-in local Administrator account (its Cygwin uid 197108 = 196608 + 500 matches Cygwin's mapping [3] for machine-local accounts: 0x30000 plus the SID suffix 500, the well-known suffix of that account's SID). That account is exempt from UAC token filtering by default (Admin Approval Mode for the built-in Administrator account is disabled [4]), so its processes hold the full administrative access token. `CreateProcessW` propagates the parent's primary token unchanged through `actions/checkout`'s process tree. Inside that tree, the outer `git clone`'s `mkdir(path, 0777)` produces directories whose NTFS Owner is `BUILTIN\Administrators` -- as observed on every workspace directory the outer clone materialized, including the `git/ext/gitdb` placeholder. Subsequent submodule-update operations -- both Git for Windows if `actions/checkout` does a recursive clone, and Cygwin git if it happens later due to `init-tests-after-clone.sh` -- produce paths that NTFS records as `runneradmin`-owned. Both flows go through a process whose primary token's `TokenOwner` field has been rewritten from `BUILTIN\Administrators` to the user SID by a Cygwin or Cygwin-derived runtime at DLL initialization. The rewrite propagates to every descendant via `CreateProcessW`'s primary-token inheritance [5], so every `mkdir` issued after that point produces a directory owned by the user. - Cygwin git triggers the rewrite directly. `cygheap_user::init` in `cygwin1.dll` calls `NtSetInformationToken(hProcToken, TokenOwner, &effec_cygsid, ...)` at process startup [6]. - Git for Windows triggers it indirectly. `git submodule` is not a builtin (only `submodule--helper` is) [7]. So it falls through to `execv_dashed_external` and runs `git-submodule.sh`, a shell script whose shebang is resolved at runtime to `sh.exe` in the Git for Windows "Git Bash" environment. That `sh.exe` is an MSYS2 binary linked against `msys-2.0.dll`, a Cygwin fork that performs the same `TokenOwner` rewrite. From there, every `git.exe` the script spawns inherits the user-SID `TokenOwner` and produces user-owned directories. Cygwin's `lstat().st_uid` reports the actual NTFS Owner SID mapped through Cygwin's SID-to-uid table. `is_path_owned_by_current_user` reduces to `lstat(p).st_uid == geteuid()` on Cygwin (no Administrators group exemption). `ensure_valid_ownership` returns 1 (accepting the repository) without consulting `safe.directory` when the gitfile, worktree, and gitdir ALL pass that owned-by-current-user check. Otherwise it falls through to comparing the worktree's `real_pathdup` against each configured `safe.directory` entry. For gitdb the three Owners were `runneradmin` (gitfile), **`Administrators` (worktree)**, and `runneradmin` (gitdir), so the all-paths-owned check failed on the worktree. The workflow's `safe.directory` before this commit contained only `$(pwd)` and `$(pwd)/.git`, neither of which exact-matches `git/ext/gitdb`, so the `safe.directory` comparison also failed, and `ensure_valid_ownership` returned 0 -- git rejected the repository. For smmap the three Owners were all `runneradmin`, so the all-paths-owned check passed. Cygwin's `chown` cannot rewrite the gitdb worktree's Owner SID from `Administrators` to `runneradmin`: it returns "Permission denied". Adding both submodule worktree paths to `safe.directory` is the correct fix and is robust against shifts in what paths inherit which Owner. Why `actions/checkout`'s own `safe.directory` does not help ----------------------------------------------------------- `actions/checkout`'s `set-safe-directory: true` default writes the main repository path to `safe.directory` in a temporary config it points its own spawned git child at by overriding `HOME` for that child process. That `HOME` override applies only to git invocations the action itself spawns; subsequent steps' processes inherit the runner user's real `HOME` (e.g., `C:\Users\runneradmin` on the Windows runner) and read its actual `~/.gitconfig`, which never received the entry. So no git in a later step, whether Git for Windows or Cygwin git, sees it. That's why the `cygwin-test` workflow sets Cygwin git's `safe.directory` itself. This commit extends that to cover the gitdb and smmap working trees. The distinction between Cygwin git and Git for Windows is also why the bug affected the Cygwin jobs and no other Windows jobs. `compat/mingw.c` defines `is_path_owned_by_current_sid` [8], which accepts `BUILTIN\Administrators`-owned paths when the current user is a member of `Administrators`. Cygwin git compiles against the POSIX path (`is_path_owned_by_current_uid` in `git-compat-util.h` [9]) without that leniency. So the same `Administrators`-owned `git/ext/gitdb` that Cygwin git rejects is silently accepted by Git for Windows, and the main CI workflow's `windows-latest` jobs never trip the trust check. Verification ------------ The `reproduce-safe-dir` matrix on the previous commits produces failures for all three affected tests; this commit's CI run shows those tests passing instead. The Owner-SID claim above is verified by the `diag-token` job introduced for this purpose. That job creates a directory through four code paths (PowerShell-only, Cygwin-only, Cygwin-bash spawning Win32 `git init`, and a PowerShell -> Cygwin-bash -> PowerShell sandwich) and reports the NTFS Owner of each. The observed Owners match the predicted values in every case, including the load-bearing Cygwin -> Win32 propagation case (test C) and the sandwich case (test D) showing that the determinant is whether some process in the ancestry has loaded a Cygwin-family runtime, not the identity of the file-creating binary. The commit immediately preceding this one temporarily sets `submodules: recursive` on `actions/checkout` in every workflow that runs the test suite. Its CI run shows the bug still triggering on Cygwin (the gitdb worktree directory itself is created by the outer `git clone`'s checkout phase, before any submodule init runs, regardless of which mechanism subsequently populates the submodule contents). A subsequent commit will revert that change; its CI run shows this fix continues to hold without `submodules: recursive`, confirming the fix is independent of submodule source. [1]: https://github.com/git/git/blob/v2.51.0/entry.c#L397 [2]: https://github.com/git-for-windows/git/blob/v2.54.0.windows.1/entry.c#L397 [3]: https://cygwin.com/cygwin-ug-net/ntsec.html [4]: https://learn.microsoft.com/en-us/windows/security/application-security/application-control/user-account-control/settings-and-configuration [5]: https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createprocessw [6]: https://sourceware.org/cgit/newlib-cygwin/tree/winsup/cygwin/uinfo.cc?h=cygwin-3.6.9#n82 [7]: https://github.com/git-for-windows/git/blob/v2.54.0.windows.1/git.c#L661 [8]: https://github.com/git-for-windows/git/blob/v2.54.0.windows.1/compat/mingw.c#L3931 [9]: https://github.com/git/git/blob/v2.51.0/git-compat-util.h#L346 Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Revert the temporary addition of `submodules: recursive` to `actions/checkout` in `cygwin-test.yml`, `pythonpackage.yml`, and `alpine-test.yml`. The `safe.directory` fix has been verified to work regardless of which mechanism populates the submodules. Returning the workflows to their pre-test posture restores the standing arrangement: `init-tests-after-clone.sh` as the sole submodule source on upstream CI. See gitpython-developers#1715 for the rationale. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove three things from the `cygwin-test` workflow that were added to demonstrate the `safe.directory` bug and its fix: - The `reproduce-safe-dir` matrix (256 jobs running three submodule tests). Added to give a high-confidence reproduction of the failure pattern across runner-instance variation. - The `diag-token` job. Added to empirically establish the TokenOwner rewrite mechanism behind the gitdb-worktree Owner asymmetry. - The YAML anchors that only those temporary jobs needed (`&force-lf`, `&checkout`, `&install-cygwin`, `&verbose-output`, `&safe-directory`, `&prepare-repo`, `&git-identity`, `&setup-venv`, `&update-pypa`, `&install-deps`, `&ownership-posix-display`, `&ownership-ntfs-display`, `&safe-directory-display`, `&cygwin-env`, `&cygwin-defaults`). The `test` job still has all those steps; it just no longer needs to anchor them for reuse. What stays: the `test` job (the actual Cygwin test suite), the fixture-health and required-submodule checks, and the always-on file-ownership / `safe.directory` diagnostic steps (kept across all test workflows). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gitdb's `async` submodule was removed back in 2014 (gitpython-developers/gitdb@bf942a9); only smmap remains. The leading "gitdb / async" comment and the `assert len(rsmsp) >= 2` check (loosened back in 2011 from `== 2` in 4a8bdce when smmap was added to gitdb alongside async) are both stale. Replace with an exact list-equality check on the expected paths in traversal order. That order is also what later code in this function assumes via positional indexing `rsmsp[0]`, `rsmsp[1]`. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR removes three xfail markers on Cygwin submodule tests by addressing the real underlying cause: Cygwin git refuses to operate on the git/ext/gitdb submodule directory because actions/checkout (Git for Windows) creates it owned by the Administrators group, which Cygwin git—unlike Git for Windows—does not treat as user-owned. The fix adds the missing submodule paths to the Cygwin workflow's safe.directory configuration. The PR also adds a new test_fixture_health.py to surface this class of misconfiguration with clear messages, and adds diagnostic CI steps that print POSIX/NTFS ownership and safe.directory entries to make future debugging easier.
Changes:
- Add
safe.directoryentries forgit/ext/gitdband the nested smmap submodule in the Cygwin workflow; remove now-unneededxfailmarkers on three Cygwin submodule tests and tighten one assertion. - Add
test/test_fixture_health.pyto fail fast with actionable messages when fixture directories are missing, uninitialized, or rejected by git for dubious ownership. - Add ownership /
safe.directorydiagnostic steps to the standard, Alpine, and Cygwin workflows.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| .github/workflows/cygwin-test.yml | Adds safe.directory entries for the gitdb/smmap submodules (the actual fix) and diagnostic steps showing POSIX/NTFS ownership and configured safe.directory entries. |
| .github/workflows/pythonpackage.yml | Adds POSIX/NTFS ownership diagnostics and a safe.directory listing step on the main CI workflow. |
| .github/workflows/alpine-test.yml | Adds POSIX ownership and safe.directory listing diagnostics for the Alpine job. |
| test/test_submodule.py | Removes the Cygwin xfail from test_root_module and tightens the deep-traversal submodule assertion to an exact path list. |
| test/test_repo.py | Removes the Cygwin xfail from TestRepo.test_submodules. |
| test/test_docs.py | Removes the Cygwin xfail from test_submodules and drops now-unused sys/pytest imports. |
| test/test_fixture_health.py | New module verifying that required fixture directories exist, are initialized as git repos, and are trusted by git, providing clear remediation hints on failure. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
EliahKagan
left a comment
There was a problem hiding this comment.
I think this will be ready to merge once CI passes.
In case CI on intermediate commits is of interest, this can be seen in my fork. I've refrained from doing multiple pushes here because some of the intermediate commits temporarily introduce many test jobs to ensure I produce the rare ~1% race conditions and that the fix actually makes them go away.
There was a problem hiding this comment.
One shortcoming is that, due to the effect of set -x, the new steps that show POSIX file ownership for debugging purposes produce output that is ugly and hard to read. The solution might just be to do set +x for those steps, though I worry that in realistic situations where something goes wrong and we want to see that, it might cause something to be left out that we actually do want. I'll think about that and hope to include a cosmetic improvement in a future PR. My view is that it shouldn't be considered a blocker.
In #1455, which got Cygwin tests running on GitHub Actions, DWesl mentioned:
Byron's review mentioned:
Further discussion raised the question of whether this was the case, as well as workarounds for using submodules on Cygwin even if submodule-specific functionality is broken. The tests were marked
xfail(54bae76, 0eda0a5, 7f3689d) and the PR was merged.It turns out the GitPython submodule functionality is not broken on Cygwin, and neither are the tests! Instead, the problem is that the
rorepofixture is GitPython's own repository, and whenactions/checkoutclones GitPython, the top-levelgitdbdirectory is owned by theAdministratorsrather than therunneradminuser. Git for Windows special-cases this, judging that repositories owned by theAdministratorsgroup are safe for members of that group to trust. Cygwin git does not special-case this, so moresafe.directoryentries are needed to assuage it.The key to identifying the cause of the problem is that only tests that use
self.rorepoand attempt submodule operations, but not those using@with_rw_repoand attempting submodule operations, failed in this way. The nature of the problem was obscured by several oddities, though the fourth and final oddity is also what revealed it:safe.directoryprotections on Windows are unintuitive and complex even in cases where they work fully as intended, only one Git implementation is involved, and CI behavior is likely to match local behavior, none of which hold here.actions/checkoutclones the submodules, or leaves them to be cloned byinit-tests-after-clone.sh, is irrelevant to all aspects of this problem. This is even though the action uses Git for Windows, while the init script (in a Cygwin job) uses Cygwin git. Thegit/ext/gitdbdirectory is created when checking out GitPython itself, and it is owned byAdministrators.git cat-file --batch-checkinvocation we intend to reuse across operations. The git subprocess quits before reading its input. Currently GitPython issues a message about possible dubious ownership in this situation, since that's the most common cause. But it is not the only possible cause of that message from GitPython--so if one does not expect that the error is due tosafe.directoryprotections, then one might dismiss that.git cat-file --batch-checkand send refs through the pipe. But if there is dubious ownership, then the git process quits before reading its input. So there is a race condition: our write may fail withEPIPEand raiseBrokenPipeError, or our read may complete with EOF, such that we passb""to_parse_object_header, which raisesValueError.BrokenPipeErroris a subclass ofIOError, and we currently swallowIOErrorin some of the places where this happens. TheValueErrorcase is overwhelmingly more common. That's thexfaildecorations from #1455 covered. Recently,EPIPEwins the race slightly more often than in the past, and we've had to rerun tests a number of times. It would be possible to add more exception types to thexfaildecorations, but a better approach is to verify the complete details of what is going wrong and why, add more regression tests, and fix the problem properly, removing thexfails. This PR does those things.The fix is very simple: we already have a CI step that adds paths to the Cygwin git
safe.directoryconfiguration, and the fix is to add the missing paths related to submodules. But the tests are somewhat nontrivial, and the partially reverted instrumentation to fully confirm the cause and facilitate easier debugging in the future is also somewhat nontrivial.The code changes and commit messages in this PR were made with Claude Code, as disclosed in commit message trailers. I've reviewed and substantially adjusted the code changes. I've also reviewed and honed the commit messages through many rounds of revision, including manual edits. A few of the commit messages are long and dense. I have made sure to spend more time with those, to ensure I believe the details are warranted, since even though I am well known for writing long detailed commit messages, I understand this is something people are more wary of when LLMs are involved (since they can generate large amounts of text quickly). As for this PR description, no part of it is LLM-generated, though I did use Claude for proofreading.
The commit messages describe the situation, the evidence for it, and the fix from the perspective of tracing what has occurred. One aspect of the bug--the behavior of
safe.directoryprotections on Windows when multiple git implementations are used and the repository has submodules that it must operate on--is particularly non-obvious, unintuitive, and interesting, and it may end up being relevant to future improvements here in GitPython, as well as to submodule portability subtleties that might arise in the future in gitoxide. Hence this section.On POSIX, the owner and group owner of a file are separate concepts, with each file being owned by one user and group-owned by one group. But on Windows there is a unified concept of Owner. Unlike on a Unix-like system, on Windows a user or a group can own a file in the same sense of "own." Users usually own the files they create, but one of the exceptions to this is that users who are members of the
Administratorsgroup create files that are owned by theAdministratorsgroup. This exception is actually more specific than that: it only applies when the user is actually running with their unfiltered (full) token in which theAdministratorsgroup is active. Usually, members of theAdministratorsgroup on Windows run with UAC enabled and configured to require elevation to act with their full administrative powers. But therunneradminuser that runs Windows CI jobs runs with a full admin token, so files it creates are owned byAdministrators.Git uses ownership as a powerful trust signal. It will operate on repositories it thinks the user running it owns, since the configuration and hooks in such a repository are presumably safe. Git checks if some important repository paths, such as the repo's top-level directory and its
.gitdir, have the user running it as their owner. If they do, it trusts the repository. For any not owned, it checks if they match any values of thesafe.directoryconfiguration variable. If any are neither owned by the user nor match entries insafe.directory, then Git refuses to operate. But this gets weird on Windows, where the directories might be owned by something that isn't a user at all:Git for Windows lets members of the
Administratorsgroup operate on repositories whose ownership matches what those users might very well create themselves. If the user is inAdministrators, and a directory is owned byAdministrators, then Git for Windows considers it to be owned by the user for purpose of assessing trust.Cygwin git does not do this. It doesn't generally need to. If you start a program built against
cygwin1.dll--whether that program isgitor anything else--it changes the owner of the process token to the user, and then the owner inherited by securable objects (such as files) the process creates is the user instead of whatever it was before. So a member of theAdministratorsgroup, acting with the full powers thereof, can run Cygwin git withAdministratorsas the process token owner initially--but the process token owner is changed to the user, almost immediately, before Cygwin git'smain()function is called. For this reason, Cygwin git typically has no need to treat repositories owned by theAdministratorsgroup specially--it never creates such a repository.When we clone a repository that has a submodule, we may or may not also clone the submodule. If we do, we might clone it at the same time, or later. But whatever happens, so long as the top-level repository is able to be checked out, we first get the top-level repository with an empty directory at the submodule root. Whatever ownership we are creating files with, we create the submodule root with that ownership.
In all our Windows CI jobs, including the Cygwin jobs, we use
actions/checkoutto clone. Whileactions/checkoutis capable of cloning repositories using the GitHub REST API, it only uses this as a fallback strategy. It first checks if a recent enough version ofgitis available and, if so, uses it. On all our Windows CI jobs, including the Cygwin jobs,actions/checkoutclones the GitPython repository using Git for Windows.Thus, no matter how its submodules are cloned, the top-level directory of
gitdbis owned by theAdministratorsgroup. Cygwin git would therefore refuse to operate on it. But the GitPython test suite uses the GitPython repository itself, as well as its direct submodule gitdb and its nested submodule smmap, as test fixtures. Because Cygwin git wouldn't operate on the gitdb submodule, tests that use it were failing.Most submodule tests didn't have a problem, because they use
@with_rw_repo, which creates a new clone, instead ofself.rorepo, which uses the repository in place. On Cygwin,@with_rw_repoclones the repository with Cygwingit. As described above, when Cygwin git clones repositories, it clones them as the user.More remarkably, while I have added
safe.directoryentries for the nested smmap module, this is actually not strictly necessary--smmap never had dubious ownership! The reason is that only thegitdbdirectory created in the top-level GitPython checkout is owned byAdministrators. No contents of submodules, not even of the gitdb submodule, are checked out as owned byAdministrators. This is the case even ifactions/checkoutis made to clone them all by settingsubmodules: recursive. (I tested with this to be sure. But I did not keep this, since we have good reasons to validate that our init script will clone the submodules even if they were not cloned before. See #1713 and #1715 on this.) That is to say that none of the files created when cloning submodules are owned byAdministratorseven when Git for Windows creates them in the same top-levelgit clonecommand.How can this be? Historically, the machinery that operated on submodules was implemented in scripts. Over time, it basically all came to be implemented in C, except that the
git-submodulesubcommand itself remains implemented bygit-submodule.sh. Today, the major functionality of the script is to parse options, apply defaults, look up some information about the current repository, and callgit submodule--helperto do the real work. In Git for Windows, the C code that does the real work is in native Windows programs such asgit.exe. Butgit-submodule.shis a shell script. Its interpreter issh.exefrom the MSYS2 "Git Bash" environment that Git for Windows ships.MSYS2 is like Cygwin (MSYS2 is a fork of MSYS, which is a fork of Cygwin). Just as Cygwin programs link to
cygwin1.dll, which does various setup--including, as described above, resetting the process's token owner to the user--MSYS2 programs link tomsys-2.0.dll, which does that very same thing. Therefore, a user who is a member of theAdministratorsgroup can run git withAdministratorsas its process token owner, but in any chain of subprocesses that goes through a shell invocation, everything at or below that invocation will operate with it reset to the user.