dstack version
0.20.19
Python version
3.13.7
Host OS
Linux 5.15.0-135-generic (Ubuntu)
Host Arch
x86_64
What happened?
`dstack apply -f <run config>` aborts during repo-diff packaging when the working
tree contains a modified tracked file whose content has any byte sequence the UTF-8
decoder rejects (e.g., a stray Latin-1 byte, BOM artifact, or otherwise
malformed UTF-8). In my case the offending file was a LaTeX paper
(`*.tex`) unrelated to the run.
Workaround: `git stash` the offending file before `dstack apply`, then
`git stash pop` afterwards. This does not work when the file has to be
in the diff.
Steps to reproduce
- In a git repo, modify a tracked text file so it contains at least one
non-UTF-8 byte sequence (e.g., a Latin-1-encoded character not valid in
UTF-8, or a malformed multi-byte sequence).
- Without committing, run `dstack apply -f <any run config>` from that repo.
- The CLI aborts with the trace below before any plan is shown.
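For the first step, a minimal sketch of one way to produce such a file. The helper name `append_invalid_utf8` and the file name `paper.tex` are made up for illustration, and the demo writes to a temporary directory; in a real reproduction you would point it at a tracked file in the repo instead.

```python
import tempfile
from pathlib import Path


def append_invalid_utf8(path: Path) -> None:
    """Append a Latin-1 'é' (single byte 0xE9), which is not valid UTF-8 on its own."""
    with path.open("ab") as f:
        f.write("é\n".encode("latin-1"))


# Demo on a throwaway file; a real repro would target a tracked file instead.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "paper.tex"  # hypothetical file name
    target.write_text("\\section{Intro}\n", encoding="utf-8")
    append_invalid_utf8(target)

    data = target.read_bytes()
    try:
        data.decode("utf-8")
        is_valid_utf8 = True
    except UnicodeDecodeError:
        is_valid_utf8 = False

print(is_valid_utf8)
```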
Relevant log output
File ".../dstack/_internal/cli/services/configurators/run.py", line 567, in get_repo
repo = get_repo_from_dir(local_path)
File ".../dstack/_internal/core/models/repos/remote.py", line 372, in _repo_diff_verbose
_interactive_git_proc(repo.git.diff(repo_hash, as_process=True), collector)
File ".../dstack/_internal/core/models/repos/remote.py", line 363, in _interactive_git_proc
collector.write(stdout)
File ".../dstack/_internal/core/models/repos/remote.py", line 259, in write
self.buffer.write(v.decode())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 28317: invalid continuation byte
Root cause
`_DiffCollector.write` at `remote.py:259` calls `v.decode()` (i.e.,
`v.decode("utf-8")` with default strict errors) on the raw bytes returned by
`git diff`. Any non-UTF-8 byte sequence in a tracked file makes this raise,
which propagates out of `_repo_diff_verbose` and kills `dstack apply` before
the user can see the plan.
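The strict-decoding behaviour can be shown in isolation with plain Python, independent of dstack. The byte string below is an illustrative stand-in for a chunk of `git diff` output, not the actual content from my repo.

```python
# Stand-in for a chunk of diff output containing a Latin-1 "é" (byte 0xE9).
raw = "café\n".encode("latin-1")  # b"caf\xe9\n"

try:
    raw.decode()  # equivalent to raw.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print(exc)

# With errors="replace", the invalid byte becomes U+FFFD and decoding continues.
lenient = raw.decode("utf-8", errors="replace")
print(repr(lenient))
```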
Suggested fix
In `dstack/_internal/core/models/repos/remote.py`, change:
```python
def write(self, v: bytes):
    self.buffer.write(v.decode())
```
to either:
```python
def write(self, v: bytes):
    self.buffer.write(v.decode("utf-8", errors="replace"))
```
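(simplest) or use `codecs.getincrementaldecoder("utf-8")(errors="replace")` if
chunked decoding across multiple `write` calls is expected. Decoding each chunk
independently would turn a valid multi-byte character that is split across two
`write` calls into replacement characters. The same fix is likely wanted in the
`untracked_files` loop in `_repo_diff_verbose`, for parity with the spirit of #390.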
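A rough sketch of the incremental-decoder variant follows. `LenientDiffCollector` is a hypothetical stand-in, not dstack's `_DiffCollector`; the `io.StringIO` buffer and the `close` and `get` methods are assumptions made for illustration.

```python
import codecs
import io


class LenientDiffCollector:
    """Hypothetical stand-in for _DiffCollector; not dstack's actual class."""

    def __init__(self) -> None:
        self.buffer = io.StringIO()
        # The decoder keeps incomplete multi-byte sequences between calls.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

    def write(self, v: bytes) -> None:
        self.buffer.write(self._decoder.decode(v))

    def close(self) -> None:
        # Flush any trailing incomplete sequence (emitted as U+FFFD).
        self.buffer.write(self._decoder.decode(b"", final=True))

    def get(self) -> str:
        return self.buffer.getvalue()


e_acute = "é".encode("utf-8")  # two bytes: b"\xc3\xa9"
chunks = [b"caf" + e_acute[:1], e_acute[1:] + b"\n", b"\xe9\n"]

collector = LenientDiffCollector()
for chunk in chunks:
    collector.write(chunk)
collector.close()
result = collector.get()

# For comparison: decoding each chunk on its own corrupts the split character.
naive = "".join(c.decode("utf-8", errors="replace") for c in chunks)

print(repr(result))
print(repr(naive))
```

The valid `é` split across the first two chunks survives in `result`, while the stray `0xE9` byte is still replaced. Whether `git diff` output actually arrives in chunks that split characters depends on how `_interactive_git_proc` reads it, which I have not verified.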
Related
#390 (closed): same exception class, but for untracked binary files. The fix
there handled the untracked-binary path; tracked text files with non-UTF-8 bytes
still fall through to the strict decoder.