
Allow nativeparse to parse source code directly#21260

Draft
bzoracler wants to merge 14 commits into
python:masterfrom
bzoracler:nativeparse-source

Conversation

@bzoracler
Contributor

This is the mypy counterpart of mypyc/ast_serialize#54

@bzoracler
Contributor Author

bzoracler commented Apr 17, 2026

The current CI failure is due to the changed typing signature of ast_serialize.parse::source. This has been fixed in the corresponding PR in mypyc/ast_serialize (see changed line).

@bzoracler bzoracler force-pushed the nativeparse-source branch from c8c10dd to ac275e4 Compare April 28, 2026 02:28
@bzoracler bzoracler force-pushed the nativeparse-source branch from 444f4e9 to 149e459 Compare April 28, 2026 03:07
@bzoracler bzoracler marked this pull request as draft April 28, 2026 03:10

@bzoracler
Contributor Author

bzoracler commented Apr 28, 2026

CI failures:

  • Step Compiled with_mypyc: As before, this is fixed in https://github.com/bzoracler/ast_serialize/blob/566ddc362930a821549ca5fbb0d7d0f3bd88eb6e/ast_serialize.pyi#L26
  • These errors should be fixed using the updated binaries built from the changes in mypyc/ast_serialize#54 (Allow parsing source code directly):
    • E TypeError: argument 'source': 'bytes' object is not an instance of 'str'
    • E ValueError: Source parsing is not supported yet for test_trivial_binary_data_from_string_source
    • E ValueError: Source parsing is not supported yet for testPackageRootMultipleParallel, testParallelRunWithSyntaxError, testCheckingStubPackagesWorksInParallelMode, and the job Parallel tests with .*: I believe the code path for parallel checking causes both the source code and a file name to be passed to the parsing functions. I think the tests passed before because either the source argument was not passed or the file_exists check returned False (and we fell back to the old parser when the file didn't exist).
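
A minimal sketch of the hypothesis above, assuming the pre-PR dispatch looked roughly like this. The function name `parse_with_released_binary` and its return strings are hypothetical stand-ins for illustration, not mypy or ast_serialize code; only the error message is taken from the CI log.

```python
from __future__ import annotations


def parse_with_released_binary(path: str, source: str | None, file_exists: bool) -> str:
    """Hypothetical model of the dispatch described above (not mypy's real code)."""
    if not file_exists:
        # No real file on disk: fall back to the old (pure-Python) parser.
        return "old parser"
    if source is not None:
        # Assumed behavior of the released binaries: in-memory source is rejected.
        raise ValueError("Source parsing is not supported yet")
    # Only a path was given, so the native parser reads the file itself.
    return "native parser"
```

Under these assumptions, a test only fails when a real file exists and a source string is passed at the same time, which would match the parallel-checking case described in the comment.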

Is it possible for CI to run a non-released version of ast_serialize?

@bzoracler bzoracler marked this pull request as ready for review April 28, 2026 04:48
ilevkivskyi pushed a commit to mypyc/ast_serialize that referenced this pull request May 17, 2026
ilevkivskyi added a commit that referenced this pull request May 17, 2026
Member

@ilevkivskyi ilevkivskyi left a comment

I have one comment for now. Also it looks like parallel type-checking is somehow broken by this.

Comment thread mypy/build.py

Raise CompileError if there is a parse error.
"""
file_exists = self.fscache.exists(path, real_only=True)

I think you can:

  • Remove the other two call sites to fscache.exists() in this file (and update the relevant code).
  • Remove the real_only parameter and related logic from fscache. IIRC it is only needed for the native parser.

@bzoracler bzoracler marked this pull request as draft May 17, 2026 19:13
@bzoracler
Contributor Author

bzoracler commented May 17, 2026

@ilevkivskyi I don't quite know what's going on here. I checked out 1100800 (the commit before bumping ast-serialize to 0.5.0 on master), installed ast-serialize==0.4.0, and did this:

-     if options.native_parser:
+     if options.native_parser and source:

Parallel checking on my machine crashes with just this change (so none of the changes in this PR were applied). Tracebacks are the same as those in e.g. https://github.com/python/mypy/actions/runs/25999007558/job/76418562048. Do you have any suggestions?

Oops, "parallel checking" would not work at all in that case, my bad. I'll look at this in more depth.

@ilevkivskyi
Member

Hint: source is a required argument for parse(). Which value do you think was (and still is) passed there for the native parser, and how will your change in ast_serialize handle that?

@ilevkivskyi
Member

Btw, I added some logging, and it looks like we sometimes pass a non-empty source to parse(), which means there may be a possibility for performance optimization. Ideally we should not read a file in Python unless absolutely necessary, since it is much faster in Rust.

@ilevkivskyi
Member

Yeah, we eagerly read the file if there is only one file in the parse batch. Anyway, there is no need to fix it in this PR since this is a pre-existing problem. You can just fix the crash by passing the actual source (which should be None in most cases) instead of the hard-coded "".

@github-actions
Contributor

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
