diff --git a/peps/pep-0834.rst b/peps/pep-0834.rst new file mode 100644 index 00000000000..e7a48db4925 --- /dev/null +++ b/peps/pep-0834.rst @@ -0,0 +1,615 @@ +PEP: 834 +Title: Class Builders +Author: Jelle Zijlstra +Discussions-To: Pending +Status: Draft +Type: Standards Track +Created: 16-May-2026 +Python-Version: 3.16 +Post-History: Pending + + +Abstract +======== + +This PEP proposes a new syntax for declaring class-like constructs:: + + builder C: + ... + +When this syntax is used, the name ``builder`` is looked up in the +current scope and its ``__build_class__`` attribute is used to build an arbitrary object. + +Intended use cases include enums, dataclasses, and several typing constructs:: + + from dataclasses import dataclass + from enum import enum + from typing import namedtuple, protocol, typed_dict + + dataclass InventoryItem(slots=True): + name: str + amount: int + + enum Color: + red = 1 + green = 2 + blue = 3 + + typed_dict Movie(closed=True): + name: str + year: int + + protocol HasClose: + def supports_close(self) -> None: ... + + namedtuple Employee: + name: str + title: str + +This provides more intuitive syntax for beginners and +improved flexibility in the implementation. + + +Motivation +========== + +The Python standard library contains several constructs that create what may be called +a "class with benefits": something that is created through a class statement, but that +has some special behavior that is unlike a normal class. The :py:func:`dataclasses.dataclass` +decorator injects various methods into the class; the :py:class:`enum.Enum` base class +transforms a class into an enum; and the :py:mod:`typing` module provides mechanisms to +build protocols, named tuples, and typed dictionaries. + +The current syntax is verbose and not intuitive for beginners. 
It requires users to +understand that certain base classes or decorators radically change what a class statement +does, instead of putting the special behavior of these constructs front and center in the +syntax. + +This PEP proposes a flexible new syntax to declare these constructs using *class builders*:: + + builder Simple: + ... + + builder Complex[T](Base, key="value"): + ... + +In this syntax, the name ``builder`` is looked up in the current scope and its ``__build_class__`` +attribute is called with the name, bases, body, and constructor keyword arguments used in the definition. +This function may return an arbitrary object, which is then stored at the name provided in the definition. +The standard library will be changed to provide builders for dataclasses, enums, :py:class:`typing.Protocol`, +:py:class:`typing.NamedTuple`, and :py:class:`typing.TypedDict`. + +In addition to improved concision, the new syntax allows more flexibility. The ``slots=True`` version +of dataclasses is currently implemented by creating a new wrapper class replacing the original class. +This can cause subtle differences between the class initially created by the class statement and the +class eventually bound to the class name. The dataclass class builder can bypass this problem by +injecting the slots definition directly into the class body. + + +Specification +============= + +Syntax +------ + +The grammar for class definitions is extended to allow a name before the +class name: + +.. code-block:: peg + + + class_def_raw: + | 'class' NAME type_params? ['(' arguments? ')'] ':' block + | NAME NAME type_params? ['(' arguments? ')'] ':' block + +The first ``NAME`` in the second alternative is the *class builder*. The +second ``NAME`` is the name bound by the statement. The type parameter list, +parenthesized arguments, and block have the same syntax as in an ordinary +class definition. 
+ +For example, all of the following are syntactically valid class builder +definitions:: + + builder C: + ... + + builder C[T](Base[T], key=value): + ... + + type C: + ... + +The builder name is parsed as a normal name token, not as a new keyword. In +particular, existing and future soft keywords may be used as builder names +when they appear in this syntactic position. Hard keywords may not be used as +builder names. + +The ``match`` soft keyword remains special only to the extent already required +by the grammar for ``match`` statements. A statement of the form:: + + match subject: + case pattern: + ... + +continues to be parsed as a ``match`` statement. A statement such as:: + + match C: + ... + +where the indented block is not a sequence of ``case`` clauses, is parsed as a +class builder definition using the builder named ``match``. + +Decorator syntax is supported in the same way as for ordinary class +definitions:: + + @decorator + builder C: + ... + +The builder itself must be a bare name. Attribute references and arbitrary +expressions are not part of this proposal; for example, +``dataclasses.dataclass C:`` is a syntax error. + + +Runtime semantics +----------------- + +A class builder definition evaluates the builder name in the surrounding +scope, retrieves its ``__build_class__`` attribute, and calls that attribute +using a calling convention modeled on :py:func:`!builtins.__build_class__`. + +The statement:: + + builder C(Base, key=value): + body + +is approximately equivalent to:: + + _build = builder.__build_class__ + + def C(): + __module__ = __name__ + __qualname__ = "C" + body + + C = _build(C, "C", Base, key=value) + +This pseudocode is explanatory only. 
As with ordinary class definitions, the +body is compiled as a class body, not as an ordinary Python function body, and +the exact handling of ``__module__``, ``__qualname__``, ``__classcell__``, +annotations, static attributes, and related implementation details follows the +existing class-definition machinery. + +The builder's ``__build_class__`` attribute is called with these arguments: + +* the class body function; +* the name being bound, as a string; +* all positional arguments supplied in parentheses after the name; +* all keyword arguments supplied in parentheses after the name. + +The return value of this call is the value bound to the class name, after +applying any decorators. The returned object need not be a class. + +For example, this builder delegates directly to the normal class creation +machinery:: + + import builtins + + class Builder: + def __build_class__(self, func, name, *bases, **kwds): + return builtins.__build_class__(func, name, *bases, **kwds) + + builder = Builder() + + builder C: + x = 1 + +After executing this code, ``C`` is an ordinary class. + +If the builder object has no ``__build_class__`` attribute, the statement +raises :py:exc:`AttributeError` at runtime. Exceptions raised while evaluating +the builder name, retrieving ``__build_class__``, evaluating bases or keyword +arguments, executing the body, or applying decorators propagate normally. + + +Order of evaluation +------------------- + +Class builder definitions follow the same broad evaluation order as ordinary +class definitions, with the builder lookup replacing the lookup of +:py:func:`!builtins.__build_class__`. + +For a builder definition, the order is: + +1. Evaluate decorators, if any, from top to bottom. +2. Evaluate the builder name and retrieve its ``__build_class__`` attribute. +3. If the definition has type parameters, enter the synthetic type-parameter + scope and create the type parameter objects. +4. Create the class body function. +5. 
Evaluate the bases and keyword arguments from left to right. For a + generic builder definition, these expressions are evaluated in the + type-parameter scope, so they may refer to the type parameters. +6. Call the builder's ``__build_class__`` attribute. +7. Apply decorators, from bottom to top. +8. Bind the resulting object to the class name in the current scope. + +As with ordinary class definitions, the exact interleaving of creating the +class body function and evaluating base expressions is an implementation +detail, except where it is observable through the ordering above. + +The builder lookup in step 2 is deliberately outside the type-parameter +scope. Type parameters are therefore not visible to the builder name lookup, +but they are visible to base expressions and to the class body. + + +Generic class builder definitions +--------------------------------- + +Class builder definitions may use the type parameter syntax introduced by +:pep:`695`:: + + builder C[T](Base[T]): + item: T + +The type parameters are created using the same runtime machinery as for +ordinary generic classes. The class body receives ``__type_params__`` in its +namespace, as it does for an ordinary generic class. + +The builder expression is evaluated before entering the synthetic +type-parameter scope. This means that type parameters are not visible to the +builder lookup, even though they are visible to base expressions and to the +class body. + +This rule prevents a type parameter from shadowing the builder. For example:: + + dataclass C[dataclass]: + value: dataclass + +In this example, the builder is the ``dataclass`` object from the surrounding +scope. The type parameter also named ``dataclass`` is visible in the class +body and may be used as an annotation, but it does not affect which builder is +called. + +This is consistent with the role of the builder name: it selects the mechanism +used to construct the definition. 
Type parameters parameterize the definition +being constructed; they do not participate in selecting the builder. + +For explanatory purposes, the runtime behavior of the ``builder C[T](Base[T])`` example above is similar to:: + + _build = builder.__build_class__ + + def _generic_parameters_of_C(_build): + T = TypeVar("T") + _type_params = (T,) + + def C(): + __type_params__ = _type_params + item: T + + return _build(C, "C", Base[T], Generic[T]) + + C = _generic_parameters_of_C(_build) + +Again, this pseudocode is not an exact source transformation. In particular, +the actual implementation does not expose the temporary names shown here, +and the scoping rules are slightly different, as specified in :pep:`695`. + +If the builder returns a class-like object by delegating to +:py:func:`!builtins.__build_class__`, ``__type_params__`` will normally become +an attribute on the resulting class because it was present in the class +namespace. If the builder returns some other object, Python does not add +``__type_params__`` to that object after the builder returns. A builder that +returns a non-class object is responsible for preserving any information from +the body that it wants to expose. + + +Interaction with metaclasses +---------------------------- + +Class builder definitions do not directly invoke the normal metaclass +selection algorithm. The ``metaclass`` keyword argument in the definition +is not special; it is passed to the builder as an ordinary keyword argument. +A builder that delegates to :py:func:`!builtins.__build_class__` receives the +same metaclass behavior as an ordinary class definition with the same bases +and keywords. + +For example:: + + import builtins + + class Builder: + def __build_class__(self, func, name, *bases, **kwds): + return builtins.__build_class__(func, name, *bases, **kwds) + + builder = Builder() + + builder C(Base, metaclass=Meta): + ... + +In this case ``Meta`` is handled by :py:func:`!builtins.__build_class__` in the +usual way. 
A builder may instead interpret ``metaclass`` or other keywords +itself, pass them through, reject them, or ignore them. + + +Decorators +---------- + +Decorators on class builder definitions behave like decorators on ordinary +class definitions. They are evaluated before the builder call and applied to +the object returned by the builder. + +The statement:: + + @decorator1 + @decorator2 + builder C: + ... + +is approximately equivalent to:: + + C = decorator1(decorator2(builder.__build_class__(...))) + +The decorators operate on the builder's return value. If the builder returns +a non-class object, the decorators receive that non-class object. + + +AST +--- + +The :py:class:`ast.ClassDef` node gains a new field, ``builder``. The field +is either ``None`` for ordinary class definitions or an :py:class:`ast.Name` +node in load context for class builder definitions. + +For example, parsing:: + + dataclass C: + pass + +produces an ``ast.ClassDef`` node with ``name == "C"`` and +``builder == ast.Name(id="dataclass", ctx=ast.Load())``. + +The order of fields on :py:class:`ast.ClassDef` becomes:: + + name + bases + keywords + body + decorator_list + type_params + builder + +The :py:func:`compile` function rejects an AST whose ``builder`` field is +neither ``None`` nor an expression valid in load context. + + +Standard library changes +------------------------ + +The following class builders will be added to the standard library: + +* ``enum.enum``: creates an enum class. +* ``dataclasses.dataclass``: creates a dataclass. +* ``typing.protocol``: creates a protocol. +* ``typing.namedtuple``: creates a typed named tuple. +* ``typing.typed_dict``: creates a typed dictionary. + +The :py:mod:`types` module gains a helper function +``exec_class_body(func, ns)``. The first argument is a class body function +such as the one passed to a builder's ``__build_class__`` method. The second +argument is the namespace mapping into which the body should be executed. 
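As a rough illustration only, the two-argument calling convention described above can be approximated in current CPython. The sketch below is not the proposed implementation. The local ``exec_class_body`` function is a hypothetical stand-in for the proposed helper, and it assumes the body has no closure. Because the new syntax is not available, the class body function is obtained by compiling an ordinary class statement and extracting the nested code object, which relies on CPython implementation details.

```python
import types


def exec_class_body(func, ns):
    """Hypothetical stand-in for the proposed types.exec_class_body.

    Assumption: the body has no free variables, so running its code
    object with exec() against ``ns`` is enough for this sketch.
    """
    exec(func.__code__, func.__globals__, ns)
    return None


# Obtain a class body function without the new syntax: compile an ordinary
# class statement and pull out the nested class body code object.
# This depends on a CPython detail (the body lives in co_consts).
source = "class C:\n    x = 1\n    y = x + 1\n"
module_code = compile(source, "<sketch>", "exec")
body_code = next(
    const for const in module_code.co_consts
    if isinstance(const, types.CodeType)
)
body_func = types.FunctionType(body_code, {"__name__": "sketch"})

# Run the body into a plain mapping; results arrive by mutation of ``ns``.
ns = {}
result = exec_class_body(body_func, ns)

# The populated namespace can then be handed to the normal class machinery.
C = types.new_class("C", (), {}, lambda class_ns: class_ns.update(ns))
```

In this sketch the body's assignments end up in ``ns`` rather than in function locals, because the code object was compiled as a class body. A builder could inspect or modify ``ns`` before passing it on.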
+ +The helper executes the body using the function's globals and closure, and +uses ``ns`` as the class namespace. It performs the part of +:py:func:`!builtins.__build_class__` that runs the body, but does not select a +metaclass or create the final class object. It returns ``None``; information +produced by executing the body is communicated by mutating ``ns``. + + +Type checker behavior +--------------------- + +The flexibility of class builders means that it is difficult to check them in full generality. + +Python type checkers should recognize the standard library builders and treat them similarly to the +existing syntax. In other cases, type checkers should look up the ``__build_class__`` attribute on +the builder and type check the call. + +A ``__build_class__`` callable may be decorated with the :py:func:`typing.dataclass_transform` decorator, +indicating that the builder behaves similarly to a dataclass. + + +Examples +======== + +A class builder that behaves exactly like an ordinary class statement can +delegate to :py:func:`!builtins.__build_class__`:: + + import builtins + + class PlainBuilder: + def __build_class__(self, func, name, *bases, **kwds): + print("Creating class", name) + return builtins.__build_class__(func, name, *bases, **kwds) + + plain = PlainBuilder() + + plain C: # prints "Creating class C" + x = 1 + +Builders that need to inspect the namespace should use +``types.exec_class_body`` together with :py:func:`types.new_class`:: + + import types + + class RecordingBuilder: + def __build_class__(self, func, name, *bases, **kwds): + captured = {} + + def exec_body(ns): + types.exec_class_body(func, ns) + captured.update(ns) + + cls = types.new_class(name, bases, kwds, exec_body) + cls.captured_namespace = captured + return cls + + recording = RecordingBuilder() + + recording C: + x = 1 + + print(C.captured_namespace) # {"__module__": "__main__", "__qualname__": "C", ...} + +Builders that return non-class objects may execute the body into an 
ordinary +mapping and construct any object they choose from the resulting namespace:: + + import annotationlib + import types + + class Schema: ... + + class SchemaBuilder: + def __build_class__(self, func, name, *bases, **kwds): + ns = {} + types.exec_class_body(func, ns) + annotate = annotationlib.get_annotate_from_class_namespace(ns) + return Schema(name, annotate, ns, **kwds) + + schema = SchemaBuilder() + + schema Movie: + title: str + year: int + +Builders that need to preserve annotations from a class namespace should use +``annotationlib.get_annotate_from_class_namespace`` rather than accessing +implementation-specific namespace entries directly. + +Some builders need to mutate the namespace after the body has executed but +before the class object is created. The following builder creates slotted +classes by reading the annotations written by the class body and inserting a +``__slots__`` tuple before calling the metaclass:: + + import annotationlib + import types + + class SlottedBuilder: + def __build_class__(self, func, name, *bases, **kwds): + def exec_body(ns): + types.exec_class_body(func, ns) + annotate = annotationlib.get_annotate_from_class_namespace(ns) + annotations = annotationlib.call_annotate_function( + annotate, annotationlib.Format.STRING) + ns["__slots__"] = tuple(annotations) + + return types.new_class(name, bases, kwds, exec_body) + + slotted = SlottedBuilder() + + slotted C: + a: int + + print(C.__slots__) # ("a",) + +This example deliberately omits details that a production-quality slotted +dataclass builder would need, such as inherited slots and class-level field +defaults. + + +Rationale +========= + +Python has various constructs that are somewhat like classes, but behave +subtly (or not so subtly!) differently. This PEP proposes a generic, flexible +mechanism for defining such constructs. + +An alternative could be to add specific syntax for some or all of the constructs +for which this PEP proposes to use class builders. 
For example, ``protocol`` could +be made a soft keyword, allowing protocols to be written as proposed in this PEP, +but without an import and without a more powerful new language feature. + +However, this would unduly privilege the standard library. There are use +cases in third-party frameworks that could be helped by builder syntax, including +alternative dataclass-like frameworks, ORMs, or DSLs. + + +Backwards Compatibility +======================= + +This proposal adds new syntax that is currently invalid, so it does not break +existing code. + + +Security Implications +===================== + +This feature does not introduce any new attack surface. Class builders can execute +arbitrary code, but this is already true for pre-existing class definitions: an +attacker-controlled metaclass, base class, or decorator can also execute arbitrary code. + + +How to Teach This +================= + +I recommend that teachers introduce concepts such as dataclasses, enums, +and protocols using the new class builder syntax. This allows students to +learn the new concepts without needing to understand the more complex machinery +of decorators and metaclasses. + +The general class builder concept is a more advanced topic that can be introduced +along with other metaprogramming techniques such as metaclasses. + + +Reference Implementation +======================== + +A prototype implementation exists in a `draft PR `__. + + +Rejected Ideas +============== + +Allowing arbitrary builder expressions +-------------------------------------- + +The PEP does not allow syntax such as ``dataclasses.dataclass C:`` or +``factory() C:``. Such forms are more flexible, but they make the grammar and +visual shape of the feature less clear. A module attribute or computed +builder can be assigned to a local name before use. + + +Adding a separate AST node +-------------------------- + +The PEP uses a new ``builder`` field on :py:class:`ast.ClassDef` rather than a +new AST node. 
Builder definitions share the same name, bases, keywords, body, +decorators, and type-parameter structure as ordinary class definitions. + + +Adding a ``__prepare__`` hook to builders +----------------------------------------- + +The PEP does not add a separate builder-level ``__prepare__`` hook. Builders +that need to control the namespace can use :py:func:`types.new_class` and +``types.exec_class_body``. + + +Open Issues +=========== + +* What should the shape of the AST be? A new field on ``ClassDef``, or a new ``BuilderDef`` node? +* Should we allow builders to be something other than bare names, such as ``dataclasses.dataclass C:``? +* Should there be a ``__prepare__`` hook? +* Should the ``dataclass_transform`` behavior have any enhancements, such as a way to inject a base class or metaclass? +* For enums, do we need additional builders for variants such as flags and ``IntEnum``? +* Could builder-based protocol and typed dictionary definitions usefully be made lazy, so that the + class body is not evaluated until it is needed at runtime? + + +Acknowledgements +================ + +I thank all the people at PyCon US who humored me when I started talking about this idea. + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive.