Serialization change from v1 to v2 for a custom type that subclasses stdlib dataclass. #11740

Closed
1 task done
al-dpopowich opened this issue Apr 11, 2025 · 7 comments
Labels
bug V2 Bug related to Pydantic V2

Comments

@al-dpopowich

Initial Checks

  • I confirm that I'm using Pydantic V2

Description

I have an application with a custom type, Custom, that subclasses from a 3rd party class, which in turn is a stdlib dataclass. In v1, if I have a BaseModel that uses Custom as a field type and serialize an instance of the model, the value will be the instance of Custom. In v2, pydantic "sees" it is a dataclass and serializes it into a dict. This is breaking my application and I have yet to figure out how to turn off this undesired serialization.

Full example below, but briefly:

class ModelV1(v1.BaseModel):
   custom: Custom

assert isinstance(ModelV1(custom=Custom()).dict()['custom'], Custom)

class ModelV2(BaseModel):
   custom: Custom

assert isinstance(ModelV2(custom=Custom()).model_dump()['custom'], dict)

In V2, even though I have provided no serialization for Custom, pydantic sees that it is a dataclass and autogenerates the serialization. If Custom were a regular class (i.e., one that implicitly inherits from object), model_dump() would return the instance; because it subclasses a dataclass-decorated class, it is serialized to a dict instead. This is unexpected and breaks my app everywhere I expect an instance but find a dict.

In my fully working demo illustrating the problem, you'll see in the output that:

  • the JSON schemas are functionally the same between v1 and v2.
  • instances of the models are functionally the same between v1 and v2.
  • but the serializations differ: v1 returns an instance while v2 returns a dict.

My question: How do I get v1 behavior when serializing such a model in v2?

Example Code

# stdlib imports
import dataclasses
import json

# venv imports
from pydantic import BaseModel, v1
from pydantic_core import core_schema

#   python 3.11.8
#   pydantic 2.11.2
#   pydantic_core 2.33.1


@dataclasses.dataclass
class SomeBase:
   """A base class that is a stdlib dataclass"""
   # NB: this is in a 3rd-party library I have no control over

   someattr:  str|None = None
   otherattr: str|None = None

class Custom(SomeBase):
   """An arbitrary complex object, subclassing SomeBase"""

   # This is used as a type for fields on pydantic V1 BaseModels

   # NB: this is greatly simplified for demo purposes; imagine a class
   # with many attributes, methods, complex validation, etc.

   def __init__(self, x):
      self.x = x

   @classmethod
   def __get_validators__(cls):
      # yield our validator
      yield cls.validate

   @classmethod
   def __modify_schema__(cls, schema):
      schema['const'] = 42

   @classmethod
   def validate(cls, val):
      # allow None
      if val is None:
         return None
      # otherwise, must be 42
      if val.x != 42:
         raise ValueError('This is not the answer to why we exist only to suffer'
                          ' backward incompatible upgrades')
      return val

class CustomV2(SomeBase):
   """This is the upgrade of Custom for Pydantic V2.x"""

   # This represents the upgrade of Custom for pydantic V2.  The only
   # changes between this class and Custom:
   #
   #   * removed: __get_validators__(), __modify_schema__()
   #   * replaced with: __get_pydantic_core_schema__(), __get_pydantic_json_schema__()

   def __init__(self, x):
      self.x = x

   @classmethod
   def __get_pydantic_core_schema__(cls, _source, _handler):
      """Override pydantic validation for our custom type"""
      return core_schema.no_info_plain_validator_function(
         cls.validate,
      )

   @classmethod
   def __get_pydantic_json_schema__(cls, _source, _handler):
      """Override the pydantic JSON schema for our custom type"""
      return dict(const=42)

   @classmethod
   def validate(cls, val):
      # allow None
      if val is None:
         return None
      # otherwise, must be 42
      if val.x != 42:
         raise ValueError('This is not the answer to why we exist only to suffer'
                          ' deprecation warnings')
      return val

class ModelV1(v1.BaseModel):
   """The V1 model that holds an instance of Custom"""
   name: str
   custom: Custom

class ModelV2(BaseModel):
   """V2 model"""
   name: str
   custom: CustomV2

# These are operationally the same
print('V1 jsonschema:\n', json.dumps(ModelV1.schema(), indent=2))
print('V2 jsonschema:\n', json.dumps(ModelV2.model_json_schema(), indent=2))

# The instances are equivalent
m1 = ModelV1(name='phred', custom=Custom(x=42))
m2 = ModelV2(name='phred', custom=CustomV2(x=42))

# The serializations to Python dicts are different!!!
m1_asdict = m1.dict()
m2_asdict = m2.model_dump()

print('m1:', m1)
print('m1_asdict:', m1_asdict)
print("m1.dict()['custom'] is instance of SomeBase:", isinstance(m1_asdict['custom'], SomeBase))
print('----')
print('m2:', m2)
print('m2_asdict:', m2_asdict)
print("m2.model_dump()['custom'] is instance of SomeBase:", isinstance(m2_asdict['custom'], SomeBase))

Python, Pydantic & OS Version

pydantic version: 2.11.2
        pydantic-core version: 2.33.1
          pydantic-core build: profile=release pgo=false
                 install path: /opt/app/venv2/lib/python3.11/site-packages/pydantic
               python version: 3.11.8 (main, Mar 16 2024, 04:56:37) [GCC 13.2.1 20231014]
                     platform: Linux-5.4.0-169-generic-x86_64-with
             related packages: typing_extensions-4.12.2 pydantic-settings-2.8.1
                       commit: unknown
@al-dpopowich added the bug V2 (Bug related to Pydantic V2) and pending (Is unconfirmed) labels Apr 11, 2025
@Viicos
Member

Viicos commented Apr 13, 2025

Your CustomV2 class has no serialization behavior defined. As such, Pydantic considers CustomV2 as being Any, and tries to guess the type of the instance to be serialized (e.g. if a field is annotated as Any and is instantiated with a datetime.date instance, Pydantic will guess the type as being a date and will serialize it as an ISO formatted string (in JSON mode)).
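The type guessing described above can be seen in a minimal sketch. The model name `Holder` and the field name `value` are hypothetical and chosen only for illustration; the behavior shown assumes Pydantic 2.x.

```python
from datetime import date
from typing import Any

from pydantic import BaseModel


class Holder(BaseModel):
    # Hypothetical model: the annotation gives Pydantic no serialization hint.
    value: Any


holder = Holder(value=date(2025, 4, 13))

# Python mode leaves the date object untouched.
python_dump = holder.model_dump()

# JSON mode inspects the runtime type and emits an ISO-formatted string.
json_dump = holder.model_dump(mode="json")

print(python_dump)
print(json_dump)
```

The annotation is `Any`, so the ISO string in JSON mode comes from inspecting the value at serialization time, not from the schema.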

In your case, Pydantic does something similar and assumes m2.custom is a dataclass instance (this is performed in this function, copied from the dataclasses.is_dataclass() implementation).

However, this behavior is arguably not correct, depending on the context. This was also reported in python/cpython#119260. I think it is reasonable to exclude dataclass subclasses (which themselves aren't proper dataclasses) from this check, but I'll need to check with the team how disruptive the change is considering this is a breaking change, and as such we might have to wait for V3.
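For reference, the stdlib behavior that this check mirrors can be reproduced without Pydantic. This is a sketch using trimmed stand-ins for the classes in the issue. It shows only what `dataclasses.is_dataclass()` and `dataclasses.asdict()` do; it does not claim to reproduce Pydantic's exact output.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class SomeBase:
    someattr: Optional[str] = None
    otherattr: Optional[str] = None


class Custom(SomeBase):
    """Not decorated itself; it only inherits from a dataclass."""

    def __init__(self, x):
        self.x = x


obj = Custom(42)

# is_dataclass() uses hasattr() on the type, which walks the MRO and
# finds the sentinel attribute that the decorator set on SomeBase.
detected = dataclasses.is_dataclass(obj)

# asdict() only knows the fields declared on SomeBase, so `x` is dropped.
as_dict = dataclasses.asdict(obj)

print(detected, as_dict)
```

The instance is reported as a dataclass even though `Custom` was never decorated, and the dict form loses `x`, which `Custom` itself defines.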

@Viicos removed the pending (Is unconfirmed) label Apr 13, 2025
@al-dpopowich
Author

Thanks for the reply!

Your CustomV2 class has no serialization behavior defined. As such, Pydantic considers CustomV2 as being Any, and tries to guess the type of the instance to be serialized ...

I tried the following in a brazen attempt to fool pydantic v2: a serializer that simply returns the instance. But pydantic v2 sees through my plan and converts it to a dict anyway...

   @classmethod
   def __get_pydantic_core_schema__(cls, _source, _handler):
      """Override pydantic validation for our custom type"""
      return core_schema.no_info_plain_validator_function(
         cls.validate,
         serialization=core_schema.plain_serializer_function_ser_schema(
            lambda o: o
         )
      )

So is there no way to say, do-not-serialize-this-instance, i.e., pretty-please with sugar on top, even if you think you can serialize it, don't and just return the object?

If not, this is too major a breaking change from v1 for me to upgrade. Throughout much of my REST view code I have this kind of pattern:

   async def post(self, mod: MyModel): 
      """Create an instance ..."""
      data = mod.dict()
      instance = await call_into_my_business_logic(**data)
      ...

With the above, in v1, I pass an instance of my custom type into my business logic while with v2 I end up passing in a dict. Code is exploding everywhere.

I'm happy to monkey patch if there's a known solution with current 2.x.
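One option not raised in this thread, offered as a sketch rather than a recommendation: in Pydantic 2.x, iterating a model yields its field names and raw attribute values, so `dict(model)` builds a shallow dict without running any serializer. The classes below are trimmed stand-ins for the ones in the issue, and `validate` is reduced to a pass-through for brevity.

```python
import dataclasses
from typing import Optional

from pydantic import BaseModel
from pydantic_core import core_schema


@dataclasses.dataclass
class SomeBase:
    someattr: Optional[str] = None
    otherattr: Optional[str] = None


class CustomV2(SomeBase):
    def __init__(self, x):
        self.x = x

    @classmethod
    def __get_pydantic_core_schema__(cls, _source, _handler):
        return core_schema.no_info_plain_validator_function(cls.validate)

    @classmethod
    def validate(cls, val):
        # Pass-through stand-in for the real validation logic.
        return val


class ModelV2(BaseModel):
    name: str
    custom: CustomV2


m2 = ModelV2(name="phred", custom=CustomV2(x=42))

# dict(model) copies field values as-is; nothing is serialized or guessed.
shallow = dict(m2)

print(type(shallow["custom"]).__name__)
```

Because the conversion is shallow, nested models also stay as model instances. This only fits call sites like `call_into_my_business_logic(**data)` that did not rely on `.dict()` converting nested values.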

@Viicos
Member

Viicos commented Apr 14, 2025

This is due to an unfortunate behavior, as even when using a serializer schema, Pydantic also applies the logic I described with Any (that is, it will try to guess the value type — unchanged because the serialization function is the identity — and so we end up with the same issue). This can normally be controlled by specifying a return_schema to the plain_serializer_function_ser_schema() call, but no serialization schema will fit here.

Actually, I think we can consider what I described in my previous comment as a bug. Custom isn't a dataclass (although it subclasses one), and so Pydantic should treat it as an arbitrary class and shouldn't assume it can serialize as a dict. So most probably we can fix this in 2.12.

@al-dpopowich
Author

So, in pure python it would be:

def is_dataclass(obj):
   """Returns True if obj is a dataclass or an instance of a
   dataclass, but not a subclass or instance of a subclass.
   """
   # get the class, either obj itself, or its type
   cls = obj if isinstance(obj, type) else type(obj)
   # check the class' __dict__ for the sentinel attribute; can't use hasattr because it would check the mro
   return '__dataclass_fields__' in cls.__dict__

But we need this in rust, in the pydantic-core package? Is there any way to monkey patch with pure python? I'd like some way to move forward with my upgrade while waiting for an official patch.
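A quick check of how such a strict test behaves, as a sketch. The function is restated under the hypothetical name `is_exact_dataclass` so it does not shadow the stdlib one, and `Redecorated` is an illustrative extra class that is not part of the issue.

```python
import dataclasses
from typing import Optional


def is_exact_dataclass(obj):
    """True only if the class itself was decorated with @dataclass."""
    cls = obj if isinstance(obj, type) else type(obj)
    # Look in the class's own namespace; hasattr() would consult the MRO.
    return "__dataclass_fields__" in cls.__dict__


@dataclasses.dataclass
class SomeBase:
    someattr: Optional[str] = None


class Custom(SomeBase):
    def __init__(self, x):
        self.x = x


@dataclasses.dataclass
class Redecorated(SomeBase):
    # Hypothetical subclass that is decorated again, so it counts as a dataclass.
    extra: int = 0


results = {
    "SomeBase instance": is_exact_dataclass(SomeBase()),
    "Custom instance": is_exact_dataclass(Custom(42)),
    "Custom class": is_exact_dataclass(Custom),
    "Redecorated instance": is_exact_dataclass(Redecorated()),
}
print(results)
```

A subclass that is decorated again gets its own `__dataclass_fields__`, so it still passes the strict check. Only undecorated subclasses such as `Custom` are excluded.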

@Viicos
Member

Viicos commented Apr 14, 2025

Not the prettiest, but you can define the following property as a workaround:

class Custom(SomeBase):
    ...

    @property
    def __dataclass_fields__(self):
        raise AttributeError
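Why the property helps can be sketched with the stdlib alone. The assumption here, which the workaround succeeding suggests but this thread does not spell out, is that pydantic-core looks up `__dataclass_fields__` on the instance rather than on its type.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class SomeBase:
    someattr: Optional[str] = None


class Custom(SomeBase):
    def __init__(self, x):
        self.x = x

    @property
    def __dataclass_fields__(self):
        # Shadows the inherited sentinel for instance-level lookups.
        raise AttributeError


obj = Custom(42)

# Instance lookup triggers the property, which raises, so hasattr() is False.
seen_on_instance = hasattr(obj, "__dataclass_fields__")

# Class lookup returns the property object itself, so hasattr() is True.
seen_on_class = hasattr(Custom, "__dataclass_fields__")

# The stdlib helper inspects type(obj), so it still reports a dataclass.
stdlib_view = dataclasses.is_dataclass(obj)

print(seen_on_instance, seen_on_class, stdlib_view)
```

The property only hides the attribute from instance-level lookups. Stdlib helpers that read `__dataclass_fields__` from the object, such as `dataclasses.fields()`, should be expected to stop working on `Custom` instances.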

@al-dpopowich
Author

No, not pretty, but I have to say, kinda genius! 😉

@Viicos
Member

Viicos commented Apr 17, 2025

Closing in favor of #11773, with a simpler repro. We'll keep track of this issue for V3.

@Viicos closed this as completed Apr 17, 2025
2 participants