Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First class support for Pydantic #2181

Open
patrick91 opened this issue Sep 19, 2022 · 18 comments
Open

First class support for Pydantic #2181

patrick91 opened this issue Sep 19, 2022 · 18 comments

Comments

@patrick91
Copy link
Member

patrick91 commented Sep 19, 2022

Hi folks :) I've been thinking that maybe at some point we should add first class support for Pydantic.

What does first class support mean?

Currently we support Pydantic via an, albeit experimental, integration, but the API is a cumbersome, as we need to create two classes, one for Pydantic and one for the Strawberry type.

Example:

class User(BaseModel):
    id: int
    name: str
    signup_ts: Optional[datetime] = None
    friends: List[int] = []

@strawberry.experimental.pydantic.type(model=User)
class UserType:
    id: strawberry.auto
    name: strawberry.auto
    friends: strawberry.auto

This great if you already have Pydantic models and want to add an API on top of them, and I still think this API makes a lot of sense for converting database models to a GraphQL API.

With first class support I mean having a way to define both the Strawberry type and the Pydantic model at once, like this:

@strawberry.type
class User(BaseModel):
    id: int
    name: str
    signup_ts: Optional[datetime] = None
    friends: List[int] = []

An API like this should make it much easier to define types that need to be validated (input types might need this often)

I don't know when we'll have time to work on this, I'd definitely work on improving the code for the integrations (making sure we reduce the code duplication we have now) first and we should also finalise the internal API too.

There's also question on how we can make sure we type this correctly, as we currently use a dataclass transform decorator to tell pyright what happens to the class. Maybe we introduce a new decorator just for pydantic?

Upvote & Fund

  • We're using Polar.sh so you can upvote and help fund this issue.
  • We receive the funding once the issue is completed & confirmed by you.
  • Thank you in advance for helping prioritize & fund our backlog.
Fund with Polar
@gazorby
Copy link
Contributor

gazorby commented Sep 21, 2022

Hi @patrick91

This is a great idea!

I implemented something similar in gazorby/strawberry-sqlalchemy-mapper , but for strawberry inputs.

The idea is to automatically handle data validation of query/mutation inputs and return any error raised by pydantic in dedicated strawberry error types.

Basically I needed:

  • A custom pydantic BaseModel that allow postponed validation so we can trigger when a resolver is called rather than during model instantiation
  • A decorator that hook strawberry.field to validate pydantic based inputs before passing them to the resolver

If ported to strawberry, usage would look like this:

  import strawberry
  from pydantic import BaseModel, validator
  
  # The input decorator actually return a new UserCreate type with the custom BaseModel (with postponed validation) as base
  @strawberry.input
  class UserCreate(BaseModel):
      username: str
      age: int
  
      @validator("age", check_fields=False)
      def check_age(cls, v):
          if v < 18:
              raise ValueError("Must be older than 18 years")
          return v
  
  @strawberry.type
  class Mutation:
      # Decorator that perform validation on pydantic model based inputs
      @strawberry.v_field
      # Note that return type must be a union with ValidationErrors as a possible type
      async def create_user(input: UserCreate) -> Union[User, ValidationErrors]:
        # Here input has been validated and can be safely manipulated
        pass

ValidationErrors type that match pydantic error structure:

import strawberry
from typing import List

@strawberry.type
class LocalizedErrorsModel:
    loc: List[str]
    msg: str

@strawberry.type
class ValidationErrorsModel:
    errors: List[LocalizedErrorsModel]

@Esras
Copy link

Esras commented Sep 21, 2022

👍 for first-class support.

I'm fairly new to this GraphQL thing and trying to integrate it with pydantic has been interesting. We have a number of classes that have a lot of dependencies between each other. The way that I'm currently trying to build it is by doing a class factory that generates the strawberry class for each of the pydantic classes I have, and trying to go from there. However, because of the typing system, I was getting errors on doing that.

When I otherwise tried making a series of classes that were the pydantic type, it felt very wordy to specify the full @strawberry.experimental.pydantic.type(name="Thing", model=schema.Thing, all_fields=True) for each class. I will note that this is also explicitly an API that returns functionally everything, so I don't have those specific requirements where I filter down what fields that I'm passing back to the user.

I did run into another issue which I can open as a separate bug (just trying to do a quick mind dump here), but when parsing a class:

class SomeClassName(OtherClass):
	type: Literal["some_field"] = "some_field"
	a_list: List[Annotated[str, KeyLink(AnotherClass)]] = Field(...)
	an_attribute: int = Field(...)

I get a failure that seems to trigger on the Literal. (And I think I had another one that failed on the List[Annotated, but I'll have to track that down.

Anyway, my point was, I'm excited about improvements here and happy to provide my use cases.

@helderco
Copy link

helderco commented Oct 7, 2022

👍 here too!

I need to serialize/deserialize resolver results and Pydantic is a great fit for that, but the current DX isn't good enough. My use case is that every model would be used for the API, I'm not exposing a subset of it. So that means creating the schema from Pydantic models instead of dataclasses naturally.

@XChikuX
Copy link
Contributor

XChikuX commented Nov 27, 2022

@gazorby Integration using the method you suggested would benefit me highly. Are you actively working on this?

Definitely need the first class support for pydantic !!

https://github.com/redis/redis-om-python might provide some insights on how to do this.

@thejaminator
Copy link
Contributor

thejaminator commented Feb 18, 2023

I have some initial work,
i think its possible if that pydantic type gets constrained to the functionality of what graphql can achieve. i.e. the same constraints that the strawberry dataclass has.

the main stumbling block is how to support @strawberry.field

@strawberry.experimental.pydantic.first_class_type()
class User(BaseModel):
    id: int
    name: str

    @strawberry.field
    def string(self) -> str:
        return str(self.id) + self.name
RuntimeError: no validator found for <class 'strawberry.field.StrawberryField'>, see `arbitrary_types_allowed` in Config

Because pydantic tries to find the validator for the strawberry field. so it'll error out. Now i can't hack @strawberry.experimental.pydantic.first_class_type() decorator to make it work, because the BaseModel MetaClass does the validation logic first. So the hack has to be done on @strawberry.field

@ppease
Copy link
Contributor

ppease commented Feb 18, 2023

Just wanted to mention that this would be a big lift for our groups as well. Right now we end up defining the pydantic types and the strawberry types which is painful to keep in sync.

@thejaminator
Copy link
Contributor

thejaminator commented Feb 18, 2023

Just wanted to mention that this would be a big lift for our groups as well. Right now we end up defining the pydantic types and the strawberry types which is painful to keep in sync.

@ppease
Could you provide an example where it is hard to keep in sync? I could provide some ways to ensure that / fix it in another way.

Because if you use the existing decorator with all_fields=True, then all the fields would get synced. Usually in my codebase sometimes I need to remove/add fields manually, which is when it doesn't become synced.

@ppease
Copy link
Contributor

ppease commented Feb 19, 2023

The main app that we have using pydantic with strawberry was an adaptation of some legacy data. We have a decent number of pydantic objects(~15 classes). Some of them have 10-15 fields on them. The issue that we ran into is that we have some enums where our keys and values don't match. By default I believe the strawberry.enum function takes the keys from a python enum so we had to create a new scalar type that could parse from the values or the keys. It was my understanding that any time you have to define something special about one of the fields you'll have to define all of them so we couldn't use all_fields=True. We do have several types where we can use all_fields=True, and generally I would agree the maintenance in that case is pretty painless. We also use strawberry.auto for the fields where we don't need to modify the type which allows us to not have to duplicate the type information for other fields.

This is doubly annoying because we also have to define input types for a lot of these which end up duplicating a lot of the same information. I would generalize this to say that any time you need to do something special on one field you can't use the all_fields=True, and there can be lots of reasons for this. Another example is where we wanted to add deprecation reasons for some of our fields.

Unrelated, I did have another question about whether the pydantic support is still considered experimental or not? We've been using it in production and it seems pretty solid. We were a little hesitant to be using something tagged as experimental.

@thejaminator
Copy link
Contributor

The main app that we have using pydantic with strawberry was an adaptation of some legacy data. We have a decent number of pydantic objects(~15 classes). Some of them have 10-15 fields on them. The issue that we ran into is that we have some enums where our keys and values don't match. By default I believe the strawberry.enum function takes the keys from a python enum so we had to create a new scalar type that could parse from the values or the keys. It was my understanding that any time you have to define something special about one of the fields you'll have to define all of them so we couldn't use all_fields=True. We do have several types where we can use all_fields=True, and generally I would agree the maintenance in that case is pretty painless. We also use strawberry.auto for the fields where we don't need to modify the type which allows us to not have to duplicate the type information for other fields.

This is doubly annoying because we also have to define input types for a lot of these which end up duplicating a lot of the same information. I would generalize this to say that any time you need to do something special on one field you can't use the all_fields=True, and there can be lots of reasons for this. Another example is where we wanted to add deprecation reasons for some of our fields.

Unrelated, I did have another question about whether the pydantic support is still considered experimental or not? We've been using it in production and it seems pretty solid. We were a little hesitant to be using something tagged as experimental.

Thanks @ppease. Understand the annoyance that strawberry.enum takes the values from the keys. Probably could be improved by allowing users to specify whether they want the enum constructed from the values instead in strawberry.enum.

Making pydantic first class won't help in this case. All it'll do is to allow pydantic models to act as strawberry dataclasses. So you'll still get the same limitation.

Another example is where we wanted to add deprecation reasons for some of our fields.

It should help in this case though.

@thejaminator
Copy link
Contributor

thejaminator commented Feb 20, 2023

@patrick91 I'm not very confident of making pydantic first class without a ton of future bugs. These are just some I foresee from my attempts to hack it out.

The problem is that pydantic and strawberry dataclasses have different contracts.

Defaults are hard

e.g.
Let's suppose we had this basemodel before

@strawberry.experimental.pydantic.first_class_type()
class User(BaseModel):
        id: int
        name: str
        books: List[str]

This can easily be a first class type.
Now, the user wants to deprecate the field books.

@strawberry.experimental.pydantic.first_class_type()
class DeprecatedUser(BaseModel):
        id: int
        name: str
        books: List[str] = strawberry.field(deprecation_reason="deprecate this field")

strawberry.field is provided as a default parameters to books. But the field itself doesn't have a default.
Its not like its defined as

strawberry.field(default=[], deprecation_reason="deprecate this field")

So previously this would error out correctly with pydantic ValidationError

User(id=1, name="name") # errors out correctly since the required field of books wasn't provided
DeprecatedUser(id=1, name="name") # doesn't error out since pydantic thinks it has a default

Now theres some hacks that can help it. Basically we can check if there is a default provided by the strawberry.field in the decorator, and patch it out if needed. But it still breaks any static analysis tool that will tell you that books is a required field. We could built our own pydantic plugin to support it, but its not going to work with pylance / native pycharm plugin. And it'll be a pain to maintain.

Function resolvers

def get_books() -> List[str]:
    return ["test great gatsby"]

@strawberry.experimental.pydantic.first_class_type()
class User(BaseModel):
    id: int
    name: str
    books: List[str] = strawberry.field(resolver=get_books)

The issue with providing functions in resolvers is when we call .dict(). This is pretty common as pydantic is a deserialization/serialization library.

What do we expect here? Do we expect

{"id": 1, "name": "name"}

or

{"id": 1, "name": "name", "books": StrawberryField....}

Probably the first example, because books is really more like a method on User. So it shouldn't be dumped to the dict.

We could hack it out such that the decorator makes it more like a method. But this too would break your static analysis tools. And it will also be different from the strawberry dataclass behavior that user. books returns you the StrawberryField, not the callable.

Another way is to force users to define function resolvers as a method instead.

@strawberry.experimental.pydantic.first_class_type()
class User(BaseModel):
    id: int
    name: str

    @strawberry.field()
    def books(self) -> List[str]:
        return get_books()

This is more reasonable. But you'll have the downside of differing behavior from the strawberry dataclass again. Since we won't support passing StrawberryFields-with-functions as defaults to the BaseModel.
(Also you'll need the @strawberry.field() decorator to not make the method become a StrawberrryField and just store that info somewhere else instead.)

@patrick91
Copy link
Member Author

Those are all valid points! Thanks for writing them down.

For the defaults, in the worst case we could hack and remove defaults from field that use strawberry.field. But it would be best to avoid it :)

I wonder if we can create a subclass of pydantic.Field, that allows to pass GraphQL metadata?
Maybe that would work with typing?


For resolvers, I think we should remove them from the serialization, like we do for strawberry.asdict. Resolver rely on params and info, so we can safely serialize their output :)

@thejaminator
Copy link
Contributor

thejaminator commented Feb 20, 2023 via email

@mortendaehli

This comment has been minimized.

@evan-hwang

This comment has been minimized.

@XChikuX
Copy link
Contributor

XChikuX commented Jan 12, 2024

Hi Everyone,

Please subscribe to this issue instead of writing out a comment saying you agree to the idea.

If you were to take specific interest in this issue and offer to work on it on your spare time that would be different, and is of course encouraged.

Managing open source is hard as it is. More notifications for patrick or thejaminator isn't going to help in the long run.

Sincerely,
Srikanth

@liuliuOD
Copy link
Contributor

Hello @XChikuX,

I'm interested in contributing to this feature. How can I get involved? Perhaps I can contribute to the ongoing discussions or provide some input. Let me know how I can be part of this effort.

Best regards,
Liuliu

@Mark90
Copy link

Mark90 commented Feb 6, 2024

If this were to happen, would (backwards) compatibility with dataclasses be a requirement?

@patrick91
Copy link
Member Author

@liuliuOD join our discord, we'll start community calls soon (the first one is tomorrow) so we can chat there 😊

@Mark90 I'd say yes, strawberry.type will keep using dataclasses, unless you're decorating a Pydantic model

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests