Flatten nested Pydantic model

Question

from typing import Union
from pydantic import BaseModel, Field


class Category(BaseModel):
    name: str = Field(alias="name")


class OrderItems(BaseModel):
    name: str = Field(alias="name")
    category: Category = Field(alias="category")
    unit: Union[str, None] = Field(alias="unit")
    quantity: int = Field(alias="quantity")

When instantiated like this:

OrderItems(**{'name': 'Test','category':{'name': 'Test Cat'}, 'unit': 'kg', 'quantity': 10})

It returns data like this:

OrderItems(name='Test', category=Category(name='Test Cat'), unit='kg', quantity=10)

But I want the output like this:

OrderItems(name='Test', category='Test Cat', unit='kg', quantity=10)

How can I achieve this?

cursor = order_collection.find()
return [OrderItems(**item) async for item in cursor]
– Russell, Commented Jan 30, 2023 at 19:30
What do you need the Category model for then? Why not just define category: str and initialize it with the value "Test Cat" right away?
– Daniel Fainberg, Commented Jan 30, 2023 at 19:57
Then you should probably have a different model for inserting than the one you use for responding. That is the usual FastAPI practice anyway. In the response model you could define category: str and use a regular validator with pre=True to handle the dictionary.
– Daniel Fainberg, Commented Jan 30, 2023 at 20:08

Daniel Fainberg · Accepted Answer · 2023-02-04 12:33:43Z

11

You should try as much as possible to define your schema the way you actually want the data to look in the end, not the way you might receive it from somewhere else.

UPDATE: Generalized solution (one nested field or more)

To generalize this problem, let's assume you have the following models:

from pydantic import BaseModel


class Foo(BaseModel):
    x: bool
    y: str
    z: int


class _BarBase(BaseModel):
    a: str
    b: float

    class Config:
        orm_mode = True


class BarNested(_BarBase):
    foo: Foo


class BarFlat(_BarBase):
    foo_x: bool
    foo_y: str

Problem: You want to be able to initialize BarFlat with a foo argument just like BarNested, but have the data end up in the flat schema, where the fields foo_x and foo_y correspond to x and y on the Foo model (and you are not interested in z).

Solution: Define a custom root_validator with pre=True that checks if a foo key/attribute is present in the data. If it is, it validates the corresponding object against the Foo model, grabs its x and y values and then uses them to extend the given data with foo_x and foo_y keys:

from pydantic import BaseModel, root_validator
from pydantic.utils import GetterDict

...

class BarFlat(_BarBase):
    foo_x: bool
    foo_y: str

    @root_validator(pre=True)
    def flatten_foo(cls, values: GetterDict) -> GetterDict | dict[str, object]:
        foo = values.get("foo")
        if foo is None:
            return values
        # Assume `foo` must be valid `Foo` data:
        foo = Foo.validate(foo)
        return {
            "foo_x": foo.x,
            "foo_y": foo.y,
        } | dict(values)

Note that we need to be a bit more careful inside a root validator with pre=True because the values are not necessarily a plain dict: when the model is built via from_orm, they are passed in the form of a GetterDict, which is an immutable mapping-like object. So we cannot simply assign new values foo_x/foo_y to it like we would to a dictionary. But nothing is stopping us from returning the cleaned-up data in the form of a regular old dict.

To demonstrate, we can throw some test data at it:

test_dict = {"a": "spam", "b": 3.14, "foo": {"x": True, "y": ".", "z": 0}}
test_orm = BarNested(a="eggs", b=-1, foo=Foo(x=False, y="..", z=1))
test_flat = '{"a": "beans", "b": 0, "foo_x": true, "foo_y": ""}'
bar1 = BarFlat.parse_obj(test_dict)
bar2 = BarFlat.from_orm(test_orm)
bar3 = BarFlat.parse_raw(test_flat)
print(bar1.json(indent=4))
print(bar2.json(indent=4))
print(bar3.json(indent=4))

The output:

{
    "a": "spam",
    "b": 3.14,
    "foo_x": true,
    "foo_y": "."
}

{
    "a": "eggs",
    "b": -1.0,
    "foo_x": false,
    "foo_y": ".."
}

{
    "a": "beans",
    "b": 0.0,
    "foo_x": true,
    "foo_y": ""
}

The first example simulates a common situation, where the data is passed to us in the form of a nested dictionary. The second example is the typical database ORM object situation, where BarNested represents the schema we find in a database. The third is just to show that we can still correctly initialize BarFlat without a foo argument.

One caveat to note is that the validator does not get rid of the foo key, if it finds it in the values. If your model is configured with Extra.forbid that will lead to an error. In that case, you'll just need to have an extra line, where you coerce the original GetterDict to a dict first, then pop the "foo" key instead of getting it.
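The caveat above can be sketched roughly as follows. This is only an illustration under a few assumptions: BarFlatStrict is a hypothetical name, the import fallback is there so the Pydantic v1 API is used regardless of which major version is installed, and only dictionary input is exercised (from_orm input is not covered by this sketch).

```python
try:
    # Pydantic 2.x ships the old v1 API under `pydantic.v1`
    from pydantic.v1 import BaseModel, Extra, root_validator
except ImportError:
    # Plain Pydantic 1.x installation
    from pydantic import BaseModel, Extra, root_validator


class Foo(BaseModel):
    x: bool
    y: str
    z: int


class BarFlatStrict(BaseModel):  # hypothetical name, not from the answer
    a: str
    b: float
    foo_x: bool
    foo_y: str

    class Config:
        extra = Extra.forbid  # unknown keys such as "foo" are rejected

    @root_validator(pre=True)
    def flatten_foo(cls, values):
        # Copy into a mutable dict first, then pop "foo" instead of getting it,
        # so the key is gone before the extra-fields check runs.
        data = dict(values)
        foo = data.pop("foo", None)
        if foo is None:
            return data
        foo = Foo.validate(foo)
        return {"foo_x": foo.x, "foo_y": foo.y, **data}


bar = BarFlatStrict.parse_obj(
    {"a": "spam", "b": 3.14, "foo": {"x": True, "y": ".", "z": 0}}
)
print(bar.json(indent=4))
```

Without the pop, the leftover "foo" key would be reported as a forbidden extra field.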

Original post (flatten single field)

If you need the nested Category model for database insertion, but you want a "flat" order model with category being just a string in the response, you should split that up into two separate models.

Then in the response model you can define a custom validator with pre=True to handle the case when you attempt to initialize it providing an instance of Category or a dict for category.

Here is what I suggest:

from pydantic import BaseModel, validator


class Category(BaseModel):
    name: str


class OrderItemBase(BaseModel):
    name: str
    unit: str | None
    quantity: int


class OrderItemCreate(OrderItemBase):
    category: Category


class OrderItemResponse(OrderItemBase):
    category: str

    @validator("category", pre=True)
    def handle_category_model(cls, v: object) -> object:
        if isinstance(v, Category):
            return v.name
        if isinstance(v, dict) and "name" in v:
            return v["name"]
        return v

Here is a demo:

if __name__ == "__main__":
    insert_data = '{"name": "foo", "category": {"name": "bar"}, "quantity": 1}'
    insert_obj = OrderItemCreate.parse_raw(insert_data)
    print(insert_obj.json(indent=2))
    ...  # insert into DB
    response_obj = OrderItemResponse.parse_obj(insert_obj.dict())
    print(response_obj.json(indent=2))

Here is the output:

{
  "name": "foo",
  "unit": null,
  "quantity": 1,
  "category": {
    "name": "bar"
  }
}

{
  "name": "foo",
  "unit": null,
  "quantity": 1,
  "category": "bar"
}

One of the benefits of this approach is that the JSON Schema stays consistent with what you have on the model. If you use this in FastAPI that means the swagger documentation will actually reflect what the consumer of that endpoint receives. You could of course override and customize schema creation, but... why? Just define the model correctly in the first place and avoid headache in the future.

edited Feb 4, 2023 at 12:33

answered Jan 30, 2023 at 20:20

Daniel Fainberg



6 Comments

Vetsin Over a year ago

much more sane than my answer

Russell Over a year ago

@daniil-fajnberg without pre it also works fine. I think I need it without pre. The request needs to be validated as a Pydantic model.

MrNetherlands Over a year ago

@Daniil Fjanberg, very nice! However, how could this work if you would like to flatten two additional attributes from the Category class (e.g. attr1 and attr2)? What would the response model look like, and do we need separate validators for each attribute (so one each for category (name), attr1 and attr2)?

Daniel Fainberg Over a year ago

@MrNetherlands As I said, you should define how you want your response model to look. Once you do that, the rest becomes just a matter of "how can we make our data look like this?" If you had Category with attr1 and attr2 and wanted to have a response with fields like category_attr1 and category_attr2 for example, you could define a root_validator to grab the category object and assign its attribute values accordingly.

Daniel Fainberg Over a year ago

@MrNetherlands Yes, you are right, that needs to be handled a bit differently than with a regular dict. This is not documented (because Pydantic v2 comes soon anyway). I updated my answer to present a more generalized solution first.


ajanss · Accepted Answer · 2024-05-01 23:36:00Z

4

You can now use AliasPath to access nested fields (pydantic>=2.0):

from pydantic import BaseModel, Field, AliasPath

class FooFlat(BaseModel):
    a: str
    b: float
    foo_x: bool = Field(validation_alias=AliasPath("foo", "x"))
    foo_y: str = Field(validation_alias=AliasPath("foo", "y"))
    foo_z: int = Field(validation_alias=AliasPath("foo", "z"))

data = {"a": "spam", "b": 3.14, "foo": {"x": True, "y": ".", "z": 0}}
print(FooFlat(**data))  # a='spam' b=3.14 foo_x=True foo_y='.' foo_z=0
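One thing to be aware of: once validation_alias is set, the field is no longer populated from its own name by default, so already-flat input with foo_x/foo_y keys would be rejected by FooFlat. A possible way around that, sketched here as an assumption rather than as part of the answer, is to combine AliasChoices with AliasPath so that either shape is accepted. FooFlexible is a hypothetical name.

```python
from pydantic import AliasChoices, AliasPath, BaseModel, Field


class FooFlexible(BaseModel):  # hypothetical name, not from the answer
    a: str
    b: float
    # Try the flat key first, then fall back to the nested path.
    foo_x: bool = Field(validation_alias=AliasChoices("foo_x", AliasPath("foo", "x")))
    foo_y: str = Field(validation_alias=AliasChoices("foo_y", AliasPath("foo", "y")))


nested = FooFlexible.model_validate(
    {"a": "spam", "b": 3.14, "foo": {"x": True, "y": ".", "z": 0}}
)
flat = FooFlexible.model_validate(
    {"a": "beans", "b": 0, "foo_x": True, "foo_y": ""}
)
print(nested.model_dump())
print(flat.model_dump())
```

Both calls should produce a model with the same flat fields; the unused "foo" key is ignored under the default extra setting.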

answered May 1, 2024 at 23:36

ajanss


ikreb · Accepted Answer · 2024-02-05 13:21:53Z

If you need a variant that supports pydantic>=2.0, here is the solution by @Daniil Fajnberg adapted accordingly:

from typing import Any

from pydantic import BaseModel, ConfigDict, model_validator

class Foo(BaseModel):
    x: bool
    y: str
    z: int

class _BarBase(BaseModel):
    # Pydantic v2 replacement for `orm_mode = True`
    model_config = ConfigDict(from_attributes=True)

    a: str
    b: float

class BarNested(_BarBase):
    foo: Foo

class BarFlat(_BarBase):
    foo_x: bool
    foo_y: str

    @model_validator(mode="before")
    @classmethod
    def flatten_foo(cls, values: Any) -> Any:
        # In v2 the raw input is passed in as-is (no GetterDict);
        # this version expects a dict, as in the demo below.
        foo = values.get("foo")
        if foo is None:
            return values
        foo = Foo.model_validate(foo)

        result = {
            "foo_x": foo.x,
            "foo_y": foo.y,
        }
        result.update(values)
        return result

data = {"a": "spam", "b": 3.14, "foo": {"x": True, "y": ".", "z": 0}}

print(BarFlat(**data))
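The validator above calls values.get, which works for dictionaries. In Pydantic v2, a "before" model validator receives the raw input, so when an arbitrary object is validated with from_attributes the validator gets that object rather than a mapping. A rough sketch of one way to handle both cases follows; the getattr-based branch and the use of cls.model_fields to pick the remaining attributes are my assumptions, not part of the answer.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, model_validator


class Foo(BaseModel):
    x: bool
    y: str
    z: int


class BarNested(BaseModel):
    a: str
    b: float
    foo: Foo


class BarFlat(BaseModel):
    model_config = ConfigDict(from_attributes=True)

    a: str
    b: float
    foo_x: bool
    foo_y: str

    @model_validator(mode="before")
    @classmethod
    def flatten_foo(cls, data: Any) -> Any:
        is_dict = isinstance(data, dict)
        # Look up `foo` as a key for dicts, as an attribute otherwise.
        foo = data.get("foo") if is_dict else getattr(data, "foo", None)
        if foo is None:
            return data  # already flat; let normal validation handle it
        foo = Foo.model_validate(foo, from_attributes=True)
        if is_dict:
            rest = {k: v for k, v in data.items() if k != "foo"}
        else:
            # Copy only the attributes that BarFlat declares as fields.
            rest = {
                name: getattr(data, name)
                for name in cls.model_fields
                if hasattr(data, name)
            }
        return {**rest, "foo_x": foo.x, "foo_y": foo.y}


test_orm = BarNested(a="eggs", b=-1, foo=Foo(x=False, y="..", z=1))
bar_from_obj = BarFlat.model_validate(test_orm)
bar_from_dict = BarFlat.model_validate(
    {"a": "spam", "b": 3.14, "foo": {"x": True, "y": ".", "z": 0}}
)
print(bar_from_obj.model_dump())
print(bar_from_dict.model_dump())
```

Returning a fresh dict in both branches also drops the "foo" key, which matters if the model forbids extra fields.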

Vetsin · Accepted Answer · 2023-01-30 19:51:27Z

1

Well, I was curious, so here's the insane way:

from typing import Union

from pydantic import BaseModel, Field


class Category(BaseModel):
    name: str = Field(alias="name")


class OrderItems(BaseModel):
    name: str = Field(alias="name")
    category: Category = Field(alias="category")
    unit: Union[str, None] = Field(alias="unit")
    quantity: int = Field(alias="quantity")
    
    def json(self, *args, **kwargs) -> str:
        self.__dict__.update({'category': self.__dict__['category'].name})
        return super().json(*args, **kwargs)
    
c = Category(name='Dranks')
m = OrderItems(name='sodie', category=c, unit='can', quantity=1)
m.json()

And you get:

'{"name": "sodie", "category": "Dranks", "unit": "can", "quantity": 1}'

The sane way would probably be:

class Category(BaseModel):
    name: str = Field(alias="name")


class OrderItems(BaseModel):
    name: str = Field(alias="name")
    category: Category = Field(alias="category")
    unit: Union[str, None] = Field(alias="unit")
    quantity: int = Field(alias="quantity")
    
c = Category(name='Dranks')
m = OrderItems(name='sodie', category=c, unit='can', quantity=1)

r = m.dict()
r['category'] = r['category']['name']

answered Jan 30, 2023 at 19:51

Vetsin


3 Comments

Russell Over a year ago

Your first way is nice. I am already using it. I was looking for a better way, like a built-in method, to achieve this type of output. I also tried root_validator.

Vetsin Over a year ago

The only other 'option' I saw was maybe using github.com/Maydmor/pydantic-computed for a computed field that is just the name, and setting exclude=True on the actual category field.

Daniel Fainberg Over a year ago

The first is a very bad idea for a multitude of reasons. You are circumventing a lot of inner machinery that makes Pydantic models useful by going directly via __dict__, you are destroying the purpose of the type annotation, you are setting data that will not conform to the JSON schema returned by the model, .... The second is viable of course, but I wonder why you would not just override the dict method to do that and then override json to call dict.

jjislam · Accepted Answer · 2023-01-30 19:33:41Z

0

Try this when instantiating (note that this only passes validation if category is declared as str on OrderItems):

myCategory = Category(name="test cat")
OrderItems(
    name="test",
    category=myCategory.name,
    unit="kg",
    quantity=10)

answered Jan 30, 2023 at 19:33

jjislam

