Skip to main content

Dagster & Patito

Patito is a data validation framework for Polars, based on Pydantic.

For more information on how to use Dagster with Polars, see dagster-polars documentation.

Installation

pip install 'dagster-polars[patito]'

Usage

dagster-polars automatically recognizes Patito type annotations, performs data validation using the specified model, and infers table metadata such as column constraints, data types, and descriptions.

import dagster as dg
import patito as pt
import polars as pl

class User(pt.Model):
uid: str = pt.Field(unique=True, description="User ID")
age: int | None = pt.Field(gt=18, description="User age")


@dg.asset(io_manager_key="polars_parquet_io_manager")
def my_asset() -> User.DataFrame:
my_data = ...
return User.DataFrame(my_data)

The specified User model will be used to validate the data returned by the my_asset asset. If the data does not conform to the model, an error will be raised.

Upstream assets can also be annotated with Patito models, ensuring that the input data is always validated.

Alternatively, the standalone dagster_polars.patito.patito_model_to_dagster_type function can be used to build a dagster type for a given Patito model, which can then be used with any IOManager. In this case, the type annotation can be a normal pl.DataFrame.

from dagster_polars.patito import patito_model_to_dagster_type

user_type = patito_model_to_dagster_type(User)


@dg.asset(io_manager_key="polars_parquet_io_manager", dagster_type=user_type)
def my_asset() -> pl.DataFrame:
my_data = ...
return pl.DataFrame(my_data) # <- you better behave, mr. data!

The same dagster type can be used when loading data into downstream assets, ensuring that the data is always validated.