Blazon is a library for ensuring data structure and format.
It is useful both for processing runtime data, such as from a web request, and for type-hinting. It defines a general schema system that can be translated into multiple other systems, notably JSON Schema. It also includes tools to transform data and, soon, to translate schemas between systems.
Project Status: This project is currently in a review stage. All comments are welcome; please leave them as issues on the GitHub project.
Unlike most schema tools, Blazon's primary goal is to convert the data instead of merely validating it, though it can do both. The idea is that usually we don't care whether the data coming in is correct; we just want it to be correct by the time we use it.
An analogy: I don't care about the shape of the cookie dough as it enters the cutter, I only care that it comes out as the right shape.
A simple example:
import blazon, json

with open("users.json") as o:
    user_data = json.load(o)

user_schema = blazon.json.schema({
    'properties': {
        'name': {'type': 'string'},
        'email': {'type': 'string', 'format': 'email'},
        'age': {'type': 'number', 'minValue': 0, 'default': 42},
    },
    'required': ['name', 'email']
})
users = [user_schema(item) for item in user_data]
assert all(x['name'] for x in users)
assert all('@' in x['email'] for x in users)
assert all(x['age'] >= 0 for x in users)
Now we can be sure that all users have three properties:
- name: a string value
- age: a number value that is 0 or higher
- email: a string that is formatted like an email
blazon.json.schema() takes any JSON Schema as its argument and returns a Schema object, which is a callable that converts the data.
If user_data does not conform, it will raise a ConstraintFailure, which is a subclass of ValueError:
>>> user_schema({'name': 'Beatrice'})
Traceback (most recent call last):
...
blazon.helpers.ConstraintFailure: required - must have the required keys: ['name', 'email']
Blazon tries to convert user_data even if it doesn't actually match the schema. This fits most use-cases, but if you are trying to specifically validate rather than convert, you can use validate():
>>> print( user_schema.validate({'name': 'Beatrice'}) )
SchemaValidationResult:Could not validate the instance against the schema.
Instance:
{'name': 'Beatrice'}
Errors:
required - must have the required keys: ['name', 'email']
The Schema.validate() method returns a SchemaValidationResult object, which evaluates as truthy if, and only if, validation was successful. It also has information about each field or constraint that failed.
You can also simply check whether it validates:
>>> if user_schema.validate({'name': 'Beatrice', 'email': 'beatrice@example.com'}):
... print("Valid!")
Valid!
Note: if your goal is simply JSON validation, and you don't need the flexibility or conversion offered by Blazon, then fastjsonschema is around 2-5 times faster.
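For reference, the equivalent check with fastjsonschema is validation-only; it raises rather than converts. A minimal sketch using standard JSON Schema keywords:

import fastjsonschema

# Compile once, then call the returned function to validate.
validate = fastjsonschema.compile({
    'type': 'object',
    'properties': {
        'name': {'type': 'string'},
        'email': {'type': 'string', 'format': 'email'},
    },
    'required': ['name', 'email'],
})

validate({'name': 'Beatrice', 'email': 'beatrice@example.com'})  # returns the data
# validate({'name': 'Beatrice'}) would raise fastjsonschema.JsonSchemaException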
We can also do "partial" validation. Often you want to represent a partial object: an object that doesn't have all of its fields, even if some are required. That might come from an update, a PATCH operation, or a representation that deliberately omits pieces that are memory-intensive or too big to put on the wire. In most systems you have to represent this with a separate schema, and that causes all sorts of trouble and is just no fun.
It's easy: we simply add partial=True to our conversion or validation methods, and it skips validation constraints like 'required'. Using the user_schema from above:
>>> partial_user = user_schema({
...     'email': 'person@example.com',
...     'age': 24
... }, partial=True)
And so it doesn't raise an error even though "name" is a required field.
Note: this design has a trade-off: partially validated objects can 'get through', so to speak. So it's good practice to name partially validated objects as such, or otherwise track them through your system.
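A minimal sketch of that practice, using only the calls shown above: keep partially converted data under a name that says so, and only promote it once it fully validates.

raw = {'email': 'person@example.com'}
partial_user = user_schema(raw, partial=True)  # the name marks it as partial

if user_schema.validate(partial_user):
    user = partial_user  # fully valid; safe to treat as complete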
Schematics are a way of representing schemas as Python classes. They work like dataclasses, but they also seamlessly interact with an environment, e.g. JSON Schema, meaning you can easily represent a JSON Schema as Python classes. Since they are Python types, they are useful for type-hinting.
An example:
from blazon import Schematic, field
from shortuuid import uuid

class Character(Schematic):
    name: str
    id: str = field(default_factory=uuid)
    health: int = field(minimum=0, default=100)
    tags: [str]

def damage(char: Character, amount: int):
    char.health -= amount

bob = Character(name="Bob")
damage(bob, 10)
assert bob.health == 90
So this Character class now enforces specific properties. As one might gather: id is a string with a default value that is a random shortuuid; name is a required string; health is an integer that may not go below 0 and defaults to 100; and tags is a list of strings, though it is not required, since lists default to an empty list as their value.
This Character schematic can then be used normally, and is validated as used:
>>> char = Character()
ConstraintFailure: ...
>>> char = Character(name="Brenda", health=21)
>>> char
Character(name="Brenda")
>>> char.health = -5
ConstraintFailure: ...
Schematics act a bit differently from dataclasses to make them easier to work with. First, they don't need their required fields at creation time; they can be partials:
>>> char = Character(health=21)
>>> char
Character(health=21)
>>> bool(char.validate())
False
>>> bool(char.validate(partial=True))
True
They can also simply use any JSON Schema definition that you give them, for instance one in a YAML file.
# Monster.yaml
name: Monster
properties:
  name:
    type: string
  level:
    type: number
    default: 1
from blazon import Schematic, json

class Monster(Schematic):
    __schema__ = json.from_file('Monster.yaml')

kate_monster = Monster(name="Kate")
assert kate_monster.level == 1
You could likewise use a JSON file.
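For instance, assuming json.from_file detects the format from the file extension (an assumption here), the JSON version would look the same:

from blazon import Schematic, json

class Monster(Schematic):
    # Monster.json would hold the same definition as Monster.yaml above.
    __schema__ = json.from_file('Monster.json')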
Finally, as you can see, you can directly assign the __schema__ on a Schematic, and you can interact with it the same way:
>>> Character.__schema__
Schema({ "name": "Character", ... })
Blazon supports multiple "environments". Each environment can use different constraints, types, and named schemas. Currently the only two environments out of the box are blazon.json and blazon.native, for expressing JSON Schemas and a similar native Python system, respectively.
Aspects that are tracked in environments:
- Named schemas: used when a schema is referenced by another
- Inflection: does the system use camel-case, underscores, etc.
- Primitive Types: e.g. int, string, array, object, etc.
- Constraints: The various constraints defined like 'required', 'minValue', 'properties', etc.
- Maps to other environments: to allow marshalling data and translating schemas from one environment to the next
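As a sketch of how the same user schema might look in the native environment (assuming blazon.native exposes the same schema() entry point as blazon.json; the underscore constraint names and Python primitive types shown here are assumptions based on the inflection and type aspects above):

import blazon

# Assumption: native environments use underscore inflection and Python types.
native_user_schema = blazon.native.schema({
    'properties': {
        'name': {'type': str},
        'age': {'type': int, 'min_value': 0, 'default': 42},
    },
    'required': ['name'],
})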
The hope is to grow our environments to express many more systems, e.g. Postgres, AWS DynamoDB, Protocol Buffers, etc. Every schema system that can be similarly distilled into a set of constraints should be expressible in Blazon, and that's when the fun begins.
If the systems can be expressed generally, we can pass not only data seamlessly between various systems, but also the schemas themselves. This will let us connect heterogeneous systems simply by mapping schemas and constraints from one to the next.
Still to come:
- Marshalling data
- Schema $ref resolution
- Generating JSON Schemas with $ref and other $special properties
- Type-hint plugins for mypy and others to treat the objects like dataclasses based on the schemas
- Schema translation
... I think that's it.
These advanced topics cover the various actions that are taken on data in Blazon. We try to codify the language here so that it's consistent.
Validation is the process of deciding whether a piece of data fits a schema.
Does the cookie match the cookie cutter?
Conversion is the process of taking data that fits one schema and making it fit another.
Cut the cookie dough out with the cookie cutter.
Often the original schema is undefined, like whatever the client sent you. The new schema is usually well defined and fits the system.
This can be either lossless or lossy, meaning we can lose data as it converts; in other words, we might lose dough when we cut it.
Marshalling is the act of taking data from one environment to another, e.g. native to JSON.
Take the cookie and put it through a Play-Doh press.
This is also a conversion (one schema to another), but the schemas are isomorphic and exist within the context of two separate environments.
Creating mappings between schemas allows us to simply convert data between them. The schemas can also be in different environments, in which case we are easily marshalling as well.
Pick a cookie cutter and a Play-Doh cutter and decide they make the same shape.
Schema translation is the act of moving a schema from one environment to the other.
Take the cookie cutter, make a corresponding Play-Doh press.
To do accurate schema translation, we need constraint mapping and type mapping between the environments. Since a schema can be represented by a set of constraints, mapping constraints from one environment to another means we can accurately translate schemas. And since the translated schema is isomorphic, it also means we can marshal data automatically.
Decide that the shape of the cookie cutter can be the same shape in the Play-Doh press.
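To make the idea concrete, a constraint map can be as small as a keyword-to-keyword dictionary. This is an illustration of the concept, not Blazon's actual API:

# Hypothetical map from a camel-case environment to an underscore one.
CONSTRAINT_MAP = {
    'minValue': 'min_value',
    'maxLength': 'max_length',
}

def translate(schema: dict) -> dict:
    # Rename constraints; a real translator would also map types and
    # recurse into sub-schemas.
    return {CONSTRAINT_MAP.get(key, key): value for key, value in schema.items()}

assert translate({'minValue': 0, 'required': ['name']}) == {'min_value': 0, 'required': ['name']}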
Serialization is the act of changing the representation of an object into some serial form, usually for transfer over a network or to an OS device.
Look at the cookie, describe it in a text to your friend.
Often serialization is done after transformation, e.g. take a Python object, transform it to something JSON-like, then serialize it to actual JSON.
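With the user_schema from earlier, that two-step flow looks like this (assuming the converted result is a plain, JSON-serializable dict):

import json

user = user_schema({'name': 'Beatrice', 'email': 'beatrice@example.com'})  # transform
wire_data = json.dumps(user)  # serialize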
Tangent: A blazon in heraldry is a description of a design for a shield, crest, coat of arms, etc. By having a codified system, which produced texts like Azure, a bend or, or Party per pale argent and vert, a tree eradicated counterchanged, people could describe their designs and have them rendered at great distances or by artists that were not necessarily familiar with them. This was a way of serializing designs.
Type coercion is the act of indicating to a compiler or interpreter that a variable is a different type than what it actually is. It's really a statically typed language thing.
Let's pretend the cookie is a ninja throwing star.
Making an adapter is the process of proxying or otherwise wrapping an object to look, structurally, like another type. This lets us act like it's one type when really it's still the other. It's used much more in dynamically typed languages, where we can do duck-typing.
Wrap the cookie in tinfoil so we can use it like a throwing star.
For our purposes, if we can create an adapter for a type, we can then apply constraints to it that were designed for a different type altogether.
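A minimal sketch of the idea in plain Python, with hypothetical names (not Blazon's API): wrap a non-list container so it quacks like a sequence, after which array-style constraints such as item type checks could run against it.

class TagBag:
    # A hypothetical non-list container.
    def __init__(self, *tags):
        self._tags = tuple(tags)

class TagBagAdapter:
    # Proxies TagBag so it looks, structurally, like a sequence.
    def __init__(self, bag):
        self._bag = bag

    def __len__(self):
        return len(self._bag._tags)

    def __getitem__(self, index):
        return self._bag._tags[index]

adapted = TagBagAdapter(TagBag('hero', 'wizard'))
assert len(adapted) == 2 and all(isinstance(t, str) for t in adapted)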