Ndifreke Ekott

Thoughts, stories, ideas and programming

18 Mar 2025

Enforcing JSON Schema in Django JSONField

I think one of the best innovations in the relational database space is the introduction of the JSON datatype. Before JSON columns existed, we had to turn to NoSQL solutions like MongoDB to store document-structured data, so you ended up managing two databases. And since we build a lot of REST APIs and our response data is standardised around JSON, it is highly beneficial to have a single database that supports multiple data representations.

I am a fan of reducing table joins by presenting some information as values stored in a JSON column. The benefit is that a single query can fetch all the data you need. Since we will most likely be working with Object Relational Mappers (ORMs), this also removes yet another source of N+1 queries when fetching the children of parent objects.
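As a rough illustration of the single-query benefit, the sketch below uses Python's built-in sqlite3 module instead of Django and stores the JSON as plain text. The `event` table and its columns are hypothetical; the point is only that the results travel with their parent row, so no join or follow-up query is needed.

```python
import json
import sqlite3

# Hypothetical minimal table: each event row carries its results as JSON text,
# so there is no separate results table to join against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, title TEXT, results TEXT)")

results = [
    {"driver_id": 1, "position": 1},
    {"driver_id": 2, "position": 2},
]
conn.execute(
    "INSERT INTO event (title, results) VALUES (?, ?)",
    ("Event 1", json.dumps(results)),
)

# One query returns the parent row together with all of its "children".
title, raw_results = conn.execute(
    "SELECT title, results FROM event WHERE title = ?", ("Event 1",)
).fetchone()
loaded_results = json.loads(raw_results)
print(title, loaded_results)
```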

One concern with using JSON fields in relational databases is that they generally don't enforce a schema on the stored data, at least not out of the box. A workaround is to enforce the schema structure in your application code.
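Enforcing the schema in application code can be as simple as checking the data before it is written. Here is a minimal hand-rolled sketch; `check_results`, `save_results` and the two required keys are hypothetical names chosen for illustration, and `json.dumps` stands in for whatever actually persists the value.

```python
import json

REQUIRED_KEYS = {"driver_id", "position"}  # hypothetical minimal result shape


def check_results(value):
    """Raise ValueError unless value is a list of dicts with the required keys."""
    if not isinstance(value, list):
        raise ValueError("results must be a list")
    for index, item in enumerate(value):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} must be an object")
        missing = REQUIRED_KEYS.difference(item)
        if missing:
            raise ValueError(f"item {index} is missing {sorted(missing)}")
    return value


def save_results(value):
    # Check first, serialise second: storage only ever sees checked data.
    return json.dumps(check_results(value))


print(save_results([{"driver_id": 1, "position": 2}]))
try:
    save_results([{"driver_id": 1, "position": 2}, {"fruit": "Apple"}])
except ValueError as e:
    print(e)  # item 1 is missing ['driver_id', 'position']
```

Hand-written checks like this grow quickly as the structure gets richer, which is what makes a declarative schema attractive.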

I happened to be working on a hobby project where I needed to store a JSON value in a field, and I wanted an elegant way to enforce its schema. I am using Django, of course. Django provides the JSONField model field, which stores JSON data and marshals it to and from the database. I will describe my exact solution using a concrete example.

I am building a racing event management project. In this setup, I have the concepts of a Series and an Event. A series is made up of several events, and an event can optionally belong to a series. When an event is finished, it has a list of results. It is easier to first define this structure in plain domain code (not Django models).

from typing import List, Optional


class Result:
    driver_id: int
    position: int


class Event:
    id: int
    series_id: Optional[int]  # an event may not belong to a series
    results: List[Result]


class Series:
    id: int
    events: List[Event]
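One convenient way to move between the domain classes and the JSON stored in the results column is to make `Result` a dataclass. This is an assumption for illustration, not something the models require: `dataclasses.asdict` produces the plain dicts a JSONField stores, and `Result(**item)` rebuilds the objects when reading them back.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Result:
    driver_id: int
    position: int


results = [Result(driver_id=7, position=1), Result(driver_id=3, position=2)]

# Domain objects -> plain dicts, i.e. the shape the results JSONField would hold.
payload = [asdict(r) for r in results]
encoded = json.dumps(payload)
print(encoded)
# [{"driver_id": 7, "position": 1}, {"driver_id": 3, "position": 2}]

# ...and back again when the column is read.
restored = [Result(**item) for item in json.loads(encoded)]
```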

The domain classes above can be translated into Django ORM models as follows:


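from django.db import models
from django.db.models import CharField, ForeignKey, JSONField


class Series(models.Model):
    title = CharField(...)


class Event(models.Model):
    title = CharField(...)
    # on_delete is required; SET_NULL suits an optional (nullable) series.
    series = ForeignKey(Series, null=True, on_delete=models.SET_NULL, related_name="events")
    results = JSONField(null=True)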

# Saving a simple Event: the second item is not a race result at all.
results = [{"driver_id": 1, "position": 2}, {"fruit": "Apple"}]

Event.objects.create(title="Event 1", results=results)  # saves successfully

The code above demonstrates that the Event.results JSONField will accept any valid JSON payload, which defeats the purpose of the results field. The burden of consistency therefore lies with the application. However, developers can be forgetful: they may not think about validation and simply drop in any valid JSON payload, causing us pain down the road.

Django, being a batteries-included framework, provides facilities to validate data in model fields. A Django validator is a function or callable that takes a value, performs validation checks and raises a ValidationError if the data doesn't meet a set of criteria. Django ships with a good number of validators you can use on your models, but none that validate the structure of JSON data. It does, however, allow you to create custom validators, and this is where the magic unfolds. One caveat: Django does not run field validators on save() or objects.create(); they run when the model is validated with full_clean(), which ModelForm (and therefore the admin) calls for you.
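Stripped of Django, the validator contract is small: a callable that takes one value, returns nothing on success and raises on failure. The sketch below uses a stand-in `ValidationError` class (so it runs without Django installed) and a hypothetical `run_validators` helper that loosely mirrors how a field runs its `validators` list and collects the errors.

```python
class ValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError (assumption: no Django here)."""


def validate_is_list(value):
    # A validator: takes one value, returns nothing, raises on bad input.
    if not isinstance(value, list):
        raise ValidationError("Expected a JSON array.")


def validate_not_empty(value):
    if not value:
        raise ValidationError("At least one result is required.")


def run_validators(value, validators):
    """Hypothetical helper: call every validator and collect the error messages."""
    errors = []
    for validator in validators:
        try:
            validator(value)
        except ValidationError as e:
            errors.append(str(e))
    return errors


print(run_validators({"fruit": "Apple"}, [validate_is_list, validate_not_empty]))
# ['Expected a JSON array.']
```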

To achieve this validation goal, we need a library that can validate JSON data. What you typically want is a JSON Schema validation library, and the one I turn to is jsonschema.

jsonschema lets you define your schema using JSON Schema notation. Below is the JSON schema required to validate the structure of results in the Event model.

SCHEDULE_RESULT_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "driver_id": {"type": "integer"},
            "name": {"type": "string"},

            # make sure the value of position is a positive, non-zero integer.
            "position": {"type": "integer", "minimum": 1},

            # validate the format of the best_lap time. Example: 1.42.234
            "best_lap": {
                "type": "string",
                "pattern": "^[1-9][0-9]?\\.[0-5][0-9]\\.[0-9]{3}$",  # Format: M.SS.mmm
            },
            "penalties": {"type": "integer", "minimum": 0},
            "points": {"type": "integer", "minimum": 0},
        },
        "required": [
            "driver_id",
            "name",
            "position",
            "penalties",
            "points",
        ],
        "additionalProperties": False,
    },
}
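To make the `best_lap` pattern concrete, here is a quick check with Python's `re` module. JSON Schema patterns are not implicitly anchored, so the `^` and `$` in the pattern are what force the whole string to match; `re.search` is used here to mimic that behaviour. `is_valid_lap` is a hypothetical helper.

```python
import re

# The same pattern used for "best_lap" in the schema above (M.SS.mmm).
BEST_LAP_PATTERN = r"^[1-9][0-9]?\.[0-5][0-9]\.[0-9]{3}$"


def is_valid_lap(value):
    # Unanchored search, as JSON Schema does; the ^ and $ do the anchoring.
    return re.search(BEST_LAP_PATTERN, value) is not None


for lap in ["1.42.234", "12.05.001", "0.42.234", "1.62.234", "1.42.23"]:
    print(lap, is_valid_lap(lap))
```

Only the first two pass: a lap cannot start with a zero minute, seconds stop at 59, and milliseconds need exactly three digits.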

Introducing JSON Schema and explaining all of its notation is beyond the scope of this short post; there are endless resources online for learning about it. Let's shift gears and add this check to the model.

import jsonschema
from functools import partial

from django.core.exceptions import ValidationError


def validate_json_field(value, schema):
    """
    Validates a value against a given JSON schema.

    `value` comes first so that partial(..., schema=...) leaves it as the
    single positional argument Django passes to each validator.

    Args:
        value (any): The value to validate.
        schema (dict): The JSON schema definition.

    Raises:
        ValidationError: If the value does not conform to the schema.
    """
    try:
        jsonschema.validate(instance=value, schema=schema)
    except jsonschema.ValidationError as e:
        raise ValidationError("JSON validation error: %s" % e.message) from e


validate_event_results = partial(validate_json_field, schema=SCHEDULE_RESULT_SCHEMA)

class Event(models.Model):
    title = CharField(...)
    series = ForeignKey(Series, null=True, on_delete=models.SET_NULL, related_name="events")
    results = JSONField(null=True, validators=[validate_event_results])

One thing that may be confusing to the uninitiated is the use of partials. Normally, to call a function you need to pass in all of its required parameters. Python's functools.partial lets you supply some of the parameters up front and leave the rest to be supplied by the calling code; the return value of partial is itself a callable. Django calls each validator with a single argument, validator(value), but we need a function that also accepts the schema we are validating against. Binding the schema with partial bridges that gap.
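Here is a minimal, Django-free sketch of what `partial` does in this setup, using hypothetical function names. It also shows why the parameter order matters: when `schema` is bound by keyword, the value Django passes positionally fills the first parameter, so `value` needs to come first or Python raises a `TypeError`.

```python
from functools import partial


def check_value(value, schema):
    # Hypothetical stand-in for validate_json_field: reports what it received.
    return f"checking {value!r} against {schema!r}"


# Bind schema by keyword; the resulting callable takes only the value.
check_results = partial(check_value, schema="results-schema")
print(check_results([1, 2]))
# checking [1, 2] against 'results-schema'


def check_value_schema_first(schema, value):
    return f"checking {value!r} against {schema!r}"


bad = partial(check_value_schema_first, schema="results-schema")
try:
    bad([1, 2])  # the positional value lands in `schema`, then the keyword collides
except TypeError as e:
    print("TypeError:", e)
```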

Without partials, I could write the validator using one of two approaches. Firstly, a function that takes a single parameter and calls validate_json_field to do the job. Secondly, a dedicated function just for this use case. I illustrate both below:

Option 1:

from typing import Dict, List


def validate_json_field(value, schema):
    try:
        jsonschema.validate(instance=value, schema=schema)
    except jsonschema.ValidationError as e:
        raise ValidationError("JSON validation error: %s" % e.message)

def validate_event_result(value: List[Dict]):
    # raises a ValidationError if it fails.
    validate_json_field(value, SCHEDULE_RESULT_SCHEMA)


class Event(models.Model):
    title = CharField(...)
    series = ForeignKey(Series, null=True, on_delete=models.SET_NULL, related_name="events")
    results = JSONField(null=True, validators=[validate_event_result])

Option 2:

def validate_event_result(value):
    try:
        jsonschema.validate(instance=value, schema=SCHEDULE_RESULT_SCHEMA)
    except jsonschema.ValidationError as e:
        raise ValidationError("JSON validation error: %s" % e.message)


class Event(models.Model):
    title = CharField(...)
    series = ForeignKey(Series, null=True, on_delete=models.SET_NULL, related_name="events")
    results = JSONField(null=True, validators=[validate_event_result])

Both options are perfectly good code and may be easier to understand. However, they are specific to this one use case. If I were to repeat this JSON field validation exercise many times throughout the code, I would end up writing a lot of redundant wrapper functions, hence my preference for partials. With some formatting, the partial version is easy on the eyes too.


class Event(models.Model):
    title = CharField(...)
    series = ForeignKey(Series, null=True, on_delete=models.SET_NULL, related_name="events")
    results = JSONField(
        null=True,
        validators=[
            partial(validate_json_field, schema=SCHEDULE_RESULT_SCHEMA),
            # other validations can go here.
        ],
    )

To conclude, JSON schema validation is an effective way to ensure the integrity of data stored in Django's JSONFields. By combining Django validators with the jsonschema library, you can enforce schema requirements directly within your application. Using Python's functools.partial keeps that validation code reusable and maintainable, so each JSONField can get validation tailored to its own schema without a new wrapper function every time.