Understanding and working with JSON data types with PostgreSQL
There was a time when if you had to work with a lot of JSON structured data, you had to turn to building NoSQL databases like MongoDB purposefully. However, the relational database world has caught on and given us the option to work with relational and non-relational data within the same database engine.
In my day-to-day work as a software engineer, I work with a lot of JSON-structured data. A couple of use cases for storing JSON data include:
- API response from third-party service providers.
- Storing metadata about an entity.
- Creating Read View models of data from relational tables
- Representing event data in a system built using Event Driven Architecture.
Purpose of this article?
Recently at my place of employment, I have had to query the database directly (in SQL) with JSON-based data. Most of my day-to-day, I work with JSON data using the Python programming language, which provides a lot of utilities for working with JSON structured data. This requires fetching data using Django ORM and doing the manipulation and processing in memory in Python.
Though I was able to fetch the data I needed, I realised at the database level, I am completely ignorant of how to work with JSON data types. So this article is my way of learning, documenting and also sharing my learnings from this research.
JSON data type in PostgreSQL
Though I am focusing on PostgreSQL in this writeup, some of the features are similar in most relational databases like MySQL for working with JSON data types.
Postgres offers two types of JSON data types - json
and jsonb
. jsonb
stands for JSON Binary. Both JSON data types are almost similar but for a few differences around efficiency and preservation of the exact copy of the input data. The json
data type, stores an exact copy of the input text, which requires functions processing the data to reparse the json string on execution. jsonb
on the other hand, converts the input text into a binary format, does not preserve json key order and strips out white spaces.
Differences between json
and jsonb
data types
# | json | jsonb |
---|---|---|
1 | Stores an exact copy of the input text | Converts input text into binary format and does not preserve key order. |
2 | Fast to input | Slower to input due to the conversion overhead. |
3 | Must slower to process due to the requirement for processing functions to reparse the data | Faster for processing since no reparsing is required. |
4 | Does not support indexing | Supports indexing. |
5 | Preserves duplicate keys. | Does not allow duplicate keys, only the last key is kept. |
From the PostgreSQL documentation, it is recommended that applications should prefer to store json data in jsonb
and only use json
for specialised needs such as legacy systems that may care about key ordering.
In general, most applications should prefer to store JSON data as
jsonb
, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys. - PostgreSQL: Documentation: 16: 8.14. JSON Types
Constraints and limits of json operations in PostgreSQL
The RFC7159 specification for JSON specifies JSON text to be UTF8 encoded, however, the JSON type can’t conform rigidly to the specification unless the database encoding is in UTF8. Any attempt to include UTF8 characters not supported by the database will result in a failure.
Functions that take json
type as input parameters, allow Unicode escape sequences denoted by \uXXXX
and only check for syntactic correctness regardless of the database encoding. However, functions that take jsonb
input types, are much stricter and disallow Unicode escapes not supported by PostgreSQL.
PostgreSQL does not provide schema validation tools to enforce JSON data structure. This responsibility will have to be handled by your application.
When converting textual JSON data into jsonb
types, the JSON primitive datatypes are effectively mapped onto the native types provided by PostgreSQL types. However, some constraints like numbers outside the range of PostgreSQL numeric types will not be allowed and such operations will fail. However, json
type doesn’t care and will pretty much store such numbers.
The table below shows the mapping between the JSON spec and PostgreSQL native types.
JSON primitive type | PostgreSQL type | Notes |
---|---|---|
string | text | \u0000 is disallowed, as are Unicode escapes representing characters not available in the database encoding |
number | numeric | NaN and infinity values are disallowed |
boolean | boolean | Only lowercase true and false spellings are accepted |
null | (none) | SQL NULL is a different concept |
Source: https://www.postgresql.org/docs/current/datatype-json.html
Working with JSON data
You can convert values to json using the operations::json
and::jsonb
for the respective types.
# Working with scala or primitive values
select '10'::json. # you can call jsonb as -> select '10'::jsonb
select 'true'::json
select "Dada"::json # This raises an error, not valid JSON
select '0.12e-5'::json # output - 0.12e-5
select '0.12e-5'::jsonb # output - 0.0000012
# JSON Array
select '[1, 2, "Apples", null]'::json
# Objects
SELECT '{"name": "Peter", "age": 8, "active": false, "height": 1.80}'::json;
Searching content of JSON data
As earlier stated, jsonb
is the place to be for storing JSON text that require processing by the database. Below are a couple of method for searching the structure of json text. Operators used to test the structure of a JSON data are - Containment (@>
) and Existence (?
) operators.
Containment operator @>
# Check that the scalar value is identical
select '"foo"'::jsonb @> '"foo"'::jsonb # output: true
# Check that the array on the right is a subset (contained) in the array on the left
select '[1,3,4,5,10]'::jsonb @> '[1,3]'::jsonb;
# Duplicate array items does not matter with jsonb
select '[1,3,4,5,10]'::jsonb @> '[1,3, 3]'::jsonb;
# Doesn't match nested value. Considered not contained.
SELECT '[1, 2, [1, 3]]'::jsonb @> '[1, 3]'::jsonb; -- yields false
# The above can be fixed with
SELECT '[1, 2, [1, 3]]'::jsonb @> '[[1, 3]]'::jsonb;
The contain object should match the containing object as to structure and data. Forgetting this fundamental rule, could lead to a lot of pain and missed results in your query.
Existence operator (?)
The existence operator (?
) test whether a given string is contained within the top level (object key) or is an element of an array.
# Searching for b in the array
SELECT '["a", "b", "c", "d", "e"]'::jsonb ? 'b';
# Search if the key "foo" exists in object key
select '{"foo": 1, "bar": "baroo"}'::jsonb ? 'foo';
# Values of keys in objects are not searched. So false.
SELECT '{"foo": "bar"}'::jsonb ? 'bar'; -- yields false
# Search on primitives
select '"foo"'::jsonb ? 'foo'
Concluding
There is a lot to cover on JSON data type and I have decided to split the article into multiple write ups. I hope to write about the operators that can be applied on jsonb
types. I also would like to write about the how to access keys of JSON objects, the different json processing functions provided by PostgreSQL.
References
- PostgreSQL Documentation - https://www.postgresql.org/docs/current/datatype-json.html