Ndifreke Ekott

Thoughts, stories, ideas and programming

22 Aug 2024

Understanding and Working With Sql/Json Language With Postgresql

https://res.cloudinary.com/ndy40/image/upload/v1724335728/_1d962af0-ec4a-44b1-9e39-fb8b620fd595_laf8pm.jpg

I must say, I have enjoyed my journey of understanding PostgreSQL offerings for JSON data type. I am also realising how much ORMs do for you; there can be pros to using one and cons simultaneously depending on your personality. There is however a sense of satisfaction in understanding how the database we depend on for business works.

This may be the last part of my JSON datatype journey and to recap, here are the previous articles I wrote covering other aspects of JSON types:

  1. Understanding and working with JSON data types with PostgreSQL
  2. Understanding operations on JSON Data types with PostgreSQL

For this post, I will be focusing on the JSON path language. This is useful for navigating complex json data you may have in a table column. I will also showcase a few json_path querying functions for extracting and writing test expressions on json data.

SQL/JSON Path

A couple of things to get off the ground in understanding the path language.

Symbol/Operator Description Example
Dot . Used to access members of an object $.a , $.person.name
Square Brackets []
$ A variable representing the JSON value being queried (context item)
$varname Named variable used in parameterised JSON expressions using together with json processing functions.
@ Variable representing the result of a path evaluation. '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] ? (@ > 2)'Β  - Test if we have any value in the array greater than 2. @ holds the result of $.a[*] which is the array [1,2,3,4,5]

Now we have listed out the basic symbols you will need for future json path expressions, let’s look at the different ways to access data within an object.

Accessor Operator Description

| .Key

."$varname" | Returns the member at the specified key. If the name of the key has a $ sign or uses symbols that fail Javascript identifier rules, then enclose in a quote ."$age" . | | .* | Returns all values located a the top level of the current object. | | .** | Recursively return values by navigating the different levels in an object. This is an interesting one and an example will demonstrate its output. | | .**{level} | This operates like the above accessor operator .** with the addition of a {level} used to tell the engine how deep the recursion should go. | | .**{start_level to end_level} | Similar to the above, except now you specify a start and end level of recursion. | | [*] | Wildcard for fetching all the elements in an array. | | [subscript, ...] | Access elements in an array by index. Where subscript is the index of the array starting from 0. You can also get multiple values by specifying multiple subscripts. Example $.segments[1,2,3] will fetch items from index 1,2 and 3. |

Examples

Let us start with a basic JSON structure we will be working with for most of the examples here.

{
  "track": {
    "segments": [
      {
        "location":   [ 47.763, 13.4034 ],
        "start time": "2018-10-14 10:05:14",
        "HR": 73
      },
      {
        "location":   [ 47.763, 13.4034 ],
        "start time": "2018-10-14 10:05:14",
        "HR": 72
      },
      {
        "location":   [ 47.706, 13.2635 ],
        "start time": "2018-10-14 10:39:21",
        "HR": 135
      }
    ]
  }
}

We will be making use of the function json_path_query(target jsonb, path jsonpath,...) .

Task 1: Access the segments property of the track

select jsonb_path_query('{
  "track": {
    "segments": [
      {
        "location":   [ 47.763, 13.4034 ],
        "start time": "2018-10-14 10:05:14",
        "HR": 73
      },
      {
        "location":   [ 47.763, 13.4034 ],
        "start time": "2018-10-14 10:05:14",
        "HR": 72
      },
      {
        "location":   [ 47.706, 13.2635 ],
        "start time": "2018-10-14 10:39:21",
        "HR": 135
      }
    ]
  }
}'::jsonb, '$.track.segments')

## Output
1. [
	{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"},
	{"HR": 72, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"},
	{"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}
]

Task 2: Fetch the first and third segments of the track.

select jsonb_path_query('{
  "track": {
    "segments": [
      {
        "location":   [ 47.763, 13.4034 ],
        "start time": "2018-10-14 10:05:14",
        "HR": 73
      },
      {
        "location":   [ 47.763, 13.4034 ],
        "start time": "2018-10-14 10:05:14",
        "HR": 72
      },
      {
        "location":   [ 47.706, 13.2635 ],
        "start time": "2018-10-14 10:39:21",
        "HR": 135
      }
    ]
  }
}'::jsonb, '$.track.segments[0, 2]')

## Output
1. "{""HR"": 73, ""location"": [47.763, 13.4034], ""start time"": ""2018-10-14 10:05:14""}"
2. "{""HR"": 135, ""location"": [47.706, 13.2635], ""start time"": ""2018-10-14 10:39:21""}"

Task 3: Fetch all location coordinate from locations within the segments.

select jsonb_path_query('{
  "track": {
    "segments": [
      {
        "location":   [ 47.763, 13.4034 ],
        "start time": "2018-10-14 10:05:14",
        "HR": 73
      },
      {
        "location":   [ 47.763, 13.4000 ],
        "start time": "2018-10-14 10:05:14",
        "HR": 72
      },
      {
        "location":   [ 47.706, 13.2635 ],
        "start time": "2018-10-14 10:39:21",
        "HR": 135
      }
    ]
  }
}'::jsonb, '$.**{2}.location')

## Output
1. "[47.763, 13.4034]"
2. "[47.763, 13.4000]"
3. "[47.706, 13.2635]"

Task 4: Fetch the HR from the segment with HR value greater than 72.

select jsonb_path_query('{
  "track": {
    "segments": [
      {
        "location":   [ 47.763, 13.4034 ],
        "start time": "2018-10-14 10:05:14",
        "HR": 73
      },
      {
        "location":   [ 47.763, 13.4000 ],
        "start time": "2018-10-14 10:05:14",
        "HR": 72
      },
      {
        "location":   [ 47.706, 13.2635 ],
        "start time": "2018-10-14 10:39:21",
        "HR": 135
      }
    ]
  }
}'::jsonb, '$.**{2}.HR ? (@ > 72)')
## Output
1. 73
2. 135

Task 4: What is the size of the segments?

select jsonb_path_query('{
  "track": {
    "segments": [
      {
        "location":   [ 47.763, 13.4034 ],
        "start time": "2018-10-14 10:05:14",
        "HR": 73
      },
      {
        "location":   [ 47.763, 13.4000 ],
        "start time": "2018-10-14 10:05:14",
        "HR": 72
      },
      {
        "location":   [ 47.706, 13.2635 ],
        "start time": "2018-10-14 10:39:21",
        "HR": 135
      }
    ]
  }
}'::jsonb, '$.track.segments.size()')

## Output
1. 3

Conclusion

The JSON path language in PostgreSQL offers powerful capabilities for querying and manipulating JSON data:

  • Flexible data access: It allows accessing nested elements using dot notation (.) and array indexing ([]), making it easy to navigate complex JSON structures.
  • Wildcards and recursion: Operators like . *and .** enable retrieving all values at a specific level or recursively through an object.
  • Filtering and conditions: The language supports filtering results based on conditions, as demonstrated in the example querying HR values greater than 72.
  • Array operations: It provides ways to access specific array elements or all elements using wildcards.
  • Built-in functions: Functions like size() allow for operations on JSON data, such as determining the length of an array.

These features make the JSON path language a versatile tool for working with JSON data in PostgreSQL, enabling complex queries and data manipulation without the need for multiple nested functions or complex SQL statements.

References

PostgreSQL Documentation - https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-PATH