Shell: jq and yq examples

Tags: #jq #yq #shell #bash #json

jq examples.md

💡 Documentation with interactive search for functions (very useful)

The Primeagen Video

Basic Examples
Advanced Examples

Basic Examples

Here’s some basic usage examples…

$ cat /tmp/example.json | jq
{
  "foo": "bar",
  "errors": [
    "beep"
  ]
}
{
  "foo": null,
  "errors": [
    "boop"
  ]
}
{
  "foo": {
    "a": 1,
    "b": 2
  },
  "errors": [
    "beep"
  ]
}
{
  "foo": {
    "a": 3,
    "b": 4
  },
  "errors": [
    "beep",
    "boop"
  ]
}

$ jq < /tmp/example.json
{
  "foo": "bar",
  "errors": [
    "beep"
  ]
}
{
  "foo": null,
  "errors": [
    "boop"
  ]
}
{
  "foo": {
    "a": 1,
    "b": 2
  },
  "errors": [
    "beep"
  ]
}
{
  "foo": {
    "a": 3,
    "b": 4
  },
  "errors": [
    "beep",
    "boop"
  ]
}

$ jq .foo < /tmp/example.json
"bar"
null
{
  "a": 1,
  "b": 2
}
{
  "a": 3,
  "b": 4
}

$ jq keys < /tmp/example.json
[
  "errors",
  "foo"
]
[
  "errors",
  "foo"
]
[
  "errors",
  "foo"
]
[
  "errors",
  "foo"
]

$ jq '{renamed: .foo}' < /tmp/example.json
{
  "renamed": "bar"
}
{
  "renamed": null
}
{
  "renamed": {
    "a": 1,
    "b": 2
  }
}
{
  "renamed": {
    "a": 3,
    "b": 4
  }
}

$ jq 'select(.foo != null) | {renamed: .foo}' < /tmp/example.json
{
  "renamed": "bar"
}
{
  "renamed": {
    "a": 1,
    "b": 2
  }
}
{
  "renamed": {
    "a": 3,
    "b": 4
  }
}

$ jq 'select(.foo != null and (.foo | type == "object") and .foo.a > 1) | {renamed: .foo}' < /tmp/example.json
{
  "renamed": {
    "a": 3,
    "b": 4
  }
}

$ jq 'select(.foo != null and (.foo | type == "object") and .foo.a > 1 and (.errors | length > 1 and any(.[]; contains("boop")))) | {renamed: .foo, errs: .errors}' < /tmp/example.json 
{
  "renamed": {
    "a": 3,
    "b": 4
  },
  "errs": [
    "beep",
    "boop"
  ]
}

When opened in Neovim:

:%!jq
:%!jq -c
:'<,'>!jq
:%!jq -c 'select(.errors | length > 1)'

Advanced Examples

More advanced jq examples…

Extract fields from nd-json stream using interpolation syntax
Extract object from a list that is itself assigned to an object
Download a release binary from GitHub
Combine multiple objects into one
Complex transforming of nested data with .jq script
Adding new key to an object
List all unique keys recursively
List all array items who have different values across specified keys
List all array items who have a timeout larger than a set value
List all array items who have nested object key not set
Unique entries by key

yq examples…

List all array items who have different values across specified keys

Extract fields from nd-json stream using interpolation syntax

With the following nd-json stream:

{
  "result": {
    "agent": "foo",
    "count": "80",
  }
}
{
  "result": {
    "agent": "bar",
    "count": "123",
  }
}

You can produce the following output:

"client: foo, count: 80"
"client: bar, count: 123"

By executing the following command:

cat data.json | jq '.result | "client: \(.agent), count: \(.count)"'

The .result accesses the relevant field on each object in the stream.

We use | to pipe the data to the next ‘script’ which defines the output string we want.

The \() is interpolation syntax (you pass in the field, e.g. .count). Interpolation must be used inside a string.

NOTE: Make sure the interpolation syntax is quoted! See below JSON example that would break otherwise…

# i.e. JSON strings need to be quoted hence "\(<FIELD>)"

# GOOD
fastly kv-store list --json | jq '.Data.[] | {"name": "\(.Name)", "store_id": "\(.StoreID)"}'

# BROKEN
fastly kv-store list --json | jq '.Data.[] | {"name": \(.Name), "store_id": \(.StoreID)}'

Extract object from a list that is itself assigned to an object

Here is the example JSON

{
  "commands": [
    {
      "name": "...",
      ...
    },
  ]
}

We want to get the object that has a "name" field set with a value of "compute". This means we need to first get the top-level "commands" field, then dip inside its assigned list and find the relevant object.

go run cmd/fastly/main.go help --format json | jq '.commands[] | select(.name | contains("compute"))'

NOTE: an alternative to contains would be == operator: fastly service list --json | jq 'map(select(.Name == "testing-tf-provider-reactivation-bug"))'

Download a release binary from GitHub

$ curl -s https://api.github.com/repos/hashicorp/terraform-plugin-docs/releases/latest | \
	jq '.assets[] | select(.name | test("darwin")) | .browser_download_url' | \
    xargs -I {} curl -sLo /tmp/tfplugindocs.zip {} | \
    cd /tmp | \
    unzip tfplugindocs.zip tfplugindocs | \
    chmod +x ./tfplugindocs | \
    ./tfplugindocs -h

Below is an alternative approach I found via https://smarterco.de/download-latest-version-from-github-with-curl/

DOWNLOAD_URL=$(curl -s https://api.github.com/repos/felixb/swamp/releases/latest \
        | grep browser_download_url \
        | grep swamp_amd64 \
        | cut -d '"' -f 4)
curl -s -L --create-dirs -o ~/downloadDir "$DOWNLOAD_URL"

Combine multiple objects into one

function configure_rig_environment {
  # The rig platform sets up the environment variables from the config.yml
  # before even the scripts/prebuild (as part of a docker image build) is
  # triggered. Because of the config.yml interpolation with other yaml files,
  # it means we need to reset the CONFIG environment variable for testing.

  # first we need to convert the config.yml into json
  python -c 'import sys, yaml, json; json.dump(yaml.load(sys.stdin), sys.stdout, indent=4)' < /app/config.yml > /tmp/config.json

  # we extract the location/overrides (which doesn't change between environments).
  cat /tmp/config.json | jq .default.config.locations > /tmp/locations.json
  cat /tmp/config.json | jq .default.config.overrides > /tmp/overrides.json

  # then we store off the current config (this is missing the
  # locations/overrides) as they're only merged in after the image was built.
  echo $CONFIG > /tmp/config.json

  # next we combine the three configs back into one
  # and we generate a new environment.json file for it
  # -c is for generating compact json and not pretty-printed json
  jq -c -s '.[0] * {"locations": .[1]} * {"overrides": .[2]}' /tmp/config.json /tmp/locations.json /tmp/overrides.json > /tmp/environment.json

  # finally, we re-export the config with the new combined values
  # shellcheck disable=SC2155
  export CONFIG=$(cat /tmp/environment.json)
}

Complex transforming of nested data with .jq script

We have the following YAML, and I want all API paths and HTTP methods that contain a x-fastly-preprocess-exclude that matches a string I’m searching for (e.g. only filter the data when I’m searching for fastly-ruby)…

paths:
  /content/edge_check:
    get:
      summary: Check status of content in each POP's cache
      description: Retrieve headers and MD5 hash of the content for a particular URL from each Fastly edge server. This API is limited to 200 requests per hour.
      operationId: content-check
      parameters:
        - name: url
          in: query
          description: Full URL (host and path) to check on all nodes. if protocol is omitted, http will be assumed.
          style: form
          explode: true
          schema:
            type: string
            example: https://www.example.com/foo/bar
      responses:
        "200":
          description: OK
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: "#/components/schemas/content"
              examples:
                body:
                  value:
                    $ref: "examples/content-edge-check.yaml"
      x-fastly-preprocess-exclude:
        - fastly-ruby # fastly-ruby cannot handle a property named 'hash'

We convert it to JSON and pipe that JSON data to a jq script…

#!/bin/bash

__dir=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)

CLIENT=$1
ISSUES="\nThe $CLIENT API client currently does not support the following endpoints:\n\n"
HAS_ISSUES=false
DEFAULT_URL="https://developer.fastly.com/reference/api/"

# Create temporary file that will eventually contain all unsupported endpoints.
tmp="$(mktemp)"

for schema in ./api-documentation/src/*.yaml; do
  # NOTE: We convert each YAML schema into JSON and process it with jq.
  json=$(ruby -rjson -ryaml -e "puts YAML.load_file('$schema', permitted_classes: [Time]).to_json")

  # We need the API endpoint so we can link to it in the API client's README.
  url=$(echo $json | jq -r '.externalDocs.url')
  if [[ "$url" == null ]]; then
    url="$DEFAULT_URL"
  fi

  exclude_all=$(echo $json | jq --arg client "$CLIENT" -r '. | if has("x-fastly-preprocess-exclude") then (if (.["x-fastly-preprocess-exclude"] | contains([$client])) or (.["x-fastly-preprocess-exclude"] | contains(["api-client"])) then true else false end) else false end')
  EXCLUDE_ALL="$exclude_all"

  result=$(echo $json | jq -f "$__dir/transmogrify.jq" --arg exclude_all "$EXCLUDE_ALL" --arg client "$CLIENT" --arg url "$url")

  if [[ "$result" != "" ]]; then
    HAS_ISSUES=true

    # NOTE:
    # We get the 'raw' value (jq -r) to avoid having "" around the output string.
    # The reason for storing the results into a tmp file is so we can sort the endpoints.
    echo $result | jq -r >> $tmp
    cat $tmp | sort -o $tmp
  fi
done

# NOTE:
# The Awk command is used to replace line breaks with the literal \n.
# Otherwise Awk will later error with the message "newline in string".
ISSUES="${ISSUES}$(cat $tmp | awk '{ printf "%s\\n", $0 }')"

if [[ "$HAS_ISSUES" == true ]]; then
  # NOTE: I've used Awk as (when installed) it's consistent across OS' unlike sed.
  awk -v issues="$ISSUES" '1;/## Issues/{ print issues; }' "$CLIENT/README.md"
fi

# Cleanup.
rm -rf tmp

Here’s our jq transmorgrify.jq script file…

.paths | with_entries(
    # If we're excluding all endpoints, then all we need to do is to grab the
    # keys and convert them into a comma-separated string.
    #
    # Otherwise, we need to find only those endpoints that have an
    # x-fastly-preprocess-exclude containing the API client we're looking for.
    #
    # .key is the API endpoint path (e.g. /service/{service_id}/version/{version_id}/acl)
    # .value is an object containing the supported HTTP methods (e.g. {"get": {...metadata...}, "post": {...metadata...}})
    if $exclude_all == "true" then
      .value |= (
        . | keys | map(select(. != "parameters")) | join(", ")
      )
    else
      # We modify the .value to replace the metadata associated with each HTTP
      # method with a list containing the API client we're searching for inside
      # of the x-fastly-preprocess-exclude field.
      #
      # e.g. {"/some/path": {"get": ["fastly-rust"], "post": ["fastly-rust"]}}
      .value |= (
          . | with_entries(
              # .key is the HTTP method (e.g. "get", "post" etc)
              # .value is the metadata object (e.g. {"summary": "...", "x-fastly-preprocess-exclude": [...]})
              #
              # We modify the .value from an object of metadata to a list
              # containing only the excluded API client we're searching for
              # inside of the x-fastly-preprocess-exclude field.
              #
              # e.g. {"get": ["fastly-rust"]} or {"get": ["api-client"]}
              .value |= (
                  if (type=="object" and has("x-fastly-preprocess-exclude")) then
                    .["x-fastly-preprocess-exclude"]
                  else
                    []
                  end
              )
              # We make sure to filter out the API clients that aren't relevant
              # to our search. We also search for the generic "api-client",
              # which means exclude ALL clients.
              | select((.value | contains([$client])) or (.value | contains(["api-client"])))
          )
      )
      # For each HTTP method object, keep it, if it's not empty.
      | select(.value != {})
      # We modify the .value from an object like:
      # {"get": ["fastly-rust"], "post": ["fastly-rust"]}
      #
      # To a comma-separated string like:
      # "get, post".
      #
      # We do this by first getting the keys as an array ["get", "post"], then
      # using join() to turn it into a string.
      #
      # e.g. {"/some/path": "get, post"}
      | .value |= (. | keys | map(select(. != "parameters")) | join(", "))
    end
)
# We only keep objects (e.g. {"/some/path": "get, post"}) that aren't empty.
| select(. != {})
# Finally, we convert the object (e.g. {"/some/path": "get, post"}) into a
# string that is formated so it can be inserted into a Markdown file.
#
# e.g. "- /some/path (get, post)"
#
# The trick is to modify the object key to be the final string, then get all the
# keys as an array, and join the array with a newline.
#
# I use `with_entries` to modify the key to also contain its value (e.g. the key
# becomes "- /some/path (GET, POST)").
#
# You'll notice I use `ascii_upcase` to capitalize the HTTP methods.
# Then I create an array from the modified keys and join those by a new line.
#
# Additionally I wrap the path in a Markdown link and set the URL using the $url
# variable that is passed into the jq script via jq's --arg flag.
| with_entries(.value as $v | .key |= "- [" + . + "](" + $url +") (" + ($v | ascii_upcase) + ")") | keys | join("\n")

# EXAMPLE OUTPUT:
#
# - [/user-groups/{user_group_id}](...) (PATCH)
# - [/user-groups/{user_group_id}/roles](...) (DELETE, POST)
# - [/user-groups/{user_group_id}/service-groups](...) (DELETE, POST)
# - [/user-groups/{user_group_id}/members](...) (DELETE, POST)
#
# DOCUMENTATION REFERENCE:
#
# with_entries:
# converts object to list of key/value objects, modifies the data, then converts back to an object.
# https://stedolan.github.io/jq/manual/#to_entries,from_entries,with_entries
#
# select:
# returns input value if it matches the given condition.
# https://stedolan.github.io/jq/manual/#select(boolean_expression)
#
# keys:
# takes object and returns the keys in an array.
# https://stedolan.github.io/jq/manual/#keys,keys_unsorted
#
# |= is a "modification assignment operator" which means it assigns the new value after processing.

Adding new key to an object

You can use the += with an object {...}

# .paths == {"/api/path": {"parameters": [...], "get": {...}, "post": {...}}}
.paths | with_entries(
  # .key == /api/path
  # .value == {parameters: [...], get: {...}, post: {...}}
  .value |= (
    . | with_entries(
      # .key == 'parameters' and all support HTTP methods (e.g. 'get', 'post').
      # .value == the values assigned to the keys, e.g. HTTP methods have {"summary": "", ...etc}.
      .value |= (
        if (type=="object" and has("summary")) then
          .summary
        elif (type=="array") then # i.e. "parameters" is an array
          . | map(."$ref" | split("/") | last)
        else
          "No summary"
        end
      )
    ) | . += {"documentation": $api_url}
  )
)

List all unique keys recursively

Yaml file:

- metadata:
    foo: some_string
    bar: 123
    baz:
      - an
      - array
    qux:
      an: object
      with: more_keys

Command:

yq -o=json example.yaml | jq '[.[] | select(.metadata != null) | .metadata | paths | join(".")] | unique | sort'

List all array items who have different values across specified keys

Yaml file:

- host: foo
  tls_server_name: foo
- host: bar
  tls_server_name: ... # this has a different value to the host key
- host: baz
  tls_server_name: baz

Command:

yq e '.[] | select(has("tls_server_name") and .tls_server_name != .host) | {"tls_server_name": .tls_server_name, "host": .host}' resources/checkers.yaml

Output:

host: bar
tls_server_name: ...

List all array items who have a timeout larger than a set value

Yaml file:

- name: foo
  timeout: '10s'
- name: bar
  timeout: '5s'
- name: baz
  timeout: '20s'

Command:

# converts yaml to json so we use jq
cat resources/checkers.yaml | yq eval -o=json | jq '.[] | select(.timeout != null and (.timeout | sub("s$"; "") | tonumber) > 10) | {name: .name, timeout: .timeout}'

# just uses yq but notice we MUST quote the json key names
# also yq will continue to output yaml even though we define a json output!
yq '.[] | select(.timeout != null and (.timeout | sub("s$"; "") | tonumber) > 10) | {"name": .name, "timeout": .timeout}' resources/checkers.yaml

Output:

{
  "name": "nic.in",
  "timeout": "30s"
}
{
  "name": "nic.gdn",
  "timeout": "15s"
}
{
  "name": "zdns",
  "timeout": "20s"
}

name: nic.in
timeout: 30s

name: nic.gdn
timeout: 15s

name: zdns
timeout: 20s

List all array items who have nested object key not set

- name: foo
  metadata:
    logins:
      console: # KEY IS SET HERE
        url: 123
- name: bar
  metadata:
    logins:
      ote: # KEY IS NOT SET HERE
        url: 123

yq eval -o=json '.' example.yaml | jq -r 'map(select(.metadata.logins.console == null) | .name)'

Unique entries by key

We have to wrap our data in an array '[.[] | select(.tls_certificate != null)]' and then we can use unique_by:

$ yq -o=json example.yaml | jq '[.[] | select(.tls_certificate != null)] | unique_by(.tls_certificate)[] | {tls_cert: .tls_certificate}'

{
  "tls_cert": "$EXAMPLE_1"
}
{
  "tls_cert": "$EXAMPLE_2"
}