Json – bash: Iterating over members of a JSON array selected by index

Tags: bash, jq, json, linux, shell

I'm using jq to parse a JSON file, extracting each JSON array in a series into a shell array.

My current code looks like the following:

for ((i = 0; i < ${#nvars[@]}; i++)); do
    v1=($(cat $INPUT | jq '."config"[i]."var1"[]'))
    echo $v1
done

error message:

error: i is not defined

I also replaced

v1=($(cat $INPUT | jq '."config"[i]."var1"[]'))

with

v1=($(cat $INPUT | jq '."config"[$i]."var1"[]'))

but it still doesn't work. Any ideas? Any help is appreciated!


Edit: Sample Input Data

{
    "config-vars":[
        {
            "var1":["v1","v2"],
            "var2":""
        },
        {
            "var1":["v3",""],
            "var2":"v4"
        }
    ]
}

Best Answer

There's a fair bit of room for improvement. Let's start here:

v1=($(cat $INPUT | jq '."config"[$i]."var1"[]'))

...first, you don't actually need cat; it costs you performance, because it forces jq to read from a pipe rather than directly from your input file. Just running jq <"$INPUT" would be more robust (or, better, jq <"$input": all-uppercase names are reserved by convention for environment variables and for variables that have meaning to the shell itself, so lowercase names avoid accidental collisions).

Second, you need to quote all variable expansions, including the expansion of the input file's name -- otherwise, you'll get bugs whenever your filename contains spaces.

Third, array=( $(stuff) ) splits the output of stuff on every character in IFS (by default space, tab, and newline), and then expands each resulting word as a glob expression (so if the output contains *.txt, and you're running this script in a directory that contains text files, you get the names of those files in your result array). Splitting on newlines only would let you correctly parse multi-word strings, and disabling glob expansion is necessary before you can use this technique reliably in the presence of glob characters. One way to do this is to set IFS=$'\n' and run set -f before running this command; another is to redirect the output of your command into a while read loop (shown below).

Fourth, string substitution into code is bad practice in any language -- that way lies (local equivalents to) Bobby Tables, allowing someone who's supposed to be able to change only the data passed into your process to provide content which is processed as executable code (albeit, in this case, as a jq script, which is less dangerous than arbitrary code execution in a more full-featured language; still, this can allow extra data to be added to the output). Keep the jq program text constant and pass shell values in as data instead, for example with jq's --arg option, as the loop below does.
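To make the risk concrete, here is a minimal sketch. It assumes the sample input from the question (whose key is config-vars rather than config) and a hypothetical bad value for i, namely an empty string. It uses jq's --argjson option, which parses its value as JSON, in place of --arg with tonumber; both keep the program text constant.

```shell
#!/usr/bin/env bash
# Sample document from the question, kept in a variable for this demo.
json='{"config-vars":[{"var1":["v1","v2"],"var2":""},{"var1":["v3",""],"var2":"v4"}]}'

i=''   # hypothetical bad input: empty where a number was expected

# Unsafe: the shell pastes $i into the program text, so [$i] becomes [],
# which iterates over every entry instead of selecting one.
unsafe=$(jq -r ".\"config-vars\"[$i].var1[]" <<<"$json")
printf 'unsafe output:\n%s\n' "$unsafe"

# Safer: the program text is constant and $i is passed as JSON data.
# An empty string is not valid JSON, so jq exits non-zero here.
if jq -r --argjson i "$i" '."config-vars"[$i].var1[]' <<<"$json" 2>/dev/null; then
  status=accepted
else
  status=rejected
fi
echo "safe variant: $status"
```

With the interpolated program, the bad value silently widens the query to every entry; with the constant program, the same value is refused before anything runs.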

Next, once you're getting jq to emit newline-separated content, you don't need to read it into an array at all: You can iterate over the content as it's written from jq and read into your shell, thus preventing the shell from needing to allocate memory to buffer that content:

while IFS= read -r; do
  echo "read content from jq: $REPLY"
done < <(jq -r --arg i "$i" '.config[$i | tonumber].var1[]' <"$input")
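Putting the pieces so far together, the following is a sketch of a complete loop over the sample input from the question. It assumes the key is config-vars, as in the sample data, and it asks jq for the length of the outer array because the nvars array from the question is not shown; the temporary file and the total counter are only there for the demonstration.

```shell
#!/usr/bin/env bash
# Write the sample input from the question to a temporary file.
input=$(mktemp)
cat >"$input" <<'EOF'
{
    "config-vars":[
        {"var1":["v1","v2"], "var2":""},
        {"var1":["v3",""], "var2":"v4"}
    ]
}
EOF

# Ask jq how many entries the outer array has.
count=$(jq '."config-vars" | length' <"$input")

total=0
for ((i = 0; i < count; i++)); do
  # Stream the members of var1 for entry i, one per line.
  while IFS= read -r; do
    echo "entry $i: [$REPLY]"
    total=$((total + 1))
  done < <(jq -r --arg i "$i" '."config-vars"[$i | tonumber].var1[]' <"$input")
done
echo "read $total values from $count entries"

rm -f -- "$input"
```

Because read is called with IFS empty, the empty string in the second entry arrives as an empty line and is still counted.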

Finally -- let's say you do want to work with an array. There are two ways to do this that avoid pitfalls. One is to set IFS explicitly and disable glob expansion before the assignment:

IFS=$'\n' # split only on newlines
set -f    # disable glob expansion
result=( $(jq -r ... <"$input") )
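A small sketch of the difference this makes, using a hypothetical emit function as a stand-in for jq so that it runs anywhere. Both settings stay in effect for the rest of the script, so the sketch restores them afterwards, which is an addition not shown in the snippet above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a command that emits one value per line.
emit() { printf '%s\n' 'first value' '*' 'third'; }

# Unsafe: splits on spaces, tabs, and newlines, then glob-expands each word,
# so the element count depends on the files in the current directory.
unsafe=( $(emit) )

# Safer: split on newlines only, with glob expansion disabled.
old_ifs=$IFS
IFS=$'\n'
set -f
safe=( $(emit) )
set +f          # re-enable glob expansion
IFS=$old_ifs    # restore the previous field separator

echo "unsafe has ${#unsafe[@]} elements"
echo "safe has ${#safe[@]} elements"
printf 'safe element: [%s]\n' "${safe[@]}"
```

Note that newline is an IFS whitespace character, so consecutive newlines count as a single separator and empty lines are dropped with this approach.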

The other is to assign to your array with a loop:

result=( )
while IFS= read -r; do
  result+=( "$REPLY" )
done < <(jq -r ... <"$input")

...or, as suggested by @JohnKugelman, to use read -a to read the whole array in one operation:

IFS=$'\n' read -r -d '' -a result < <(jq -r ... <"$input")
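Two details of the read -a form are worth knowing, sketched here with a hypothetical emit function standing in for jq. First, read -d '' reads up to a NUL byte that never arrives, so it returns a non-zero status at end of input even though the array has been filled; that matters under set -e. Second, because newline is an IFS whitespace character, empty lines are dropped, whereas the while read loop keeps them.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for jq -r output; the middle line is empty.
emit() { printf '%s\n' 'v3' '' 'two words'; }

# read hits end of input before finding a NUL delimiter, so it returns
# non-zero; "|| true" keeps that from aborting a script run with set -e.
IFS=$'\n' read -r -d '' -a result < <(emit) || true

# The loop form preserves the empty line as an empty element.
looped=( )
while IFS= read -r; do
  looped+=( "$REPLY" )
done < <(emit)

echo "read -a kept ${#result[@]} elements"
echo "the loop kept ${#looped[@]} elements"
```

If empty strings are meaningful in your data, as with the "" member in the sample input, the loop form is the one that preserves them.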