3

I am trying to parse the JSON object below using a shell script:

country.json

 {
        "countries": [
            {"country":"India","city":["India1","India2","India3"]},
            {"country":"USA","city":["USA1","USA2","USA3"]}
           
           ]
    }

My desired output looks like this:

country:India, city:India1
country:India, city:India2
country:India, city:India3

country:USA, city:USA1
country:USA, city:USA2
country:USA, city:USA3 

I am using jq in a shell script to iterate over the above JSON like this:

for k in $(jq '.countries | keys | .[]' country.json); do
    countryObj=$(jq -r ".countries[$k]" country.json);
    countryValue=$(jq -r '.country' <<< "$countryObj");
    city_array=$(jq -r '.city' <<< "$countryObj");
    echo $city_array
done 

From this I am able to get city_array, i.e. ["India1","India2","India3"] and ["USA1","USA2","USA3"], but I am not able to produce the desired output shown above.

6 Answers

10

This can be done entirely in jq.

jq -r '
   .countries |
   map(
      .country as $country |
      .city | map("country: \( $country ), city: \( . )\n") | add
   ) |
   join("\n")
'

Gives:

country: India, city: India1
country: India, city: India2
country: India, city: India3

country: USA, city: USA1
country: USA, city: USA2
country: USA, city: USA3



If you don't need that blank line, it's a lot simpler.

jq -r '
   .countries[] |
   .country as $country |
   .city[] |
   "country: \( $country ), city: \( . )"
'



The latter can be reduced to

jq -r '.countries[] | "country: \( .country ), city: \( .city[] )"'
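
The one-liner works because a jq string interpolation that contains a generator (here .city[]) emits one string per generated value, while .country stays fixed for the current object. For readers who think more readily in Python, a rough analogue might look like the sketch below. The lines helper and the inline sample data are illustrative assumptions, not part of the jq answer.

```python
import json

# Sample input copied from the question's country.json.
doc = json.loads("""
{"countries": [
  {"country": "India", "city": ["India1", "India2", "India3"]},
  {"country": "USA", "city": ["USA1", "USA2", "USA3"]}
]}
""")

def lines(data):
    """Yield one output line per (country, city) pair, like the jq filter."""
    for entry in data["countries"]:      # corresponds to .countries[]
        for city in entry["city"]:       # corresponds to .city[] in the interpolation
            yield f"country: {entry['country']}, city: {city}"

result = list(lines(doc))
print("\n".join(result))
```

The nested loop makes explicit what jq does implicitly: every value produced by the inner generator is combined with the current outer object.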



2

Assuming the file is formatted as presented ...

One simple awk script based on the double quote as delimiter:

awk -F'"' '
$2=="country" { for (i=8; i<=NF; i+=2)
                    printf "country:%s, city:%s\n", $4, $i
              }
' country.json

This generates:

country:India, city:India1
country:India, city:India2
country:India, city:India3
country:USA, city:USA1
country:USA, city:USA2
country:USA, city:USA3

If you really want a blank line between sections of output:

awk -F'"' '
$2=="country" { printf "%s", pfx
                for (i=8; i<=NF; i+=2)
                    printf "country:%s, city:%s\n", $4, $i
                pfx="\n"
              }
' country.json

This generates:

country:India, city:India1          # no leading blank line
country:India, city:India2
country:India, city:India3
                                    # blank line only between sections
country:USA, city:USA1
country:USA, city:USA2
country:USA, city:USA3              # no trailing blank line

1 Comment

That "assuming the file is formatted as presented" is doing a lot of lifting. Might call for some emphasis that this is usable only to the extent that one can guarantee that the pretty-printing and key order will never change (and that this is rarely guaranteed in real-life scenarios).
1

If jq is not available and a shell script really is the only tool at hand, I would suggest a two-phase strategy. In the first phase, convert the JSON into a format that many shell tools can handle easily; that gives us more choices for implementing the second phase. Concretely, in this case:

sed '/city/! d; s/[^[:alnum:]]/ /g' country.json \
| awk '{for (i=4;i<=NF;i++) print $1 ": " $2 ", " $3 ": " $i; print ""}'

first phase

sed '/city/! d; s/[^[:alnum:]]/ /g' country.json

Let's break that down:

  • /city/! d ---- Delete all lines without "city" in them.
  • s/[^[:alnum:]]/ /g ---- Replace every non-alphanumeric character with a space.

The result is:

              country   India   city    India1   India2   India3
              country   USA   city    USA1   USA2   USA3

Great! Many shell tools can easily handle this format; among them,

  • The shell (Bash or ksh or zsh or ...) itself
  • cut, sort, join, comm, column
  • awk

second phase

OK, awk:

awk '{for (i=4;i<=NF;i++) print $1 ": " $2 ", " $3 ": " $i; print ""}'

Let's break that down:

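  • for (i=4;i<=NF;i++) ---- Loop over every field from the 4th onward (the city names).
  • $1 ---- the 1st field (the word "country")
  • $2 ---- the 2nd field (the country name)
  • $3 ---- the 3rd field (the word "city")
  • $i ---- the i-th field (one city name)
  • print "" ---- After the loop, print a blank line to separate countries.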



(Obviously, this particular solution won't work properly if a city name contains spaces; nor will it correctly handle many many other edge cases.)
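
If Python happens to be available, the same two-phase idea can keep a real parser in the first phase, so that spaces in names and changes in layout no longer matter. The sketch below is one possible first phase, not the method shown above: it emits one tab-separated country/city record per line, which awk, cut, or a shell while-read loop can then consume. The function name to_tsv and the sample data (including a city with a space in it) are assumptions made for illustration.

```python
import json

def to_tsv(text):
    """Flatten the countries document into 'country<TAB>city' lines."""
    data = json.loads(text)
    rows = []
    for entry in data["countries"]:
        for city in entry["city"]:
            rows.append(f"{entry['country']}\t{city}")
    return rows

# Hypothetical sample; "New Delhi" shows that a space inside a name survives.
sample = '{"countries": [{"country": "India", "city": ["New Delhi", "India2"]}]}'
tsv_rows = to_tsv(sample)
print("\n".join(tsv_rows))
```

A second phase could then be something along the lines of awk -F'\t' '{print "country: " $1 ", city: " $2}'. Names that themselves contain a tab or newline would still need escaping, so this is a sturdier intermediate format rather than a bulletproof one.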

7 Comments

Even if a modern OS doesn't ship with jq, it'll still ship with Python, which does have a proper parser. Falling all the way back to completely syntax-unaware tools like sed and awk -- with the attendant failure modes when content is still valid JSON but formatted in a way different than the answer's code expects -- is hardly called for.
As a concrete example of how fragile this code is, consider the case where the OP's file is pretty-printed with "city": [ on one line, then "India1", on the following line, "India2", on the line after that, &c -- that's a very common convention. (I've had the misfortune in the past of having had a customer who followed the practices you're demonstrating here -- it meant every time we revved our output format we had to worry about breaking their hand-rolled parser; trying to support them was an absolute nightmare nobody else should be put through)
(Similarly, think about what happens if the tooling generating the OP's output just reorders its keys, putting "city": [...] before "country": ... -- that's the kind of thing that's generally implementation-defined and not controlled by the developer at all, when sorting keys alphabetically is not explicitly enforced).
Good point. Please share a Python solution?
@markp-fuso, of course you are right: could be combined into a single awk script. I was trying to explain the overall reformat-and-then-manipulate approach, and thought that breaking into parts and showing the intermediate result would make for the clearest explanation.
1

If you don't have jq but do have Python, its standard library includes an equally capable JSON parser. A Python script that generates your desired output might look like:

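#!/usr/bin/env python3
# The f-strings below require Python 3.6 or later
import json, sys

# Read the whole JSON document from standard input
content = json.loads(sys.stdin.read())
for country_dict in content['countries']:
  country_name = country_dict['country']
  for city in country_dict['city']:
    print(f'country: {country_name}, city: {city}')
  print()  # blank line after each country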

To embed this inside a shell script:

python_script=$(cat <<'EOF'
import json, sys

content = json.loads(sys.stdin.read())
for country_dict in content['countries']:
  country_name = country_dict['country']
  for city in country_dict['city']:
    print(f'country: {country_name}, city: {city}')
  print()
EOF
)

python3 -c "$python_script" <country.json
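
The practical benefit of a real parser is that layout and key order stop mattering. As a quick sanity check under assumed inputs, the sketch below feeds the same logic a compact document and a pretty-printed one with "city" listed before "country", and compares the resulting lines. The render helper and both sample strings are illustrative, not taken from the question.

```python
import json

def render(text):
    """Return the 'country: X, city: Y' lines for a countries document."""
    out = []
    for entry in json.loads(text)["countries"]:
        for city in entry["city"]:
            out.append(f"country: {entry['country']}, city: {city}")
    return out

# Everything on one line, keys in the question's order.
compact = '{"countries":[{"country":"USA","city":["USA1","USA2"]}]}'

# Same data, pretty-printed across lines and with the keys reordered.
pretty = """
{
  "countries": [
    {
      "city": [
        "USA1",
        "USA2"
      ],
      "country": "USA"
    }
  ]
}
"""

same = render(compact) == render(pretty)
print(same)
```

Both inputs are the same JSON value, so a parser-based script treats them identically, whereas a line-oriented sed or awk script would need changes for the second layout.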


1

Here's something I whipped up quickly to make the code as compact as possible.

For cmd.exe OR Bash:

jq -r ".countries[] | \"country:\" + .country + \", city:\" + (.city[])" country.json

For PowerShell (using the stop-parsing operator):

jq --% -r ".countries[] | \"country:\" + .country + \", city:\" + (.city[])" country.json

To get the output as requested by OP, do the following:

Create a file named filter.jq with the following contents:

.countries[] | (.country as $country | (.city[] | "country:" + $country + ", city:" + .) ), ""

Then run the command in PowerShell, CMD or Bash:

jq -r -f filter.jq country.json

3 Comments

Note: Those skip the requested blank line between the countries.
Cleaner and shorter: .countries[] | "country:\( .country ), city:\( .city[] )"
Note: This works in CMD, PowerShell, and Bash since jq is cross-platform. I've updated it to include an option for OP's requested output.
-2

Add a loop that iterates over each city in $city_array for each $countryObj, then echo your desired output line by line.

for k in $(jq '.countries | keys | .[]' country.json); do
    countryObj=$(jq -r ".countries[$k]" country.json);
    countryValue=$(jq -r '.country' <<< "$countryObj");
    # .city[] emits one city per line; plain .city would print the JSON array itself
    city_array=$(jq -r '.city[]' <<< "$countryObj");
    
    for city in $city_array; do
        echo "country:$countryValue, city:$city"
    done

    echo "" # Blank line between each country
done

1 Comment

for city in $city_array will not work properly when city names have spaces in them. "city_array" isn't an array at all in your code; it's just a string. To make it an array, you might instead use readarray -t city_array < <(jq -r '.city[]' <<<"$countryObj"), and then for city in "${city_array[@]}"; do
