I am new to this website and came here because I am really struggling to extract information from a JSON file. The tricky part is that there is a variable number of fields, so I can't get away with simple syntax.

Here's a sample of the data:

{
  "addresses": {
    "@count": "1",
    "address_name": {
      "address_spec": {
        "@addr_no": "1",
        "full_address": "Tel Aviv Univ, Eitan Berglas Sch Econ, IL-69978 Tel Aviv, Israel",
        "organizations": {
          "@count": "2",
          "organization": [
            "Tel Aviv Univ",
            {
              "@pref": "Y",
              "#text": "Tel Aviv University"
            }
          ]
        },
        "suborganizations": {
          "@count": "1",
          "suborganization": "Eitan Berglas Sch Econ"
        },
        "city": "Tel Aviv",
        "country": "Israel",
        "zip": {
          "@location": "BC",
          "#text": "IL-69978"
        }
      }
    }
  },
  "category_info": {
    "headings": {
      "@count": "1",
      "heading": "Social Sciences"
    },
    "subjects": {
      "@count": "3",
      "subject": [
        {
          "@ascatype": "traditional",
          "#text": "Economics"
        },
        {
          "@ascatype": "extended",
          "#text": "Business & Economics"
        },
        {
          "@ascatype": "traditional",
          "#text": "ECONOMICS"
        }
      ]
    }
  }
}
{
  "addresses": {
    "@count": "1",
    "address_name": {
      "address_spec": {
        "@addr_no": "1",
        "full_address": "MIT, Cambridge, MA 02139 USA",
        "organizations": {
          "@count": "2",
          "organization": [
            "MIT",
            {
              "@pref": "Y",
              "#text": "Massachusetts Institute of Technology (MIT)"
            }
          ]
        },
        "city": "Cambridge",
        "state": "MA",
        "country": "USA",
        "zip": {
          "@location": "AP",
          "#text": "02139"
        }
      }
    }
  },
  "category_info": {
    "headings": {
      "@count": "1",
      "heading": "Social Sciences"
    },
    "subjects": {
      "@count": "3",
      "subject": [
        {
          "@ascatype": "traditional",
          "#text": "Economics"
        },
        {
          "@ascatype": "extended",
          "#text": "Business & Economics"
        },
        {
          "@ascatype": "traditional",
          "#text": "ECONOMICS"
        }
      ]
    }
  }
}
{
  "addresses": {
    "@count": "2",
    "address_name": [
      {
        "address_spec": {
          "@addr_no": "1",
          "full_address": "Univ Kentucky, Lexington, KY 40506 USA",
          "organizations": {
            "@count": "2",
            "organization": [
              "Univ Kentucky",
              {
                "@pref": "Y",
                "#text": "University of Kentucky"
              }
            ]
          },
          "city": "Lexington",
          "state": "KY",
          "country": "USA",
          "zip": {
            "@location": "AP",
            "#text": "40506"
          }
        }
      },
      {
        "address_spec": {
          "@addr_no": "2",
          "full_address": "Univ Bonn, ZEI, D-5300 Bonn, Germany",
          "organizations": {
            "@count": "2",
            "organization": [
              "Univ Bonn",
              {
                "@pref": "Y",
                "#text": "University of Bonn"
              }
            ]
          },
          "suborganizations": {
            "@count": "1",
            "suborganization": "ZEI"
          },
          "city": "Bonn",
          "country": "Germany",
          "zip": {
            "@location": "BC",
            "#text": "D-5300"
          }
        }
      }
    ]
  },
  "category_info": {
    "headings": {
      "@count": "1",
      "heading": "Social Sciences"
    },
    "subjects": {
      "@count": "3",
      "subject": [
        {
          "@ascatype": "traditional",
          "#text": "Economics"
        },
        {
          "@ascatype": "extended",
          "#text": "Business & Economics"
        },
        {
          "@ascatype": "traditional",
          "#text": "ECONOMICS"
        }
      ]
    }
  }
}
{
  "addresses": {
    "@count": "1",
    "address_name": {
      "address_spec": {
        "@addr_no": "1",
        "full_address": "Harvard Univ, Cambridge, MA 02138 USA",
        "organizations": {
          "@count": "2",
          "organization": [
            "Harvard Univ",
            {
              "@pref": "Y",
              "#text": "Harvard University"
            }
          ]
        },
        "city": "Cambridge",
        "state": "MA",
        "country": "USA",
        "zip": {
          "@location": "AP",
          "#text": "02138"
        }
      }
    }
  },
  "category_info": {
    "headings": {
      "@count": "1",
      "heading": "Social Sciences"
    },
    "subjects": {
      "@count": "3",
      "subject": [
        {
          "@ascatype": "traditional",
          "#text": "Economics"
        },
        {
          "@ascatype": "extended",
          "#text": "Business & Economics"
        },
        {
          "@ascatype": "traditional",
          "#text": "ECONOMICS"
        }
      ]
    }
  }
}
{
  "addresses": {
    "@count": "3",
    "address_name": [
      {
        "address_spec": {
          "@addr_no": "1",
          "full_address": "Columbia Univ, New York, NY 10027 USA",
          "organizations": {
            "@count": "2",
            "organization": [
              "Columbia Univ",
              {
                "@pref": "Y",
                "#text": "Columbia University"
              }
            ]
          },
          "city": "New York",
          "state": "NY",
          "country": "USA",
          "zip": {
            "@location": "AP",
            "#text": "10027"
          }
        }
      },
      {
        "address_spec": {
          "@addr_no": "2",
          "full_address": "NYU, New York, NY USA",
          "organizations": {
            "@count": "2",
            "organization": [
              "NYU",
              {
                "@pref": "Y",
                "#text": "New York University"
              }
            ]
          },
          "city": "New York",
          "state": "NY",
          "country": "USA"
        }
      },
      {
        "address_spec": {
          "@addr_no": "3",
          "full_address": "Univ Pompeu Fabra, Barcelona, Spain",
          "organizations": {
            "@count": "2",
            "organization": [
              "Univ Pompeu Fabra",
              {
                "@pref": "Y",
                "#text": "Pompeu Fabra University"
              }
            ]
          },
          "city": "Barcelona",
          "country": "Spain"
        }
      }
    ]
  },
  "category_info": {
    "headings": {
      "@count": "1",
      "heading": "Social Sciences"
    },
    "subjects": {
      "@count": "3",
      "subject": [
        {
          "@ascatype": "traditional",
          "#text": "Economics"
        },
        {
          "@ascatype": "extended",
          "#text": "Business & Economics"
        },
        {
          "@ascatype": "traditional",
          "#text": "ECONOMICS"
        }
      ]
    }
  }
}
{
  "addresses": {
    "@count": "2",
    "address_name": [
      {
        "address_spec": {
          "@addr_no": "1",
          "full_address": "Univ Chicago, Chicago, IL 60637 USA",
          "organizations": {
            "@count": "2",
            "organization": [
              "Univ Chicago",
              {
                "@pref": "Y",
                "#text": "University of Chicago"
              }
            ]
          },
          "city": "Chicago",
          "state": "IL",
          "country": "USA",
          "zip": {
            "@location": "AP",
            "#text": "60637"
          }
        }
      },
      {
        "address_spec": {
          "@addr_no": "2",
          "full_address": "Amer Bar Fdn, Chicago, IL 60611 USA",
          "organizations": {
            "@count": "1",
            "organization": "Amer Bar Fdn"
          },
          "city": "Chicago",
          "state": "IL",
          "country": "USA",
          "zip": {
            "@location": "AP",
            "#text": "60611"
          }
        }
      }
    ]
  },
  "category_info": {
    "headings": {
      "@count": "1",
      "heading": "Social Sciences"
    },
    "subjects": {
      "@count": "3",
      "subject": [
        {
          "@ascatype": "traditional",
          "#text": "Economics"
        },
        {
          "@ascatype": "extended",
          "#text": "Business & Economics"
        },
        {
          "@ascatype": "traditional",
          "#text": "ECONOMICS"
        }
      ]
    }
  }
}
{
  "addresses": {
    "@count": "2",
    "address_name": [
      {
        "address_spec": {
          "@addr_no": "1",
          "full_address": "Ohio State Univ, Columbus, OH 43210 USA",
          "organizations": {
            "@count": "2",
            "organization": [
              "Ohio State Univ",
              {
                "@pref": "Y",
                "#text": "Ohio State University"
              }
            ]
          },
          "city": "Columbus",
          "state": "OH",
          "country": "USA",
          "zip": {
            "@location": "AP",
            "#text": "43210"
          }
        }
      },
      {
        "address_spec": {
          "@addr_no": "2",
          "full_address": "Harvard Univ, Cambridge, MA 02138 USA",
          "organizations": {
            "@count": "2",
            "organization": [
              "Harvard Univ",
              {
                "@pref": "Y",
                "#text": "Harvard University"
              }
            ]
          },
          "city": "Cambridge",
          "state": "MA",
          "country": "USA",
          "zip": {
            "@location": "AP",
            "#text": "02138"
          }
        }
      }
    ]
  },
  "category_info": {
    "headings": {
      "@count": "1",
      "heading": "Social Sciences"
    },
    "subjects": {
      "@count": "3",
      "subject": [
        {
          "@ascatype": "traditional",
          "#text": "Economics"
        },
        {
          "@ascatype": "extended",
          "#text": "Business & Economics"
        },
        {
          "@ascatype": "traditional",
          "#text": "ECONOMICS"
        }
      ]
    }
  }
}
{
  "addresses": {
    "@count": "1",
    "address_name": {
      "address_spec": {
        "@addr_no": "1",
        "full_address": "Univ Chicago, Chicago, IL 60637 USA",
        "organizations": {
          "@count": "2",
          "organization": [
            "Univ Chicago",
            {
              "@pref": "Y",
              "#text": "University of Chicago"
            }
          ]
        },
        "city": "Chicago",
        "state": "IL",
        "country": "USA",
        "zip": {
          "@location": "AP",
          "#text": "60637"
        }
      }
    }
  },
  "category_info": {
    "headings": {
      "@count": "1",
      "heading": "Social Sciences"
    },
    "subjects": {
      "@count": "3",
      "subject": [
        {
          "@ascatype": "traditional",
          "#text": "Economics"
        },
        {
          "@ascatype": "extended",
          "#text": "Business & Economics"
        },
        {
          "@ascatype": "traditional",
          "#text": "ECONOMICS"
        }
      ]
    }
  }
}
{
  "addresses": {
    "@count": "2",
    "address_name": [
      {
        "address_spec": {
          "@addr_no": "1",
          "full_address": "Wissensch Zentrum Berlin Sozialforsch, D-1000 Berlin, Germany",
          "organizations": {
            "@count": "1",
            "organization": "Wissensch Zentrum Berlin Sozialforsch"
          },
          "city": "Berlin",
          "country": "Germany",
          "zip": {
            "@location": "BC",
            "#text": "D-1000"
          }
        }
      },
      {
        "address_spec": {
          "@addr_no": "2",
          "full_address": "Harvard Univ, Dept Govt, Cambridge, MA 02138 USA",
          "organizations": {
            "@count": "2",
            "organization": [
              "Harvard Univ",
              {
                "@pref": "Y",
                "#text": "Harvard University"
              }
            ]
          },
          "suborganizations": {
            "@count": "1",
            "suborganization": "Dept Govt"
          },
          "city": "Cambridge",
          "state": "MA",
          "country": "USA",
          "zip": {
            "@location": "AP",
            "#text": "02138"
          }
        }
      }
    ]
  },
  "category_info": {
    "headings": {
      "@count": "1",
      "heading": "Social Sciences"
    },
    "subjects": {
      "@count": "3",
      "subject": [
        {
          "@ascatype": "traditional",
          "#text": "Economics"
        },
        {
          "@ascatype": "extended",
          "#text": "Business & Economics"
        },
        {
          "@ascatype": "traditional",
          "#text": "ECONOMICS"
        }
      ]
    }
  }
}
{
  "addresses": {
    "@count": "2",
    "address_name": [
      {
        "address_spec": {
          "@addr_no": "1",
          "full_address": "NYU, CV Starr Ctr Appl Econ, New York, NY 10003 USA",
          "organizations": {
            "@count": "2",
            "organization": [
              "NYU",
              {
                "@pref": "Y",
                "#text": "New York University"
              }
            ]
          },
          "suborganizations": {
            "@count": "1",
            "suborganization": "CV Starr Ctr Appl Econ"
          },
          "city": "New York",
          "state": "NY",
          "country": "USA",
          "zip": {
            "@location": "AP",
            "#text": "10003"
          }
        }
      },
      {
        "address_spec": {
          "@addr_no": "2",
          "full_address": "Princeton Univ, Princeton, NJ 08544 USA",
          "organizations": {
            "@count": "2",
            "organization": [
              "Princeton Univ",
              {
                "@pref": "Y",
                "#text": "Princeton University"
              }
            ]
          },
          "city": "Princeton",
          "state": "NJ",
          "country": "USA",
          "zip": {
            "@location": "AP",
            "#text": "08544"
          }
        }
      }
    ]
  },
  "category_info": {
    "headings": {
      "@count": "1",
      "heading": "Social Sciences"
    },
    "subjects": {
      "@count": "3",
      "subject": [
        {
          "@ascatype": "traditional",
          "#text": "Economics"
        },
        {
          "@ascatype": "extended",
          "#text": "Business & Economics"
        },
        {
          "@ascatype": "traditional",
          "#text": "ECONOMICS"
        }
      ]
    }
  }
}

What I want to extract is the country (or countries) for each record. Some records have more than one address, which seems to be causing the problem. My naive approach was:

.static_data."fullrecord_metadata".addresses.address_name.country

This, however, gives me several errors ("null has no keys" and "cannot index array with string"). Checking with the keys builtin:

.static_data."fullrecord_metadata".addresses.address_name | keys

From that, it seems there's a problem with the way the data is structured...

So, is it actually possible to extract the list of countries for each entry using jq? Thank you!

2 Answers

For each input top-level JSON entity, the following filter will recursively examine all the objects to see if they have a "country" key, and it will then report the distinct "country" values for that top-level entity:

jq -c '[.. | if type == "object" and has("country") 
             then .country
             else empty end] | unique'

Output for the sample input:

["Israel"]
["USA"]
["Germany","USA"]
["USA"]
["Spain","USA"]
["USA"]
["USA"]
["USA"]
["Germany","USA"]
["USA"]
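For readers who want to check the logic outside jq, here is a rough Python analogue of the filter above. It is a sketch, not part of the original answer, and it assumes that the input is a stream of concatenated JSON objects (as in the question) and that every "country" value is a string, so the values can be deduplicated and sorted the way jq's unique does. The helper names iter_json_stream, recurse and countries are hypothetical.

```python
import json
from typing import Any, Iterator, List


def iter_json_stream(text: str) -> Iterator[Any]:
    """Yield each top-level JSON value from concatenated JSON text (like jq's input stream)."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace between top-level values; raw_decode does not do this itself.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        value, pos = decoder.raw_decode(text, pos)
        yield value


def recurse(value: Any) -> Iterator[Any]:
    """Rough analogue of jq's `..`: the value itself, then every nested value, depth-first."""
    yield value
    if isinstance(value, dict):
        for child in value.values():
            yield from recurse(child)
    elif isinstance(value, list):
        for child in value:
            yield from recurse(child)


def countries(record: Any) -> List[str]:
    """Analogue of [.. | if type == "object" and has("country") then .country else empty end] | unique."""
    found = [v["country"] for v in recurse(record)
             if isinstance(v, dict) and "country" in v]
    # jq's unique deduplicates and sorts; set + sorted assumes string values.
    return sorted(set(found))


if __name__ == "__main__":
    sample = """
    {"addresses": {"address_name": {"address_spec": {"country": "Israel"}}}}
    {"addresses": {"address_name": [
        {"address_spec": {"country": "USA"}},
        {"address_spec": {"country": "Germany"}}]}}
    """
    for record in iter_json_stream(sample):
        print(countries(record))  # prints ['Israel'], then ['Germany', 'USA']
```

Like the jq version, this never needs to know whether address_name is an object or an array, because the recursive walk visits every nested object regardless of shape.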

Here's a filter that will produce the same results in your example, though it is not exactly equivalent:

[.. | .country? // empty] | unique

[Exercise for the interested reader: what is the difference? :-) ]
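To make the difference concrete (this spoils the exercise), here is a small Python sketch, not part of the original answer, that mimics both filters on a made-up record. The first filter keeps a "country" value that is null or false, while `.country? // empty` drops it, because `//` discards null and false. The sketch compares the arrays before unique is applied, since Python cannot sort None together with strings; the function names are hypothetical.

```python
from typing import Any, Iterator, List


def recurse(value: Any) -> Iterator[Any]:
    """Rough analogue of jq's `..`: the value itself, then every nested value."""
    yield value
    if isinstance(value, dict):
        for child in value.values():
            yield from recurse(child)
    elif isinstance(value, list):
        for child in value:
            yield from recurse(child)


def with_has(record: Any) -> List[Any]:
    """[.. | if type == "object" and has("country") then .country else empty end]"""
    return [v["country"] for v in recurse(record)
            if isinstance(v, dict) and "country" in v]


def with_alternative(record: Any) -> List[Any]:
    """[.. | .country? // empty]"""
    out = []
    for v in recurse(record):
        if not isinstance(v, dict):
            # null | .country is null; on arrays and scalars the error is
            # suppressed by `?`. Either way nothing survives `// empty`.
            continue
        c = v.get("country")  # a missing key reads as null in jq
        # `//` discards null and false but keeps every other value (including 0 and "").
        if c is not None and c is not False:
            out.append(c)
    return out


if __name__ == "__main__":
    record = {"a": {"country": None}, "b": {"country": "USA"}}
    print(with_has(record))          # [None, 'USA']
    print(with_alternative(record))  # ['USA']
```

On the sample data every "country" is a non-empty string, which is why both filters give the same results there.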

2 Comments

Thank you VERY much! :) This is way too advanced for me to figure out on my own.
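.country? // empty yields .country if the input is an object whose "country" value is neither null nor false; otherwise it yields empty, i.e. no output at all. The ? suppresses the error that .country would raise on arrays and strings. unique removes duplicate entries (and sorts the result).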

Here is a solution that uses a function to handle the variation in .address_name, which is a single object for records with one address and an array for records with several:

 def address_specs:
    if type == "array" then .[].address_spec else .address_spec end
 ;

 .addresses | .address_name | [address_specs | .country] | unique
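The same normalisation idea in Python, as a hedged sketch rather than part of the answer: turn the object-or-array address_name into a uniform list of address_spec objects, then collect the countries. It assumes each record has already been parsed into a dict shaped like the sample (starting at "addresses") and that every address_spec has a string "country" key; unlike jq, a missing key raises KeyError here instead of yielding null.

```python
from typing import Any, Dict, List


def address_specs(address_name: Any) -> List[Dict[str, Any]]:
    """Analogue of: if type == "array" then .[].address_spec else .address_spec end"""
    if isinstance(address_name, list):
        # Several addresses: address_name is an array of {"address_spec": ...}
        return [item["address_spec"] for item in address_name]
    # A single address: address_name is one {"address_spec": ...} object
    return [address_name["address_spec"]]


def countries(record: Dict[str, Any]) -> List[str]:
    """Analogue of: .addresses | .address_name | [address_specs | .country] | unique"""
    specs = address_specs(record["addresses"]["address_name"])
    # Assumes every address_spec has a string "country" (true in the sample data).
    return sorted({spec["country"] for spec in specs})


if __name__ == "__main__":
    one = {"addresses": {"address_name": {"address_spec": {"country": "Israel"}}}}
    two = {"addresses": {"address_name": [
        {"address_spec": {"country": "USA"}},
        {"address_spec": {"country": "Germany"}},
    ]}}
    print(countries(one))  # ['Israel']
    print(countries(two))  # ['Germany', 'USA']
```

Normalising to a list first means the rest of the pipeline does not have to care how many addresses a record has, which is the same design choice the jq function makes.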
