
I am trying to select only specific fields from a JSON file, together with their full paths (the JSON comes from Elasticsearch).

My JSON file:

{
  "_index": "ships",
  "_type": "doc",
  "_id": "c36806c10a96a3968c07c6a222cfc818",
  "_score": 0.057158414,
  "_source": {
    "user_email": "[email protected]",
    "current_send_date": 1552557382,
    "next_send_date": 1570798063,
    "data_name": "atari",
    "statistics": {
      "game_mode": "engineer",
      "opened_game": 0,
      "user_score": 0,
      "space_1": {
        "ship_send_priority": 10,
        "ssl_required": "true",
        "ship_send_delay": 15,
        "user_score": 0,
        "template1": {
          "current_ship_status": "sent",
          "current_ship_date": "4324242",
          "checked_link_before_clicked": 0
        },
        "template2": {
          "current_ship_status": "sent",
          "current_ship_date": "4324242",
          "checked_payload": 0
        }
      }
    }
  }
}

I am flattening the document into one-liners, one [path, value] pair per scalar:

<file jq -c 'paths(scalars) as $p | [$p, getpath($p)]'
[["_index"],"ships"]
[["_type"],"doc"]
[["_id"],"c36806c10a96a3968c07c6a222cfc818"]
[["_score"],0.057158414]
[["_source","user_email"],"[email protected]"]
[["_source","current_send_date"],1552557382]
[["_source","next_send_date"],1570798063]
[["_source","data_name"],"atari"]
[["_source","statistics","game_mode"],"engineer"]
[["_source","statistics","opened_game"],0]
[["_source","statistics","user_score"],0]
[["_source","statistics","space_1","ship_send_priority"],10]
[["_source","statistics","space_1","ssl_required"],"true"]
[["_source","statistics","space_1","ship_send_delay"],15]
[["_source","statistics","space_1","user_score"],0]
[["_source","statistics","space_1","template1","current_ship_status"],"sent"]
[["_source","statistics","space_1","template1","current_ship_date"],"4324242"]
[["_source","statistics","space_1","template1","checked_link_before_clicked"],0]
[["_source","statistics","space_1","template2","current_ship_status"],"sent"]
[["_source","statistics","space_1","template2","current_ship_date"],"4324242"]
[["_source","statistics","space_1","template2","checked_payload"],0]

Then I pipe the output to grep to keep only the fields I want:

<file jq -c 'paths(scalars) as $p | [$p, getpath($p)]' | grep -e '"_index"\|current_send_date\|ship_send_delay\|ship_send_priority\|current_ship_status'
[["_index"],"ships"]
[["_source","current_send_date"],1552557382]
[["_source","statistics","space_1","ship_send_priority"],10]
[["_source","statistics","space_1","ship_send_delay"],15]
[["_source","statistics","space_1","template1","current_ship_status"],"sent"]
[["_source","statistics","space_1","template2","current_ship_status"],"sent"]

Finally, I pipe grep's output to sed to strip the characters I don't need, which gives me the result I want:

<file jq -c 'paths(scalars) as $p | [$p, getpath($p)]' | grep -e '"_index"\|current_send_date\|ship_send_delay\|ship_send_priority\|current_ship_status' | sed -e 's/\[\["//g' -e 's/","/./g' -e 's/"],"/=/g' -e 's/"],/=/g' -e 's/]$//g' -e 's/"$//g'

_index=ships
_source.current_send_date=1552557382
_source.statistics.space_1.ship_send_priority=10
_source.statistics.space_1.ship_send_delay=15
_source.statistics.space_1.template1.current_ship_status=sent
_source.statistics.space_1.template2.current_ship_status=sent

I am looking for a better way to do at least the field extraction in jq instead of grep. I can live with cleaning up the output with sed, but I feel there must be a way to pick out the fields I want without grep, probably with something like select(.mykey|.mykey1|.mykey2).
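If the goal is only to move the grep step into jq, jq's test() can apply the same alternation to the dotted path instead. A minimal sketch, assuming a jq build with Oniguruma regex support (most packaged builds have it); select_paths_matching is a hypothetical helper name:

```shell
# Hypothetical helper: read JSON on stdin, flatten every scalar to a
# "dotted.path=value" line, and keep only lines whose dotted path matches
# the regex given as $1 (same idea as the grep step, but inside jq).
select_paths_matching() {
  jq -r --arg re "$1" '
    paths(scalars) as $p
    | ($p | join(".")) as $k      # e.g. "_source.current_send_date"
    | select($k | test($re))      # regex match on the whole dotted path
    | "\($k)=\(getpath($p))"
  '
}
```

Usage: `<file select_paths_matching '_index|current_send_date|ship_send_delay|ship_send_priority|current_ship_status'`. Like grep, this is a substring match; anchoring the pattern, e.g. `'(^|\.)(_index|current_send_date)$'`, restricts it to whole key names at the end of the path.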


1 Answer


Use join and string interpolation (\(...)):

$ jq -r 'paths(scalars) as $p | "\($p|join("."))=\(getpath($p))"' file
_index=ships
_type=doc
_id=c36806c10a96a3968c07c6a222cfc818
_score=0.057158414
_source.user_email=[email protected]
_source.current_send_date=1552557382
_source.next_send_date=1570798063
_source.data_name=atari
_source.statistics.game_mode=engineer
_source.statistics.opened_game=0
_source.statistics.user_score=0
_source.statistics.space_1.ship_send_priority=10
_source.statistics.space_1.ssl_required=true
_source.statistics.space_1.ship_send_delay=15
_source.statistics.space_1.user_score=0
_source.statistics.space_1.template1.current_ship_status=sent
_source.statistics.space_1.template1.current_ship_date=4324242
_source.statistics.space_1.template1.checked_link_before_clicked=0
_source.statistics.space_1.template2.current_ship_status=sent
_source.statistics.space_1.template2.current_ship_date=4324242
_source.statistics.space_1.template2.checked_payload=0
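The sample document contains no arrays, but Elasticsearch responses usually do (the hits sit under hits.hits), and then the path arrays contain numeric indices. Older jq releases may refuse to join non-string elements, so here is a minimal sketch that converts each path element with tostring before joining; flatten_kv is a hypothetical helper name:

```shell
# Hypothetical helper: read JSON on stdin and print every scalar as
# "dotted.path=value". map(tostring) turns array indices such as 0 into "0"
# so join(".") only ever sees strings.
flatten_kv() {
  jq -r 'paths(scalars) as $p
         | "\($p | map(tostring) | join("."))=\(getpath($p))"'
}
```

Usage: `<file flatten_kv`. For a response like `{"hits":{"hits":[{"_id":"a"}]}}` the line comes out as `hits.hits.0._id=a`.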

Actually, you don't need grep at all if you have jq 1.6 or later (IN was added in 1.6); try this:

$ jq -r '(paths(scalars) | select(IN(.[];
    "_index",
    "current_send_date",
    "ship_send_delay",
    "ship_send_priority",
    "current_ship_status"
))) as $p | "\($p|join("."))=\(getpath($p))"' file
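The IN(.[]; ...) filter keeps a path if any of its components is in the list, so listing an intermediate key such as template1 would pull in every leaf beneath it. A minimal sketch, assuming jq 1.6+ (for IN and $ARGS), that matches only the last path element and takes the key list from the command line via --args; select_leaves is a hypothetical helper name:

```shell
# Hypothetical helper: read JSON on stdin and print "dotted.path=value" for
# every scalar whose *last* path element is one of the arguments.
# --args makes the remaining arguments available as $ARGS.positional.
select_leaves() {
  jq -r '
    paths(scalars) as $p
    | select($p[-1] | IN($ARGS.positional[]))   # leaf key must be listed
    | "\($p | join("."))=\(getpath($p))"
  ' --args "$@"
}
```

Usage: `<file select_leaves _index current_send_date ship_send_delay ship_send_priority current_ship_status`. For the question's file this should print the same six lines as the grep/sed pipeline, without hard-coding the key list into the filter.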

1 Comment

Yet another advanced jq use I was not aware of. Thank you for sharing.
