4

I'd like to use jq to do a left join on these two objects in an array by using the jq JOIN operator:

[
  {
    "type": "spreadsheet",
    "rows": [
      [
        "1",
        "Ben",
        "male"
      ],
      [
        "2",
        "Cathy",
        "female"
      ]
    ],
    "columns": [
      "id",
      "name",
      "sex"
    ]
  },
  {
    "type": "spreadsheet",
    "rows": [
      [
        "1",
        "25"
      ],
      [
        "2",
        "28"
      ]
    ],
    "columns": [
      "id",
      "age"
    ]
  }
]

to this:

{
    "type": "spreadsheet",
    "rows": [
      [
        "1",
        "Ben",
        "male",
        "25"
      ],
      [
        "2",
        "Cathy",
        "female",
        "28"
      ]
    ],
    "columns": [
      "id",
      "name",
      "sex",
      "age"
    ]
}

The jq manual references SQL-style operators like INDEX and JOIN but I can't find any examples online of how to use them. Like any join, I need to iterate over the rows in the first object and merge with rows found in the second object based on the id column.

3
  • Possible duplicate of Analogue for sql join in jq Commented Jun 30, 2017 at 23:50
  • 1
    I've edited my question to clarify why it's not a duplicate: the other question's answers make no mention of jq's SQL-style operators. Commented Jun 30, 2017 at 23:54
  • 1
    Noting that SQL-style were implemented at this commit, which shows the equivalent native construction. A few lines down from that, you'll see a test case demonstrating the use. Commented Jun 30, 2017 at 23:59

1 Answer 1

4

The following adopts an object-oriented approach to the task, and ignores keys other than "rows" and "columns" in the input.

There are accordingly three steps:

  1. convert the tabular (columns/rows) representation of the "database" to an OO representation -- this is accomplished by toOO;
  2. perform the update -- accomplished by left_join;
  3. convert the OO representation back to the tabular form (toDB)

The main program is thus quite short:

map(toOO) | left_join(.id) | toDB   

The helper functions, most of which are independently useful, are presented below but should appear before the main program above.

Helper Functions

# headers should be an array of strings
def objectify(headers):
  . as $in
  | reduce range(0; headers|length) as $i ({};
      . + {(headers[$i]): $in[$i]} );

def fromObject:
  . as $in
  | reduce keys_unsorted[] as $k ( {rows:[], columns:[]};
     .columns += [$k] | .rows += [$in[$k]] );

# Input: an array of object
# Output: a single object with array-valued keys formed from
# the input object by (blindly) concatenating the values
def merge:
  def allkeys: map(keys) | add | unique;
  allkeys as $allkeys
  | reduce .[] as $in ({};
     reduce $allkeys[] as $k (.;
      . + {($k): (.[$k] + [$in[$k]]) } ));

# id should be a filter such as .id
# Apply merge but do not replicate id
def merge_by(id):
  merge | (id |= .[0] ) ;

# Create an object-representation of the columns/rows database
def toOO:
  .columns as $headers
  | [.rows[] | objectify($headers)];

def toDB:
  map(fromObject)
  | merge_by(.columns);

# Input: an array of two arrays, each consisting of objects forming a database.
# "key" specifies the (possibly composite) key to be used to form the
# left-join of the two databases. 
# That is, objects are NOT added to the first database.
def left_join(key):
  def updateObject(obj):
    reduce .[] as $x ([];
      if ($x|key) == (obj|key) 
      then . + [$x + obj] else . + [$x]
      end);
  reduce .[1][] as $x (.[0]; updateObject($x) );

Output

{
  "columns": [
    "id",
    "name",
    "sex",
    "age"
  ],
  "rows": [
    [
      "1",
      "Ben",
      "male",
      "25"
    ],
    [
      "2",
      "Cathy",
      "female",
      "28"
    ]
  ]
}

Using JOIN builtin

If your jq has JOIN/4, then it can be used to implement left_join/1 as follows:

def left_join(key):
  map(INDEX(key)) 
  | [ JOIN( .[1]; .[0][]; key; add) ] ;

In theory, this should be more efficient than the definition given above.

Sign up to request clarification or add additional context in comments.

2 Comments

Not sure what you mean by "If your jq has JOIN/4". What does the /4 mean, and how can I tell if my jq has it?
The number is the arity, and you can tell in a number of ways: checking the jq documentation corresponding to the version of jq you are using; checking the output of builtins (available in jq 1.5 and later); or attempting to use it, e.g. by running jq -n 'JOIN(0;0;0;0)'.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.