0

I am trying to update a table that has STRUCT(a few fields, ARRAY(STRUCT)). The field that I need to update is inside the array and I am having trouble with making it work.

Here is the layout of the the two tables:

CREATE TABLE mydatset.orders (
order_id  string,
order_time timestamp,
trans  STRUCT <
    id      string,
    amount      INT64,
    accounts      ARRAY<STRUCT <
          role      STRING ,
          account_id        STRING,
          region          STRING,
          amount      INT64> > >
)
CREATE TABLE mydatset.relocations (
    account_id    string,
    region      string
)

Trying to update the region of any account in the array accounts if that account exists in the relocations table:

update mydataset.orders a
set trans = (SELECT AS STRUCT   trans.* REPLACE(ARRAY(SELECT STRUCT<role STRING, account_id STRING, region STRING, amount INT64>
                          (cp.role,  cp.account_id,
                           case when cp.account_id = ll.account_id then ll.region else cp.region end ,
                           cp.amount
                          )
                    ) as accounts )
    from unnest(trans.accounts) cp
    left join unnest(relocs.chgs) ll
    on cp.account_id = ll.account_id
    )
from (select array_agg(struct (account_id, region) ) chgs
      from`mydataset.relocations`
      ) relocs
where true

The syntax works, but the sql doesn't perform the expected update. The account's region in the orders table is not changed after running the above update!

(I have seen BigQuery UPDATE nested array field and this case is slightly different. The array is inside a struct and itself is an array of struct)

Appreciate any help.

1 Answer 1

3

Below is for BigQuery Standard SQL

#standardSQL
UPDATE `project.dataset.orders`
SET trans = (SELECT AS STRUCT trans.* REPLACE(
  ARRAY(SELECT AS STRUCT x.* REPLACE(IFNULL(y.region, x.region) AS region)
    FROM UNNEST(trans.accounts) x
    LEFT JOIN UNNEST(relocations) y
    USING(account_id)
  ) AS accounts))
FROM (SELECT ARRAY_AGG(t) relocations FROM `project.dataset.relocations` t)
WHERE TRUE

It is tested with below dummy data

initial dummy data that looks like below

[
  {
    "order_id": "order_id1",
    "order_time": "2019-06-28 01:05:16.346854 UTC",
    "trans": {
      "id": "id1",
      "amount": "1",
      "accounts": [
        {
          "role": "role1",
          "account_id": "account_id1",
          "region": "region1",
          "amount": "11"
        },
        {
          "role": "role2",
          "account_id": "account_id2",
          "region": "region2",
          "amount": "12"
        }
      ]
    }
  },
  {
    "order_id": "order_id2",
    "order_time": "2019-06-28 01:05:16.346854 UTC",
    "trans": {
      "id": "id2",
      "amount": "1",
      "accounts": [
        {
          "role": "role3",
          "account_id": "account_id1",
          "region": "region4",
          "amount": "13"
        },
        {
          "role": "role4",
          "account_id": "account_id3",
          "region": "region3",
          "amount": "14"
        }
      ]
    }
  }
]

after applying below adjustments

[
  {
    "account_id": "account_id1",
    "region": "regionA"
  },
  {
    "account_id": "account_id2",
    "region": "regionB"
  }
]   

result is

[
  {
    "id": "id1",
    "amount": "1",
    "accounts": [
      {
        "role": "role1",
        "account_id": "account_id1",
        "region": "regionA",
        "amount": "11"
      },
      {
        "role": "role2",
        "account_id": "account_id2",
        "region": "regionB",
        "amount": "12"
      }
    ]
  },
  {
    "id": "id2",
    "amount": "1",
    "accounts": [
      {
        "role": "role3",
        "account_id": "account_id1",
        "region": "regionA",
        "amount": "13"
      },
      {
        "role": "role4",
        "account_id": "account_id3",
        "region": "region3",
        "amount": "14"
      }
    ]
  }
]
Sign up to request clarification or add additional context in comments.

3 Comments

Thank you for your response. Strange, I get the following error when I try you query as is (only replacing the dataset name): Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN. (I think that was why originally I used array_agg for the data from the relocations table.)
argh. I was mostly focusing on adjusting your query assuming that it is correct at a core. let me check this again - will get back shortly
@FZF - see updated answer - it is tested now - so should definitely work for you :o)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.