2

I am reading a JSON object from Salesforce. The object is irregular in the sense that some nested arrays are empty and some are not. How to deal with this in tidyjson?

I am setting up an API with Salesforce in R. The objective is to get meaningful data out of Salesforce to process in R.

json <- '
{
  "totalSize": [
    355710
  ],
  "done": [
    false
  ],
  "nextRecordsUrl": [
    "/services/data/v45.0/query/01gc000001L8zdkAAB-749"
  ],
  "records": [
    {
      "attributes": {
        "type": "Order_Line__c",
        "url": "/services/data/v45.0/sobjects/Order_Line__c/a0T1N000009aZ9lUAE"
      },
      "Id": "a0T1N000009aZ9lUAE",
      "Name": "OrderLine-1099369",
      "SO_Number_Formula__c": "548402-2.3",
      "Ship_From_Inventory__c": "XXX",
      "RMA_Number__c": "548402",
      "Part_Number__c": "01t1N00000JNeAQQA1",
      "Marketing_Part__c": "XXXXXXXXXXX",
      "Family__c": "XXXXXXXX",
      "Serial_Numbers__r": {
        "records": {}
      }
    },
    {
      "attributes": {
        "type": "Order_Line__c",
        "url": "/services/data/v45.0/sobjects/Order_Line__c/a0T1N000009aZ9mUAE"
      },
      "Id": "a0T1N000009aZ9mUAE",
      "Name": "OrderLine-1099370",
      "SO_Number_Formula__c": "962816-1.1",
      "Ship_From_Inventory__c": "XXX",
      "RMA_Number__c": "962816",
      "Part_Number__c": "01t1N00000JNc3qQAD",
      "Marketing_Part__c": "XXXXXXXXXX",
      "Family__c": "XXXXXXX",
      "RMA_Received_Date__c": "2019-02-18",
      "Serial_Numbers__r": {
        "totalSize": 1,
        "done": true,
        "records": [
          {
            "attributes": {
              "type": "Serial_Number__c",
              "url": "/services/data/v45.0/sobjects/Serial_Number__c/a0X1N00000NoyAjUAJ"
            },
            "Id": "a0X1N00000NoyAjUAJ",
            "Name": "SN217426",
            "Legacy_Line_Id__c": "962816SN217426",
            "Customer_Name__c": "XXXXXX",
            "Original_Shipment_Date__c": "2018-06-26",
            "Disposition__c": "Pending",
            "Status__c": "FailureVerification"
          }
        ]
      }
    }
  ]
}
'

mydata <- json %>% 
    as.tbl_json %>%
    enter_object("records") %>%
    gather_array() %>%
    spread_values(
      Id = jstring("Id"),
      Name = jstring("Name"),
      SO_Number_Formula = jstring("SO_Number_Formula__c"),
      Ship_From_Inventory = jstring("Ship_From_Inventory__c"),
      RMA_Number = jstring("RMA_Number__c"),
      Part_Number = jstring("Part_Number__c"),
      Marketing_Part = jstring("Marketing_Part__c"),
      Family = jstring("Family__c")) %>%
    enter_object("Serial_Numbers__r") %>%
    enter_object("records") %>%
    gather_ %>%
      spread_values(
    Id = jstring("Id"))

The irregularity is in the [records][Serial_Numbers__r][records]. In this example, the first occurrence is empty {} and the second occurrence is not empty. The code generates the following error when executing gather_keys or gather _array: Error in gather_keys(.) : 1 records are values not objects Error in gather_array(.) : 1 records are values not arrays

I am thinking that this is caused by the empty array [records]. There is plenty of such irregularity in the Salesforce output: some records have detailed nested data, some do not. How can I deal with this?

1 Answer 1

1

This is a fantastic question, and is something that we should really have a cleaner way to handle. enter_object() proves problematic in these types of cases where you lose records based on irregular JSON practices.

I submitted an issue to track improvements here: https://github.com/colearendt/tidyjson/issues/121

In the meantime, the way I typically do this is by splitting records based on the characteristic that delineates them. In this case, you can use gather_object() on the parent object to get the same effect as enter_object(), and then use filter / bind_rows to treat the rows differently.

Ideally bind_rows() would work better in the pipe here... that is something I would like to see as an improvement for dplyr (Issue here)! I'm curious to see if this solves your problem! (Also, keep spread_all() in mind to simplify some of the column specifying at a cost of some "guessing" on the package's part!).

  json <- '{
  "totalSize": [
    355710
  ],
  "done": [
    false
  ],
  "nextRecordsUrl": [
    "/services/data/v45.0/query/01gc000001L8zdkAAB-749"
  ],
  "records": [
    {
      "attributes": {
        "type": "Order_Line__c",
        "url": "/services/data/v45.0/sobjects/Order_Line__c/a0T1N000009aZ9lUAE"
      },
      "Id": "a0T1N000009aZ9lUAE",
      "Name": "OrderLine-1099369",
      "SO_Number_Formula__c": "548402-2.3",
      "Ship_From_Inventory__c": "XXX",
      "RMA_Number__c": "548402",
      "Part_Number__c": "01t1N00000JNeAQQA1",
      "Marketing_Part__c": "XXXXXXXXXXX",
      "Family__c": "XXXXXXXX",
      "Serial_Numbers__r": {
        "records": {}
      }
    },
    {
      "attributes": {
        "type": "Order_Line__c",
        "url": "/services/data/v45.0/sobjects/Order_Line__c/a0T1N000009aZ9mUAE"
      },
      "Id": "a0T1N000009aZ9mUAE",
      "Name": "OrderLine-1099370",
      "SO_Number_Formula__c": "962816-1.1",
      "Ship_From_Inventory__c": "XXX",
      "RMA_Number__c": "962816",
      "Part_Number__c": "01t1N00000JNc3qQAD",
      "Marketing_Part__c": "XXXXXXXXXX",
      "Family__c": "XXXXXXX",
      "RMA_Received_Date__c": "2019-02-18",
      "Serial_Numbers__r": {
        "totalSize": 1,
        "done": true,
        "records": [
          {
            "attributes": {
              "type": "Serial_Number__c",
              "url": "/services/data/v45.0/sobjects/Serial_Number__c/a0X1N00000NoyAjUAJ"
            },
            "Id": "a0X1N00000NoyAjUAJ",
            "Name": "SN217426",
            "Legacy_Line_Id__c": "962816SN217426",
            "Customer_Name__c": "XXXXXX",
            "Original_Shipment_Date__c": "2018-06-26",
            "Disposition__c": "Pending",
            "Status__c": "FailureVerification"
          }
        ]
      }
    }
  ]
}
'

  library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
  library(tidyr)
  library(tidyjson)
#> 
#> Attaching package: 'tidyjson'
#> The following object is masked from 'package:dplyr':
#> 
#>     bind_rows
#> The following object is masked from 'package:stats':
#> 
#>     filter

  prep_data <- json %>%
    as.tbl_json %>%
    enter_object("records") %>%
    gather_array() %>%
    spread_values(
      Id = jstring("Id"),
      Name = jstring("Name"),
      SO_Number_Formula = jstring("SO_Number_Formula__c"),
      Ship_From_Inventory = jstring("Ship_From_Inventory__c"),
      RMA_Number = jstring("RMA_Number__c"),
      Part_Number = jstring("Part_Number__c"),
      Marketing_Part = jstring("Marketing_Part__c"),
      Family = jstring("Family__c")) %>%
    enter_object("Serial_Numbers__r")

  # show that types are different
  prep_data %>%
    gather_object("key") %>%
    json_types() %>%
    select(key, type) %>%
    filter(key == "records")
#> # A tbl_json: 2 x 2 tibble with a "JSON" attribute
#>   `attr(., "JSON")`      key     type  
#>   <chr>                  <chr>   <fct> 
#> 1 "{}"                   records object
#> 2 "[{\"attributes\":..." records array

  # handle
  taller <- prep_data %>%
    gather_object("key") %>%
    json_types("type") %>%
    filter(key == "records")

  final <- tidyjson::bind_rows(
    taller %>% filter(type == "object"),
    taller %>% filter(type == "array") %>%
      gather_array("record_row") %>%
      spread_values(
        RecordId = jstring("Id")
      )
  )

  final %>% select(key, type, record_row, RecordId)
#> # A tbl_json: 2 x 4 tibble with a "JSON" attribute
#>   `attr(., "JSON")`      key     type   record_row RecordId          
#>   <chr>                  <chr>   <fct>       <int> <chr>             
#> 1 "{}"                   records object         NA <NA>              
#> 2 "{\"attributes\":{..." records array           1 a0X1N00000NoyAjUAJ

Created on 2020-03-15 by the reprex package (v0.3.0)

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks Cole for your insightful comment.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.