I am new to R and am having trouble parsing a JSON column in a dataset. I have gone through pretty much all the threads about parsing JSON, but I cannot find a proper solution, as I believe my problem is a little different.

Here is my situation:

I am using R to connect to a SQL database via ODBC and fetch the table I need.

TCbigdata is the target JSON column, and the JSON looks like this:

{
"memberid": "30325292",
"hotelgroup": {
    "g_orders": "-1",
    "g_sfristcreatedate": "-1",
    "g_lastcreatedate": "-1",
    "g_slastcreatedate": "-1",
    "g_fristcreatedate": "-1"
},
"visa": {
    "v_orders": "-1",
    "v_maxcountryid": "-1",
    "v_lastsorderdate": "-1",
    "v_maxvisaperson": "-1",
    "v_lastorderdate": "-1",
    "v_lastvisacountryid": "-1",
    "v_sorders": "-1"
},
"callcentertel": {
    "lastcctzzycalldate": "-1",
    "ishavecctcomplaintcall": "-1",
    "lastcctchujingcalldate": "-1",
    "lastcctyouluncalldate": "-1"
}....(key n, key n+1.. etc)..}

My desired output is all of the nested variables. If possible, I want to drop memberid and the group keys such as hotelgroup, visa, callcentertel, etc., so that either:

1. the parsed columns are "g_orders ... v_orders ... lastcct... etc." in one dataset, without group keys such as "hotelgroup", "visa", "callcentertel", etc.;

2. or the JSON is parsed into multiple datasets, e.g. a "hotelgroup" table with columns "g_orders", "g_sfristcreatedate", ..., and a "visa" table with columns "v_orders", "v_maxcountryid", ...

Is there a package for a problem like this?

============ PROBLEM DESCRIPTION AND DESIRED OUTPUT =================

I have looked at several demonstrations using jsonlite/RJSONIO/tidyjson, but failed to find a proper way.

Another part I find confusing is that my dataset, which comes from the data warehouse via ODBC, returns TCbigdata as a factor instead of the character type I assumed, which is how it appears in the DW.

================ MY CODE...TBC ========================

Here is my code so far:

# SQL table
orgtc <- sqlQuery(channel1, 'SELECT idMemberInfo, memberid, refbizid, crttime, TCbigdata FROM tcbiz_fq_rcs_data.MemberInfo')
# Convert the JSON column from factor to character
orgtc$TCbigdata <- as.character(orgtc$TCbigdata)
# ????? (TBD) not sure how to parse from here
library(jsonlite)
l <- fromJSON(orgtc$TCbigdata, simplifyDataFrame = FALSE)

I appreciate your help!

1 Answer

Interesting question. There are really two pieces:

  • getting the JSON out of the DW
  • parsing the JSON into your desired output

It looks like you have made decent progress getting the JSON out of the DW. I'm not sure what you are using to connect, but I would recommend using the new-ish odbc package, which has a nice DBI interface.
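
On the factor issue raised in the question, here is a minimal sketch of converting the JSON column to character before parsing. The data frame `dw_raw` is made up for illustration and only mimics a query result whose text column arrived as a factor. If the connection is made with RODBC (an assumption based on the `sqlQuery` call in the question), passing `stringsAsFactors = FALSE` to `sqlQuery` should avoid the conversion at the source.

```r
# Simulated query result (hypothetical data): the JSON column is a factor,
# as reported in the question.
dw_raw <- data.frame(
  idMemberInfo = 1:2,
  TCbigdata = factor(c('{"memberid": "1"}', '{"memberid": "2"}'))
)

# Convert the factor to character before handing it to a JSON parser;
# as.character() on a factor returns the labels, not the integer codes.
dw_raw$TCbigdata <- as.character(dw_raw$TCbigdata)

class(dw_raw$TCbigdata)
#> [1] "character"
```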

(Remember that reproducible examples are important to getting help quickly.) Once you have the data out of the DW, you should have something like the data frame that I manufacture below.

Further, if you want to use tidyjson (my preference), be aware that it is off CRAN, and that the dev version at jeremystan/tidyjson, which has useful functionality, is broken by the new dplyr. Here, I use the dev version from my repo (colearendt/tidyjson):


suppressPackageStartupMessages(library(tidyverse))                                                                    
# devtools::install_github("colearendt/tidyjson")                                                                     
suppressPackageStartupMessages(library(tidyjson))                                                                     
raw_json <- '{                                                                                                        
"memberid": "30325292",                                                                                               
"hotelgroup": {                                                                                                       
"g_orders": "-1",                                                                                                     
"g_sfristcreatedate": "-1",                                                                                           
"g_lastcreatedate": "-1",                                                                                             
"g_slastcreatedate": "-1",                                                                                            
"g_fristcreatedate": "-1"                                                                                             
},                                                                                                                    
"visa": {                                                                                                             
"v_orders": "-1",                                                                                                     
"v_maxcountryid": "-1",                                                                                               
"v_lastsorderdate": "-1",                                                                                             
"v_maxvisaperson": "-1",                                                                                              
"v_lastorderdate": "-1",                                                                                              
"v_lastvisacountryid": "-1",                                                                                          
"v_sorders": "-1"                                                                                                     
},                                                                                                                    
"callcentertel": {                                                                                                    
"lastcctzzycalldate": "-1",                                                                                           
"ishavecctcomplaintcall": "-1",                                                                                       
"lastcctchujingcalldate": "-1",                                                                                       
"lastcctyouluncalldate": "-1"                                                                                         
}                                                                                                                     
}'                                                                                                                    

# ten identical rows, each holding the raw JSON string
dw_data <- tibble(
  idMemberInfo = 1:10
  , TCbigdata = rep(raw_json, 10)
)

dw_data                                                                                                               
#> # A tibble: 10 x 2
#>    idMemberInfo TCbigdata                                                 
#>           <int> <chr>                                                     
#>  1            1 "{                                                       …
#>  2            2 "{                                                       …
#>  3            3 "{                                                       …
#>  4            4 "{                                                       …
#>  5            5 "{                                                       …
#>  6            6 "{                                                       …
#>  7            7 "{                                                       …
#>  8            8 "{                                                       …
#>  9            9 "{                                                       …
#> 10           10 "{                                                       …

# convert to tbl_json                                                                                                 
dw_json <- as.tbl_json(dw_data, json.column = "TCbigdata")                                                            

# option 1 - let tidyjson do the work for you                                                                         
# - you will need to rename                                                                                           
opt_1 <- dw_json %>% spread_all()                                                                                     
names(opt_1)                                                                                                          
#>  [1] "idMemberInfo"                        
#>  [2] "memberid"                            
#>  [3] "hotelgroup.g_orders"                 
#>  [4] "hotelgroup.g_sfristcreatedate"       
#>  [5] "hotelgroup.g_lastcreatedate"         
#>  [6] "hotelgroup.g_slastcreatedate"        
#>  [7] "hotelgroup.g_fristcreatedate"        
#>  [8] "visa.v_orders"                       
#>  [9] "visa.v_maxcountryid"                 
#> [10] "visa.v_lastsorderdate"               
#> [11] "visa.v_maxvisaperson"                
#> [12] "visa.v_lastorderdate"                
#> [13] "visa.v_lastvisacountryid"            
#> [14] "visa.v_sorders"                      
#> [15] "callcentertel.lastcctzzycalldate"    
#> [16] "callcentertel.ishavecctcomplaintcall"
#> [17] "callcentertel.lastcctchujingcalldate"
#> [18] "callcentertel.lastcctyouluncalldate"

# for instance... as long as there are no conflicts
# drop everything up to and including the first "." in each name
rename_function <- function(x){
  str_replace(x, "^[^.]*\\.", "")
}
opt_1 %>%
  rename_all(rename_function) %>%
  names()
#>  [1] "idMemberInfo"           "memberid"              
#>  [3] "g_orders"               "g_sfristcreatedate"    
#>  [5] "g_lastcreatedate"       "g_slastcreatedate"     
#>  [7] "g_fristcreatedate"      "v_orders"              
#>  [9] "v_maxcountryid"         "v_lastsorderdate"      
#> [11] "v_maxvisaperson"        "v_lastorderdate"       
#> [13] "v_lastvisacountryid"    "v_sorders"             
#> [15] "lastcctzzycalldate"     "ishavecctcomplaintcall"
#> [17] "lastcctchujingcalldate" "lastcctyouluncalldate"

# option 2 - define what you want                                                                                     
# - more typing up front                                                                                              
opt_2 <- dw_json %>% spread_values(
  g_orders = jstring(hotelgroup, g_orders)
  , g_sfristcreatedate = jstring(hotelgroup, g_sfristcreatedate)
  #...
  , lastcctzzycalldate = jstring(callcentertel, lastcctzzycalldate)
  #...
)
names(opt_2)                                                                                                          
#> [1] "idMemberInfo"       "g_orders"           "g_sfristcreatedate"
#> [4] "lastcctzzycalldate"
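
The question also asks about getting one table per group ("hotelgroup", "visa", ...). The following is a rough sketch of that with jsonlite rather than tidyjson, using a shortened version of the JSON from the question. The helper `group_table` is a hypothetical name introduced here, and the sketch assumes every row carries the same keys inside a given group; rows with differing keys would need extra handling before `rbind`.

```r
library(jsonlite)

# Shortened version of the JSON from the question (two groups only).
raw_json <- '{
  "memberid": "30325292",
  "hotelgroup": {"g_orders": "-1", "g_lastcreatedate": "-1"},
  "visa": {"v_orders": "-1", "v_sorders": "-1"}
}'
dw_data <- data.frame(
  idMemberInfo = 1:3,
  TCbigdata = rep(raw_json, 3),
  stringsAsFactors = FALSE
)

# Parse each row's JSON string into a nested list.
parsed <- lapply(dw_data$TCbigdata, fromJSON)

# Hypothetical helper: build one data frame for a single top-level group,
# keeping idMemberInfo so the tables can be joined back together later.
group_table <- function(group) {
  rows <- lapply(parsed, function(p) {
    as.data.frame(p[[group]], stringsAsFactors = FALSE)
  })
  cbind(idMemberInfo = dw_data$idMemberInfo, do.call(rbind, rows))
}

hotelgroup <- group_table("hotelgroup")
visa <- group_table("visa")

names(hotelgroup)
names(visa)
```

Keeping `idMemberInfo` in every table is a design choice for this sketch: it lets the per-group tables be merged again if a single wide dataset is needed after all.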

Hope it helps! FWIW, I am hopeful that tidyjson-like behavior will persist in the R community.
