I want to append json data to dataframe by looking at whether it does have common key or not, if yes, I want to do value update. So far, I used jsonlite::fromJSON() to get R list from raw json data, can add json data as new row there is no match with dataframe. I am wondering how am I gonna add value from json data if it has common key with dataframe. Any way to do this task in R?
reproducible example
here is the output of jsonlite::fromJSON(jsondata):
mylist = list(version = 0L, subject = "Subject 102", organization = "HOSP",
time = "2018-02-10T09:30:00Z", features = list(age = 81L,
gender = "M", il6 = list(value = 200L, time = "2018-02-10T09:30:00Z"),
pct = list(value = 1234L, time = "2018-02-10T09:30:00Z"),
crp = list(value = 1004L, time = "2018-02-10T09:30:00Z")))
here is dataframe:
mydf = structure(list(subject= c('Subject 100', 'Subject 101','Subject 102','Subject 103'), taken_time = c(15L, 5L, 39L, -21L), il6= c(203.2893857,
4.858269406, 4.858269406, 14220), pct = c(732.074484, 25.67901235,
119.0221384, 120.0477168), crp = c(196115424.7, 1073624.455,
4092823.885, 1457496.474), age = c(52L, 74L, 81L, 67L), gender = structure(c(2L,
2L, 2L, 1L), .Label = c("F", "M"), class = "factor"), inpatient_readmission_time_rtd = c(79.78819444,
57.59068053, 57.59068053, 57.59068053), infection_flag = c(0L,
0L, 1L, 1L), temperature_value = c(98.9, 98.9, 98, 101.3), heartrate_value = c(106,
61, 78, 91), pH_result_time_rta = c(11, 85.50402145, 85.50402145,
85.50402145), gcst_value = c(15, 15, 15, 14.63769293)), row.names = c(NA,
4L), class = "data.frame")
current attempt:
based on @akrun suggestion, here is what I tried:
unlist(mylist) %>% setNames(str_remove(names(.), 'features\\.')) %>%
as.list %>% as_tibble %>% type.convert(as.is = TRUE) %>%
bind_rows(mydf, .) %>% select(-contains(c(".time", "version", "organization", "taken_time"))) %>% as.data.frame()
but I still can't able to add value in json data to dataframe by finding common key such as person_id, crp, pct, il6, then I just want to do value update. How can I do that easily in R? any idea?
I think using dplyr can do this manipulation, but I don't know how to get going from above attempt. any idea?
desired output:
this is the desired output that I want to get:
structure(list(person_id = structure(1:4, .Label = c("Subject 100",
"Subject 101", "Subject 102", "Subject 103"), class = "factor"),
taken_time = c(15L, 5L, 39L, -21L), il6 = c(203.2893857,
4.858269406, 200, 14220), pct = c(732.074484, 25.67901235,
1234, 120.0477168), crp = c(196115424.7, 1073624.455, 1004,
1457496.474), age = c(52L, 74L, 81L, 67L), gender = structure(c(2L,
2L, 2L, 1L), .Label = c("F", "M"), class = "factor"), inpatient_readmission_time_rtd = c(79.78819444,
57.59068053, 57.59068053, 57.59068053), infection_flag = c(0L,
0L, 1L, 1L), temperature_value = c(98.9, 98.9, 98, 101.3),
heartrate_value = c(106L, 61L, 78L, 91L), pH_result_time_rta = c(11,
85.50402145, 85.50402145, 85.50402145), gcst_value = c(15,
15, 15, 14.63769293)), class = "data.frame", row.names = c(NA,
-4L))
how can I get above desired dataframe? Is there any way to achieve this by using dplyr utilities? thanks
update: R sessioninfo:
here is the output of R sessioninfo:
> > sessionInfo() R version 3.6.0 (2019-04-26) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build
> 18362)
>
> Matrix products: default
>
> Random number generation: RNG: Mersenne-Twister Normal:
> Inversion Sample: Rounding locale: [1] LC_COLLATE=English_United
> States.1252 LC_CTYPE=English_United States.1252 [3]
> LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages: [1] stats graphics grDevices utils
> datasets methods base
>
> other attached packages: [1] stringr_1.4.0 ranger_0.12.1
> dplyr_0.8.5 caret_6.0-86 ggplot2_3.3.0 lattice_0.20-38 [7]
> jsonlite_1.6.1
>
> loaded via a namespace (and not attached): [1] Rcpp_1.0.3
> pillar_1.4.3 compiler_3.6.0 gower_0.2.1
> plyr_1.8.6 [6] iterators_1.0.12 class_7.3-15
> tools_3.6.0 rpart_4.1-15 ipred_0.9-9 [11]
> lubridate_1.7.4 lifecycle_0.2.0 tibble_2.1.3
> nlme_3.1-139 gtable_0.3.0 [16] pkgconfig_2.0.3
> rlang_0.4.5 Matrix_1.2-17 foreach_1.5.0
> rstudioapi_0.11 [21] prodlim_2019.11.13 withr_2.1.2
> pROC_1.16.2 generics_0.0.2 recipes_0.1.10 [26]
> stats4_3.6.0 nnet_7.3-12 grid_3.6.0
> tidyselect_1.0.0 data.table_1.12.8 [31] glue_1.3.2
> R6_2.4.1 survival_2.44-1.1 lava_1.6.7
> reshape2_1.4.3 [36] purrr_0.3.3 magrittr_1.5
> ModelMetrics_1.2.2.2 splines_3.6.0 scales_1.1.0 [41]
> codetools_0.2-16 MASS_7.3-51.4 assertthat_0.2.1
> timeDate_3043.102 colorspace_1.4-1 [46] stringi_1.4.6
> munsell_0.5.0 crayon_1.3.4
sessionInfo()R 3.6.2and dplyr is the devel versionlibrary(data.table); un1 <- unlist(mylist);names(un1) <- str_remove(names(un1), 'features\\.');rbindlist(list(mydf, type.convert(as.list(un1))), fill = TRUE)