I have tried probably hundreds of permutations of this code for literally days to try to get a function that will do what I want, and I have finally given up. It feels like it should definitely be doable and I am so close!
I have tried to get back to the nub of things here with my reprex below.
Basically I have a single-row dataframe, with a column containing a list of strings ("concepts"). I want to create an additional column for each of those strings, using mutate, ideally with the column taking its name from the string, and then to populate the column with the results of a function call (?it doesn't matter which function, for now? - I just need the infrastructure of the function to work.)
I feel, as usual, like I must be missing something obvious... maybe just a syntax error.
I also wonder if I need to use purrr::map, maybe a simpler vectorised mapping would work fine.
I feel like the fact that new columns are named ..1 rather than the concept name is a bit of a clue as to what is wrong.
I can create the data frame I want by calling each concept manually (see end of reprex) but since the list of concepts is different for different data frames, I want to functionalise this using pipes and tidyverse techniques rather than do it manually.
I've read the following questions to find help:
- How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs
- How to mutate multiple columns with dynamic variable using purrr:map function?
- (R) Cleaner way to use map() with list-columns
- Add multiple output variables using purrr and a predefined function
- Creating new variables with purrr (how does one go about that?)
- How to compute multiple new columns in a R dataframe with dynamic names
but none of those has quite helped me crack the problem I'm experiencing. [edit: added in last q to that list which may be the technique I need].
<!-- language-all: lang-r -->
# load packages -----------------------------------------------------------
library(rlang)
library(dplyr)
library(tidyr)
library(magrittr)
library(purrr)
library(nomisr)
# set up initial list of tibbles ------------------------------------------
df <- list(
district_population = tibble(
dataset_title = "Population estimates - local authority based by single year",
dataset_id = "NM_2002_1"
),
jsa_claimants = tibble(
dataset_title = "Jobseeker\'s Allowance with rates and proportions",
dataset_id = "NM_1_1"
)
)
# just use the first tibble for now, for testing --------------------------
# ideally I want to map across dfs through a list -------------------------
df <- df[[1]]
# nitty gritty functions --------------------------------------------------
get_concept_list <- function(df) {
dataset_id <- pluck(df, "dataset_id")
nomis_overview(id = dataset_id,
select = c("dimensions", "codes")) %>%
pluck("value", 1, "dimension") %>%
filter(!concept == "geography") %>%
pull("concept")
}
# get_concept_list() returns the strings I need:
get_concept_list(df)
#> [1] "time" "gender" "c_age" "measures"
# Here is a list of examples of types of map* that do various things,
# none of which is what I need it to do
# I'm using toupper() here for simplicity - ultimately I will use
# get_concept_info() to populate the new columns
# this creates four new tibbles
get_concept_list(df) %>%
map(~ mutate(df, {{.x}} := toupper(.x)))
#> [[1]]
#> # A tibble: 1 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 TIME
#>
#> [[2]]
#> # A tibble: 1 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 GENDER
#>
#> [[3]]
#> # A tibble: 1 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 C_AGE
#>
#> [[4]]
#> # A tibble: 1 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 MEASUR~
# this throws an error
get_concept_list(df) %>%
map_chr(~ mutate(df, {{.x}} := toupper(.x)))
#> Error: Result 1 must be a single string, not a vector of class `tbl_df/tbl/data.frame` and of length 3
# this creates three extra rows in the tibble
get_concept_list(df) %>%
map_df(~ mutate(df, {{.x}} := toupper(.x)))
#> # A tibble: 4 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 TIME
#> 2 Population estimates - local authority based by single year NM_2002_1 GENDER
#> 3 Population estimates - local authority based by single year NM_2002_1 C_AGE
#> 4 Population estimates - local authority based by single year NM_2002_1 MEASUR~
# this does the same as map_df
get_concept_list(df) %>%
map_dfr(~ mutate(df, {{.x}} := toupper(.x)))
#> # A tibble: 4 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 TIME
#> 2 Population estimates - local authority based by single year NM_2002_1 GENDER
#> 3 Population estimates - local authority based by single year NM_2002_1 C_AGE
#> 4 Population estimates - local authority based by single year NM_2002_1 MEASUR~
# this creates a single tibble 12 columns wide
get_concept_list(df) %>%
map_dfc(~ mutate(df, {{.x}} := toupper(.x)))
#> # A tibble: 1 x 12
#> dataset_title dataset_id ..1 dataset_title1 dataset_id1 ..11 dataset_title2
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Population e~ NM_2002_1 TIME Population es~ NM_2002_1 GEND~ Population es~
#> # ... with 5 more variables: dataset_id2 <chr>, ..12 <chr>,
#> # dataset_title3 <chr>, dataset_id3 <chr>, ..13 <chr>
# function to get info on each concept (except geography) -----------------
# this is the function I want to use eventually to populate my new columns
get_concept_info <- function(df, concept_name) {
dataset_id <- pluck(df, "dataset_id")
nomis_overview(id = dataset_id) %>%
filter(name == "dimensions") %>%
pluck("value", 1, "dimension") %>%
filter(concept == concept_name) %>%
pluck("codes.code", 1) %>%
select(name, value) %>%
nest(data = everything()) %>%
as.list() %>%
pluck("data")
}
# individual mutate works, for comparison ---------------------------------
# I can create the kind of table I want manually using a line like the one below
# df %>% map(~ mutate(., measures = get_concept_info(., concept_name = "measures")))
df %>% mutate(., measures = get_concept_info(df, "measures"))
#> # A tibble: 1 x 3
#> dataset_title dataset_id measures
#> <chr> <chr> <list>
#> 1 Population estimates - local authority based by sin~ NM_2002_1 <tibble [2 x ~
<sup>Created on 2020-02-10 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>
{{ }}does seem promising/usable, but I'm not sure how to implement it here. Perhaps someone else will offer a solution with that. And I share your frustration with spending days doing this. As it happens, I had the same struggle with this exact issue a while back, but it ended up being helpful for you today, and I'm sure you'll pass it on!