R split string to columns using string as column name and use any numbers as values in those columns

Question

I have the following dataframe:

df1 = data.frame(id = 1:4, desc=c("httpmethod=put&hobbies=22.33&utiliites=50.00&home=950.00&entertainment=40.00&redirecturl=&stamp=5%0D%0A++++", "httpmethod=put&hobbies=&utiliites=&home=600.00&entertainment=25.57&redirecturl=&stamp=5%0D%0A++++", "httpmethod=put&hobbies=0.00&utiliites=&home=1127.53&entertainment=50.00&redirecturl=&stamp=5%0D%0A++++", "httpmethod=put&hobbies=&utiliites=&home=&entertainment=&redirecturl=&stamp=5%0D%0A++++"), stringsAsFactors=FALSE)

Which gives:

id	desc
1	httpmethod=put&hobbies=22.33&utiliites=50.00&home=950.00&entertainment=40.00&redirecturl=&stamp=5%0D%0A++++
2	httpmethod=put&hobbies=&utiliites=&home=600.00&entertainment=25.57&redirecturl=&stamp=5%0D%0A++++
3	httpmethod=put&hobbies=0.00&utiliites=&home=1127.53&entertainment=50.00&redirecturl=&stamp=5%0D%0A++++
4	httpmethod=put&hobbies=&utiliites=&home=&entertainment=&redirecturl=&stamp=5%0D%0A++++

I'd like:

id	hobbies	utilities	home	entertainment
1	22.33	50.00	950.00	40.00
2	NA	NA	600.00	25.57
3	0.00	NA	1127.53	50.00
4	NA	NA	NA	NA

I have looked at lots of different things but can't seem to bring it all together. The code I have at the moment is below, but I think there must be a simpler, more elegant way (e.g. getting the column names from the string).

library(dplyr)
library(tidyr)
library(stringr)

df2 <- df1 %>% 
  separate(desc, c("http","hob", "utl", "hom", "ent", "redirect", "stamp"), sep = "&") %>% 
  mutate(hobbies = str_extract(hob, "\\d+\\.*\\d*")) %>%
  mutate(utilities = str_extract(utl, "\\d+\\.*\\d*")) %>%
  mutate(home = str_extract(hom, "\\d+\\.*\\d*")) %>%
  mutate(entertainment = str_extract(ent, "\\d+\\.*\\d*")) %>%
  select(-c("http","redirect", "stamp"))

I am quite new to R, so some explanation of the steps would be good. I did get to the point where I split the strings, but I ended up with a list and didn't know how to get the values out of it.

Thanks
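For the "ended up with a list" problem described above, here is a minimal base R sketch of one way to get from the list back to columns. The helper parse_row and the names pairs, parsed, wide and wanted are illustrative only, and the sketch assumes every row contains the same keys in the same order.

```r
df1 <- data.frame(
  id = 1:4,
  desc = c(
    "httpmethod=put&hobbies=22.33&utiliites=50.00&home=950.00&entertainment=40.00&redirecturl=&stamp=5%0D%0A++++",
    "httpmethod=put&hobbies=&utiliites=&home=600.00&entertainment=25.57&redirecturl=&stamp=5%0D%0A++++",
    "httpmethod=put&hobbies=0.00&utiliites=&home=1127.53&entertainment=50.00&redirecturl=&stamp=5%0D%0A++++",
    "httpmethod=put&hobbies=&utiliites=&home=&entertainment=&redirecturl=&stamp=5%0D%0A++++"
  ),
  stringsAsFactors = FALSE
)

# strsplit() returns a list: one character vector of "key=value" pieces per row
pairs <- strsplit(df1$desc, "&", fixed = TRUE)

# Hypothetical helper: turn one row's pieces into a named character vector
parse_row <- function(p) {
  keys <- sub("=.*$", "", p)     # text before the first "="
  vals <- sub("^[^=]*=", "", p)  # text after the first "="
  setNames(vals, keys)
}

# Parse every list element, then stack the named vectors into one table
parsed <- lapply(pairs, parse_row)
wide <- as.data.frame(do.call(rbind, parsed), stringsAsFactors = FALSE)

# Keep only the amount columns; as.numeric("") gives NA
wanted <- c("hobbies", "utiliites", "home", "entertainment")
wide[wanted] <- lapply(wide[wanted], as.numeric)
result <- cbind(id = df1$id, wide[wanted])
```

The key step is do.call(rbind, parsed), which stacks the per-row vectors so that the keys become column names. Note that the column is spelled utiliites because that is how the key is spelled in the data.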

In the third row, shouldn't it be hobbies=0.00 instead of hobbies0.00=?
– iago, Commented Jul 26, 2021 at 13:48
Yes, that was a typo. While typing the question I noticed I had no 0.00 example, so I added one to show that I want 0.00 rather than NA. Corrected now, thanks.
– purplealpha, Commented Jul 26, 2021 at 13:58

AnilGoyal · Accepted Answer · 2021-07-26 15:26:25Z

3

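You can ignore the warnings this produces: they are "NAs introduced by coercion" warnings from as.numeric() on the non-numeric values (put and 5%0D%0A++++).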

library(tidyverse)

df1 %>%
  separate_rows(desc, sep = '&') %>%
  separate(desc, into = c('n', 'v'), sep = '=') %>%
  pivot_wider(names_from = n, values_from = v, values_fn = as.numeric) 

#> # A tibble: 4 x 8
#>      id httpmethod hobbies utiliites  home entertainment redirecturl stamp
#>   <int>      <dbl>   <dbl>     <dbl> <dbl>         <dbl>       <dbl> <dbl>
#> 1     1         NA    22.3        50  950           40            NA    NA
#> 2     2         NA    NA          NA  600           25.6          NA    NA
#> 3     3         NA     0          NA 1128.          50            NA    NA
#> 4     4         NA    NA          NA   NA           NA            NA    NA

Created on 2021-07-26 by the reprex package (v2.0.0)
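Since the question asked for an explanation of the steps, here is a possible step-by-step variant of the pipeline above. It is a sketch that assumes only the four amount fields are wanted; the names long, wanted and result are illustrative. Filtering to those keys before converting means as.numeric() never sees put or the stamp value, so no coercion warnings should be raised.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  id = 1:4,
  desc = c(
    "httpmethod=put&hobbies=22.33&utiliites=50.00&home=950.00&entertainment=40.00&redirecturl=&stamp=5%0D%0A++++",
    "httpmethod=put&hobbies=&utiliites=&home=600.00&entertainment=25.57&redirecturl=&stamp=5%0D%0A++++",
    "httpmethod=put&hobbies=0.00&utiliites=&home=1127.53&entertainment=50.00&redirecturl=&stamp=5%0D%0A++++",
    "httpmethod=put&hobbies=&utiliites=&home=&entertainment=&redirecturl=&stamp=5%0D%0A++++"
  ),
  stringsAsFactors = FALSE
)

# Assumed list of keys to keep (spelled as they appear in the data)
wanted <- c("hobbies", "utiliites", "home", "entertainment")

# Step 1: one row per "key=value" piece (4 ids x 7 pieces)
# Step 2: split each piece at "=" into a name column n and a value column v
long <- df1 %>%
  separate_rows(desc, sep = "&") %>%
  separate(desc, into = c("n", "v"), sep = "=")

# Step 3: keep the wanted keys and convert the values; "" becomes NA
# Step 4: spread the names back out into columns
result <- long %>%
  filter(n %in% wanted) %>%
  mutate(v = as.numeric(v)) %>%
  pivot_wider(names_from = n, values_from = v)
```

Inspecting long before the final step is a convenient way to see what each stage of the pipeline does.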

answered Jul 26, 2021 at 15:26


2 Comments

purplealpha Over a year ago

I'm accepting this answer because it covers a more general use case; the other answer was specific to the question posted. I also like that this answer doesn't rely on regex, which makes it easier for me to understand.

AnilGoyal Over a year ago

Glad that it helped @purplealpha :)

iago · Answer · 2021-07-26 14:59:47Z

3

Once the third row's hobbies0.00= is corrected to hobbies=0.00, as noted in the comments above:

library(dplyr)
library(tidyr)
df1 %>% 
    separate(col = desc, into = c("http", "hobbies", "utiliites", "home", "entertainment", "redirecturl", "stamp"), sep = "&[a-z]+[0\\.]*=") %>% 
    select(-http, -redirecturl, -stamp)
  id hobbies utiliites    home entertainment
1  1   22.33     50.00  950.00         40.00
2  2                    600.00         25.57
3  3    0.00           1127.53         50.00
4  4

Update

A couple of modifications. First, thanks to Shawn Brar's comment, convert all columns with as.numeric. Second, avoid specifying the into vector by hand by building it from the first string (at the cost of having to remove one oddly named column):

df1 %>% 
    separate(col = desc, into = strsplit(df1$desc[1], split = "=.*?&")[[1]], sep = "&[a-z]+=") %>% 
    select(-httpmethod, -redirecturl, -`stamp=5%0D%0A++++`) %>% 
    mutate(across(everything(), as.numeric))

  id hobbies utiliites    home entertainment
1  1   22.33        50  950.00         40.00
2  2      NA        NA  600.00         25.57
3  3    0.00        NA 1127.53         50.00
4  4      NA        NA      NA            NA

edited Jul 26, 2021 at 14:59

answered Jul 26, 2021 at 13:50


3 Comments

Shawn Brar Over a year ago

You can add %>% mutate_all(as.numeric) so that the column classes change to numeric; the empty values then become NA.

purplealpha Over a year ago

Thanks @iago, the updated part is exactly what I was looking for, where it uses the string to get the names. It's always the regular expressions that get me :) I've also noticed that separate() has an argument, separate(..., convert = TRUE), that converts the new columns to numeric.

iago Over a year ago

@purplealpha I updated again: I had written [0\\.]* in the regular expression only with hobbies0.00= in mind, which has since been fixed, so it is no longer necessary.
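The convert = TRUE argument mentioned in the comments could be combined with the updated answer roughly as follows. This is a sketch, not a tested replacement: it assumes that the type conversion done by separate() treats the empty strings as NA in otherwise numeric columns, and the names nm and result are illustrative.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  id = 1:4,
  desc = c(
    "httpmethod=put&hobbies=22.33&utiliites=50.00&home=950.00&entertainment=40.00&redirecturl=&stamp=5%0D%0A++++",
    "httpmethod=put&hobbies=&utiliites=&home=600.00&entertainment=25.57&redirecturl=&stamp=5%0D%0A++++",
    "httpmethod=put&hobbies=0.00&utiliites=&home=1127.53&entertainment=50.00&redirecturl=&stamp=5%0D%0A++++",
    "httpmethod=put&hobbies=&utiliites=&home=&entertainment=&redirecturl=&stamp=5%0D%0A++++"
  ),
  stringsAsFactors = FALSE
)

# Column names taken from the first string, as in the updated answer:
# splitting on "=<value>&" leaves the keys (the last one keeps its value)
nm <- strsplit(df1$desc[1], split = "=.*?&")[[1]]

# convert = TRUE asks separate() to guess each new column's type,
# replacing the explicit as.numeric step
result <- df1 %>%
  separate(col = desc, into = nm, sep = "&[a-z]+=", convert = TRUE) %>%
  select(id, hobbies, utiliites, home, entertainment)
```

Selecting the wanted columns by name, rather than dropping the unwanted ones, avoids having to refer to the oddly named stamp column.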
