
I have the following dataframe:

df1 = data.frame(id = 1:4, desc=c("httpmethod=put&hobbies=22.33&utiliites=50.00&home=950.00&entertainment=40.00&redirecturl=&stamp=5%0D%0A++++", "httpmethod=put&hobbies=&utiliites=&home=600.00&entertainment=25.57&redirecturl=&stamp=5%0D%0A++++", "httpmethod=put&hobbies=0.00&utiliites=&home=1127.53&entertainment=50.00&redirecturl=&stamp=5%0D%0A++++", "httpmethod=put&hobbies=&utiliites=&home=&entertainment=&redirecturl=&stamp=5%0D%0A++++"), stringsAsFactors=FALSE)

Which gives:

id desc
1 httpmethod=put&hobbies=22.33&utiliites=50.00&home=950.00&entertainment=40.00&redirecturl=&stamp=5%0D%0A++++
2 httpmethod=put&hobbies=&utiliites=&home=600.00&entertainment=25.57&redirecturl=&stamp=5%0D%0A++++
3 httpmethod=put&hobbies=0.00&utiliites=&home=1127.53&entertainment=50.00&redirecturl=&stamp=5%0D%0A++++
4 httpmethod=put&hobbies=&utiliites=&home=&entertainment=&redirecturl=&stamp=5%0D%0A++++

I'd like:

id hobbies utilities home entertainment
1 22.33 50.00 950.00 40.00
2 NA NA 600.00 25.57
3 0.00 NA 1127.53 50.00
4 NA NA NA NA

I have looked at lots of different approaches but can't seem to bring it all together. The code I have at the moment is below, but I think there must be a simpler, more elegant way (e.g. getting the column names from the string itself).

library(dplyr)
library(tidyr)
library(stringr)

df2 <- df1 %>% 
  separate(desc, c("http","hob", "utl", "hom", "ent", "redirect", "stamp"), sep = "&") %>% 
  mutate(hobbies = str_extract(hob, "\\d+\\.*\\d*")) %>%
  mutate(utilities = str_extract(utl, "\\d+\\.*\\d*")) %>%
  mutate(home = str_extract(hom, "\\d+\\.*\\d*")) %>%
  mutate(entertainment = str_extract(ent, "\\d+\\.*\\d*")) %>%
  select(-c("http","redirect", "stamp"))

I am quite new to R, so some explanation of the steps would be helpful. I did get as far as splitting the strings, but I ended up with a list and didn't know how to get the values out of it.

Thanks

  • In the third row, shouldn't it be hobbies=0.00 instead of hobbies0.00=? Commented Jul 26, 2021 at 13:48
  • Yes, that was a typo. While typing the question I noticed I had no 0.00 value, so I added one to show that I want 0.00 kept rather than turned into NA. Corrected now, thanks. Commented Jul 26, 2021 at 13:58

2 Answers

3

You can ignore the warnings this produces; they come from as.numeric() coercing non-numeric values (such as "put") to NA.

library(tidyverse)

df1 %>%
  separate_rows(desc, sep = '&') %>%
  separate(desc, into = c('n', 'v'), sep = '=') %>%
  pivot_wider(names_from = n, values_from = v, values_fn = as.numeric) 

#> # A tibble: 4 x 8
#>      id httpmethod hobbies utiliites  home entertainment redirecturl stamp
#>   <int>      <dbl>   <dbl>     <dbl> <dbl>         <dbl>       <dbl> <dbl>
#> 1     1         NA    22.3        50  950           40            NA    NA
#> 2     2         NA    NA          NA  600           25.6          NA    NA
#> 3     3         NA     0          NA 1128.          50            NA    NA
#> 4     4         NA    NA          NA   NA           NA            NA    NA

Created on 2021-07-26 by the reprex package (v2.0.0)
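
Since the question asked for an explanation of the steps, here is a hedged variant of the pipeline above with each step commented. It is only a sketch: `df_demo` is a shortened stand-in for `df1`, and the `wanted` vector of keys to keep is an assumption that is not part of the original answer. Filtering to the wanted keys before converting should also avoid the coercion warnings, because only numeric-looking or empty values reach as.numeric().

```r
library(dplyr)
library(tidyr)

# Two rows in the same shape as the question's data (shortened for illustration)
df_demo <- data.frame(
  id = 1:2,
  desc = c("httpmethod=put&hobbies=22.33&utiliites=50.00&home=950.00&stamp=5",
           "httpmethod=put&hobbies=&utiliites=&home=600.00&stamp=5"),
  stringsAsFactors = FALSE
)

# Keys to keep; this vector is an assumption made for the sketch
wanted <- c("hobbies", "utiliites", "home")

res <- df_demo %>%
  separate_rows(desc, sep = "&") %>%                   # one row per key=value pair
  separate(desc, into = c("n", "v"), sep = "=") %>%    # key goes to n, value to v
  filter(n %in% wanted) %>%                            # drop httpmethod, stamp, ...
  mutate(v = as.numeric(v)) %>%                        # empty values become NA
  pivot_wider(names_from = n, values_from = v)         # one column per key

res
```

The spelling `utiliites` is kept as it appears in the data; rename the column afterwards if `utilities` is preferred.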


2 Comments

I'm accepting this answer because it covers a more general use case; the other answer was specific to the question posted. I also like that this answer doesn't rely on regex, which makes it easier for me to understand.
Glad that it helped @purplealpha :)
3

Once the third row's hobbies0.00= has been corrected, as noted in the comments above:

library(dplyr)
library(tidyr)
df1 %>% 
    separate(col = desc, into = c("http", "hobbies", "utiliites", "home", "entertainment", "redirecturl", "stamp"), sep = "&[a-z]+[0\\.]*=") %>% 
    select(-http, -redirecturl, -stamp)
  id hobbies utiliites    home entertainment
1  1   22.33     50.00  950.00         40.00
2  2                    600.00         25.57
3  3    0.00           1127.53         50.00
4  4                                        

Update

A couple of modifications. First, thanks to Shawn Brar's comment, convert every column with as.numeric. Second, build the into vector from the string itself rather than typing it out (at the cost of having to remove one oddly named column):

df1 %>% 
    separate(col = desc, into = strsplit(df1$desc[1], split = "=.*?&")[[1]], sep = "&[a-z]+=") %>% 
    select(-httpmethod, -redirecturl, -`stamp=5%0D%0A++++`) %>% 
    mutate(across(everything(), as.numeric))

  id hobbies utiliites    home entertainment
1  1   22.33        50  950.00         40.00
2  2      NA        NA  600.00         25.57
3  3    0.00        NA 1127.53         50.00
4  4      NA        NA      NA            NA
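
The question also mentioned ending up with a list after splitting and not knowing how to get the values out. As a rough base R sketch of that route (not part of the original answer), the list returned by strsplit can be turned into named vectors and row-bound. The helper `parse_pairs` and the shortened `desc` vector are hypothetical, and the sketch assumes every row contains the same keys in the same order.

```r
# The first two rows of the question's data, shortened for illustration
desc <- c(
  "httpmethod=put&hobbies=22.33&utiliites=50.00&home=950.00&entertainment=40.00",
  "httpmethod=put&hobbies=&utiliites=&home=600.00&entertainment=25.57"
)

# Hypothetical helper: turn one "k1=v1&k2=v2" string into a named character vector
parse_pairs <- function(s) {
  pairs <- strsplit(s, "&", fixed = TRUE)[[1]]  # "hobbies=22.33", "utiliites=", ...
  keys  <- sub("=.*$", "", pairs)               # text before the first "="
  vals  <- sub("^[^=]*=?", "", pairs)           # text after the first "="
  setNames(vals, keys)
}

parsed <- lapply(desc, parse_pairs)   # a list with one named vector per row
mat    <- do.call(rbind, parsed)      # character matrix, one column per key

wanted <- c("hobbies", "utiliites", "home", "entertainment")
res <- data.frame(id = seq_along(desc), mat[, wanted, drop = FALSE],
                  stringsAsFactors = FALSE)
res[wanted] <- lapply(res[wanted], as.numeric)  # empty strings become NA

res
```

Here the column names come from the keys in the strings, so nothing has to be typed out except the subset to keep.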

3 Comments

You can add %>% mutate_all(as.numeric) so that the column classes change to numeric; the empty values will then show up as NA.
Thanks @iago, the updated part is exactly what I was looking for, since it uses the string to get the names. It's always the regular expressions that get me :) I've also noticed that separate has a convert argument, separate(..., convert = TRUE), which converts the new columns to numeric.
@purplealpha I updated again: I had written [0\\.]* in the regular expression thinking only of hobbies0.00=, which has already been fixed, so it is not necessary.
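
As a small illustration of the convert = TRUE option mentioned in the comments, here is a hedged sketch on a shortened, made-up data frame (`df_small` is not from the original post). tidyr documents convert = TRUE as running type.convert() with as.is = TRUE on the new columns, so blank pieces in an otherwise numeric column should come out as NA.

```r
library(dplyr)
library(tidyr)

# Shortened stand-in for the question's data (an assumption for this sketch)
df_small <- data.frame(
  id = 1:2,
  desc = c("httpmethod=put&hobbies=22.33&home=950.00&stamp=5",
           "httpmethod=put&hobbies=&home=600.00&stamp=5"),
  stringsAsFactors = FALSE
)

out <- df_small %>%
  separate(desc, into = c("http", "hobbies", "home", "stamp"),
           sep = "&[a-z]+=",      # split on "&key=", leaving only the values
           convert = TRUE) %>%    # let tidyr guess the column types
  select(id, hobbies, home)

out
```

This removes the need for the separate mutate(across(everything(), as.numeric)) step, although the column types are then guessed rather than forced.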
