Remove string character from multiple columns in R and mutate

Question

This is the head of my dataframe (tables_df); all columns are of type character:

#        Name Ticker  Last price       %     %7d    %30d    %90d    %180d    %365d # Coins
1 1     bitcoin    BTC $ 56,667.17   1.33%  21.13%  74.43% 209.62%  386.02%  489.33%  18.63M
2 2    ethereum    ETH $ 1,976.941   1.00%  10.37%  60.49% 269.53%  404.93%  662.55% 114.75M
3 3 binancecoin    BNB $ 265.20462 -20.41% 102.40% 553.29% 791.32% 1114.70% 1083.61%  99.01M
4 4     cardano    ADA $ 1.1537436  24.50%  31.18% 239.85% 765.91%  849.66% 1909.86%  25.92B
5 5    litecoin    LTC $ 231.95832  -2.28%   4.65%  72.05% 180.01%  282.54%  227.14%  66.08M
6 6     uniswap    UNI $ 30.304469  50.48%  41.73% 268.43% 732.38%       0%       0% 126.20M
  Market cap MC rank
1    $ 1.05T       1
2  $ 226.86B       2
3   $ 26.25B       5
4   $ 29.91B       4
5   $ 15.32B       7
6    $ 3.82B      23

I want to do three things in R:

1. Convert the percentage strings to numeric values in all columns whose names contain "%".
2. Convert strings with an M (million), B (billion), or T (trillion) suffix to numeric values in the columns "Market cap" and "# Coins" (multiply by 1,000,000 for million, 1,000,000,000 for billion, and 1,000,000,000,000 for trillion).
3. Remove the dollar signs and convert to numeric in the columns "Last price" and "Market cap".

For the first task I tried this code:

tables_df %>% mutate_at(vars(contains('%')), str_remove(string = .,pattern = "%")) %>%
  mutate_at(vars(contains('%')),funs =  as.numeric())

and this code:

tables_df %>% parse_number("%", trim_ws = TRUE)

Unfortunately, neither attempt works.

For the second and third tasks I haven't figured out where to start yet. Can someone help me out? Thanks a lot!

dput(head(tables_df))
structure(list(`#` = c("1", "2", "3", "4", "5", "6"), c("", "", 
"", "", "", ""), Name = c("bitcoin", "ethereum", "binancecoin", 
"cardano", "litecoin", "uniswap"), Ticker = c("BTC", "ETH", "BNB", 
"ADA", "LTC", "UNI"), `Last price` = c("$ 57,431.62", "$ 1,959.472", 
"$ 302.91449", "$ 1.1072817", "$ 230.71046", "$ 29.981031"), 
    `%` = c("2.59%", "2.21%", "19.01%", "-0.96%", "1.50%", "4.01%"
    ), `%7d` = c("17.24%", "7.72%", "123.90%", "32.27%", "5.13%", 
    "38.44%"), `%30d` = c("80.95%", "63.35%", "648.16%", "234.17%", 
    "66.21%", "259.22%"), `%90d` = c("211.37%", "236.87%", "893.12%", 
    "617.17%", "165.07%", "688.65%"), `%180d` = c("388.06%", 
    "379.98%", "1238.50%", "793.63%", "271.27%", "0%"), `%365d` = c("496.72%", 
    "658.28%", "1263.44%", "1798.47%", "231.57%", "0%"), `# Coins` = c("18.63M", 
    "114.76M", "99.01M", "25.92B", "66.08M", "126.20M"), `Market cap` = c("$ 1.07T", 
    "$ 224.87B", "$ 29.99B", "$ 28.70B", "$ 15.24B", "$ 3.78B"
    ), `MC rank` = c("1", "2", "4", "5", "7", "23")), row.names = c(NA, 
6L), class = "data.frame")

Ronak Shah · Accepted Answer · 2021-02-22 09:21:32Z

Write a function that converts values with an M (million), B (billion), or T (trillion) suffix to actual numbers.

change_num <- function(x) {
  x1 <- parse_number(x)
  x1 * case_when(grepl('T', x) ~ 1e12, 
                 grepl('B', x) ~ 1e9, 
                 grepl('M', x) ~ 1e6)
}
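As a hedged aside: `case_when()` returns `NA` when no condition matches, so `change_num()` as written would turn a value without an M/B/T suffix into `NA`. The sample data always carries a suffix, so this does not matter here. The sketch below is a hypothetical variant (the name `change_num_safe` is not from the answer) that leaves such values unscaled, assuming that is the behaviour you want.

```r
library(dplyr)
library(readr)

# Hypothetical variant of change_num(): values with no M/B/T suffix
# are multiplied by 1 instead of becoming NA.
change_num_safe <- function(x) {
  x1 <- parse_number(x)                  # numeric part, e.g. "$ 1.07T" -> 1.07
  x1 * case_when(grepl("T", x) ~ 1e12,   # trillion
                 grepl("B", x) ~ 1e9,    # billion
                 grepl("M", x) ~ 1e6,    # million
                 TRUE ~ 1)               # no suffix: keep the number as parsed
}

# Quick check on a few made-up inputs, including one without a suffix
res <- change_num_safe(c("18.63M", "25.92B", "$ 1.07T", "1500"))
print(res)
```

The fallback is written as `TRUE ~ 1` rather than with the newer `.default` argument so that it also runs on older dplyr versions.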

Use across to apply functions to multiple columns.

library(dplyr)
library(readr)

# Drop the empty, unnamed second column, then convert the remaining columns
tables_df <- tables_df[-2] %>%
  mutate(across(`Last price`:`%365d`, parse_number),
         across(c(`# Coins`, `Market cap`), change_num))

tables_df

#           Name Ticker   Last price     %    %7d   %30d   %90d   %180d   %365d
#1 1     bitcoin    BTC 57431.620000  2.59  17.24  80.95 211.37  388.06  496.72
#2 2    ethereum    ETH  1959.472000  2.21   7.72  63.35 236.87  379.98  658.28
#3 3 binancecoin    BNB   302.914490 19.01 123.90 648.16 893.12 1238.50 1263.44
#4 4     cardano    ADA     1.107282 -0.96  32.27 234.17 617.17  793.63 1798.47
#5 5    litecoin    LTC   230.710460  1.50   5.13  66.21 165.07  271.27  231.57
#6 6     uniswap    UNI    29.981031  4.01  38.44 259.22 688.65    0.00    0.00

#     # Coins Market cap MC rank
#1 1.8630e+07 1.0700e+12       1
#2 1.1476e+08 2.2487e+11       2
#3 9.9010e+07 2.9990e+10       4
#4 2.5920e+10 2.8700e+10       5
#5 6.6080e+07 1.5240e+10       7
#6 1.2620e+08 3.7800e+09      23
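The call above picks the percentage columns by position (`Last price`:`%365d`). The question asked for every column whose name contains "%", so here is a minimal sketch of that alternative using the tidyselect helper `contains()`. The data frame `sample_df` is a made-up two-row stand-in for `tables_df`, not the asker's data.

```r
library(dplyr)
library(readr)

# Made-up sample with the same style of column names as tables_df
sample_df <- data.frame(
  Name = c("bitcoin", "ethereum"),
  `%` = c("2.59%", "2.21%"),
  `%7d` = c("17.24%", "7.72%"),
  `Last price` = c("$ 57,431.62", "$ 1,959.472"),
  check.names = FALSE,        # keep the non-syntactic names as they are
  stringsAsFactors = FALSE
)

cleaned <- sample_df %>%
  mutate(across(contains("%"), parse_number),          # every column with "%" in its name
         across(all_of("Last price"), parse_number))   # strips "$" and the thousands comma

str(cleaned)
```

Selecting by name keeps working if the column order changes, whereas a range such as `Last price`:`%365d` depends on the columns staying adjacent.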

edited Feb 22, 2021 at 9:21

answered Feb 21, 2021 at 1:24

8 Comments

A. Jackson Over a year ago

Thanks a lot! Unfortunately I keep getting this error:

tables_df %>%
  mutate(across(c("Last price" : "%365d"), parse_number),
         across(c("# Coins", "Market cap"), change_num))
Error in initialize(...) : attempt to use zero-length variable name

A. Jackson Over a year ago

I also tried this: tables_df %>% mutate(across("Last price" : "%365d", parse_number), across(c("# Coins", "Market cap"), change_num)), but I still get the same error message. I copied and pasted the column names from the output of dput(colnames(tables_df)), so they are spelled exactly right.

Ronak Shah Over a year ago

@A.Jackson Can you edit your post to include the output of dput(head(tables_df)) so that I get a copy-pasteable version of your data?

A. Jackson Over a year ago

Yes, I just added it. Hopefully it helps.

Ronak Shah Over a year ago

The second column in your data is empty and has no column name, which is why you get the error. The column names are also not standard R column names, which makes it difficult to refer to them. It is usually better to do some data cleaning early on to avoid such errors. You can try the updated answer now.
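The early cleaning suggested in that comment could look roughly like the base R sketch below. The data frame `raw_df` and the replacement names (`rank`, `name`, `market_cap`) are made up for illustration; they are assumptions, not taken from the answer.

```r
# Made-up data frame mimicking the problem: an empty, unnamed second column
raw_df <- data.frame(
  c("1", "2"), c("", ""), c("bitcoin", "ethereum"), c("$ 1.07T", "$ 224.87B"),
  stringsAsFactors = FALSE
)
names(raw_df) <- c("#", "", "Name", "Market cap")

# Step 1: drop every column whose name is empty
clean_df <- raw_df[, names(raw_df) != "", drop = FALSE]

# Step 2: replace the non-syntactic names via a lookup (hypothetical new names)
new_names <- c("#" = "rank", "Name" = "name", "Market cap" = "market_cap")
names(clean_df) <- unname(new_names[names(clean_df)])

names(clean_df)
```

With syntactic names in place, columns can be referenced in `across()` or `mutate()` without backticks.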
