
This is the head of my data frame (tables_df); all columns are of type character:

#        Name Ticker  Last price       %     %7d    %30d    %90d    %180d    %365d # Coins
1 1     bitcoin    BTC $ 56,667.17   1.33%  21.13%  74.43% 209.62%  386.02%  489.33%  18.63M
2 2    ethereum    ETH $ 1,976.941   1.00%  10.37%  60.49% 269.53%  404.93%  662.55% 114.75M
3 3 binancecoin    BNB $ 265.20462 -20.41% 102.40% 553.29% 791.32% 1114.70% 1083.61%  99.01M
4 4     cardano    ADA $ 1.1537436  24.50%  31.18% 239.85% 765.91%  849.66% 1909.86%  25.92B
5 5    litecoin    LTC $ 231.95832  -2.28%   4.65%  72.05% 180.01%  282.54%  227.14%  66.08M
6 6     uniswap    UNI $ 30.304469  50.48%  41.73% 268.43% 732.38%       0%       0% 126.20M
  Market cap MC rank
1    $ 1.05T       1
2  $ 226.86B       2
3   $ 26.25B       5
4   $ 29.91B       4
5   $ 15.32B       7
6    $ 3.82B      23

I want to do three things in R:

  1. convert the percentage strings to numeric values in all columns whose names contain "%"
  2. convert strings with an M (million), B (billion) or T (trillion) suffix to numeric values in the columns "Market cap" and "# Coins" (×1,000,000 for million, ×1,000,000,000 for billion and ×1,000,000,000,000 for trillion)
  3. remove the dollar signs and convert to numeric in the columns "Last price" and "Market cap"

For the first task I tried this code:

tables_df %>% mutate_at(vars(contains('%')), str_remove(string = .,pattern = "%")) %>%
  mutate_at(vars(contains('%')),funs =  as.numeric())

and this code:

tables_df %>% parse_number("%", trim_ws = TRUE)

Unfortunately, neither attempt works.
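A likely explanation, offered as a hedged reading of the two attempts: in the first, str_remove(string = ., pattern = "%") is evaluated once on the whole piped data frame instead of being passed as a function to apply to each column, and funs = as.numeric() does not match mutate_at()'s .funs argument and calls as.numeric() with no input. In the second, parse_number() expects a character vector, but the pipe hands it the whole data frame, and "%" lands in its na argument. A minimal sketch of task 1 with across(), assuming dplyr >= 1.0; pct_df is a hypothetical two-row sample, not the real data:

```r
library(dplyr)

# Hypothetical sample using the same kind of column names as the question
pct_df <- data.frame(
  Name  = c("bitcoin", "cardano"),
  `%`   = c("2.59%", "-0.96%"),
  `%7d` = c("17.24%", "32.27%"),
  check.names = FALSE,        # keep "%" and "%7d" as they are
  stringsAsFactors = FALSE
)

# Pass a function (a ~ lambda), not the result of calling one:
# strip the "%" sign, then convert each selected column to numeric
pct_df <- pct_df %>%
  mutate(across(contains("%"), ~ as.numeric(sub("%", "", .x, fixed = TRUE))))

str(pct_df)
```

This keeps the values on the percent scale (2.59 rather than 0.0259); divide by 100 inside the lambda if proportions are wanted.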

For my second and third things I haven't figured out where to start yet. Can someone help me out? Thanks a lot!

dput(head(tables_df))
structure(list(`#` = c("1", "2", "3", "4", "5", "6"), c("", "", 
"", "", "", ""), Name = c("bitcoin", "ethereum", "binancecoin", 
"cardano", "litecoin", "uniswap"), Ticker = c("BTC", "ETH", "BNB", 
"ADA", "LTC", "UNI"), `Last price` = c("$ 57,431.62", "$ 1,959.472", 
"$ 302.91449", "$ 1.1072817", "$ 230.71046", "$ 29.981031"), 
    `%` = c("2.59%", "2.21%", "19.01%", "-0.96%", "1.50%", "4.01%"
    ), `%7d` = c("17.24%", "7.72%", "123.90%", "32.27%", "5.13%", 
    "38.44%"), `%30d` = c("80.95%", "63.35%", "648.16%", "234.17%", 
    "66.21%", "259.22%"), `%90d` = c("211.37%", "236.87%", "893.12%", 
    "617.17%", "165.07%", "688.65%"), `%180d` = c("388.06%", 
    "379.98%", "1238.50%", "793.63%", "271.27%", "0%"), `%365d` = c("496.72%", 
    "658.28%", "1263.44%", "1798.47%", "231.57%", "0%"), `# Coins` = c("18.63M", 
    "114.76M", "99.01M", "25.92B", "66.08M", "126.20M"), `Market cap` = c("$ 1.07T", 
    "$ 224.87B", "$ 29.99B", "$ 28.70B", "$ 15.24B", "$ 3.78B"
    ), `MC rank` = c("1", "2", "4", "5", "7", "23")), row.names = c(NA, 
6L), class = "data.frame")

1 Answer

Write a function that converts the million/billion/trillion abbreviations to actual numbers.

change_num <- function(x) {
  x1 <- parse_number(x)              # numeric part, e.g. "$ 1.07T" -> 1.07
  x1 * case_when(grepl('T', x) ~ 1e12,   # trillion
                 grepl('B', x) ~ 1e9,    # billion
                 grepl('M', x) ~ 1e6)    # million
}
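As written, change_num() returns NA for any value without an M, B or T suffix, because case_when() yields NA when no condition matches, and the unanchored grepl() checks could also fire on those letters elsewhere in a string. A possible variant, here called change_num2() (a hypothetical name), that assumes the suffix is the last character and falls back to a multiplier of 1:

```r
library(dplyr)
library(readr)

# Hypothetical variant of change_num(): the suffix must end the string,
# and values without a suffix are kept as plain numbers
change_num2 <- function(x) {
  parse_number(x) * case_when(
    grepl("T$", x) ~ 1e12,   # trillion
    grepl("B$", x) ~ 1e9,    # billion
    grepl("M$", x) ~ 1e6,    # million
    TRUE           ~ 1       # no suffix: leave the number as is
  )
}

converted <- change_num2(c("18.63M", "$ 25.92B", "$ 1.07T", "500"))
converted
```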

Use across() to apply the functions to multiple columns.

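library(dplyr)
library(readr)

# tables_df[-2] drops the unnamed second column; then parse the prices and
# percentages, and expand the M/B/T suffixes in the count/cap columns
tables_df <- tables_df[-2] %>%
  mutate(across(`Last price`:`%365d`, parse_number),
         across(c(`# Coins`, `Market cap`), change_num))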

tables_df

#           Name Ticker   Last price     %    %7d   %30d   %90d   %180d   %365d
#1 1     bitcoin    BTC 57431.620000  2.59  17.24  80.95 211.37  388.06  496.72
#2 2    ethereum    ETH  1959.472000  2.21   7.72  63.35 236.87  379.98  658.28
#3 3 binancecoin    BNB   302.914490 19.01 123.90 648.16 893.12 1238.50 1263.44
#4 4     cardano    ADA     1.107282 -0.96  32.27 234.17 617.17  793.63 1798.47
#5 5    litecoin    LTC   230.710460  1.50   5.13  66.21 165.07  271.27  231.57
#6 6     uniswap    UNI    29.981031  4.01  38.44 259.22 688.65    0.00    0.00

#     # Coins Market cap MC rank
#1 1.8630e+07 1.0700e+12       1
#2 1.1476e+08 2.2487e+11       2
#3 9.9010e+07 2.9990e+10       4
#4 2.5920e+10 2.8700e+10       5
#5 6.6080e+07 1.5240e+10       7
#6 1.2620e+08 3.7800e+09      23
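The scientific notation above (e.g. 1.8630e+07) is only how R prints large doubles; the stored values are unchanged. If readable figures are wanted for display, a sketch using base R's formatC() with a thousands separator follows. It returns character strings, so the numeric columns should be kept for any calculations; setting options(scipen = 999) is another way to make R prefer fixed notation when printing.

```r
# Values taken from the "# Coins" column of the output above
coins <- c(1.863e7, 2.592e10)

# Fixed notation, no decimals, comma as thousands separator.
# The result is character and meant for display only; `coins` stays numeric.
coins_display <- formatC(coins, format = "f", digits = 0, big.mark = ",")
coins_display
```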
Comments

Thanks a lot! Unfortunately I keep getting this error:
> tables_df %>% mutate(across(c("Last price" : "%365d"), parse_number), across(c("# Coins", "Market cap"), change_num))
Error in initialize(...) : attempt to use zero-length variable name
I also tried tables_df %>% mutate(across("Last price" : "%365d", parse_number), across(c("# Coins", "Market cap"), change_num)), but I still get the same error message. I copied the column names from the output of dput(colnames(tables_df)), so they are spelled exactly right.
@A.Jackson Can you edit your post with dput(head(tables_df)) so that I get a copy-pasteable version of your data?
Yes, I just added it; hopefully it helps.
The second column in your data is empty and has no column name, hence the error. The column names are also not syntactic R names, which makes them awkward to refer to. It is usually better to do some data cleaning early on to avoid such errors. You can try the updated answer now.
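One way to do that cleaning up front is sketched below in base R. The helper clean_names_basic() and its renaming scheme ("%" becomes "pct", "#" becomes "n", other symbols become "_") are hypothetical choices for illustration; the janitor package's clean_names() is a packaged alternative, though its naming rules differ.

```r
# Hypothetical helper: turn the question's headers into syntactic,
# lower-case names so backticks are no longer needed
clean_names_basic <- function(nms) {
  nms <- gsub("%", "pct_", nms, fixed = TRUE)   # "%7d"     -> "pct_7d"
  nms <- gsub("#", "n", nms, fixed = TRUE)      # "# Coins" -> "n Coins"
  nms <- gsub("[^A-Za-z0-9_]+", "_", nms)       # spaces etc. -> "_"
  nms <- gsub("^_+|_+$", "", nms)               # trim leading/trailing "_"
  tolower(nms)
}

old_names <- c("#", "", "Name", "Ticker", "Last price", "%", "%7d",
               "# Coins", "Market cap", "MC rank")
keep <- old_names != ""                  # the nameless column gets dropped
new_names <- clean_names_basic(old_names[keep])
new_names
```

On the real data this would be applied as tables_df <- tables_df[names(tables_df) != ""] followed by names(tables_df) <- clean_names_basic(names(tables_df)), after which a range such as last_price:pct_365d can be written without backticks.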