2

I have the following dataframe, and I need to perform several functions on 4 specific columns. Here they are named as 'adjusted_cost','adjusted_cost_2','market_value','market_value_2'. However, the names keep changing with different dataframes, like 'cost', 'cost_2', 'value', 'value_2'.

structure(list(company = c("Investments", "Debt", 
"Cash", "", ""), interest_rate = c("", 
"", "7.46", "8.44", ""), adjusted_cost = c("", "", "$", 
"", ""), adjusted_cost_2 = c("", "", "3,586 (EUR 3,381)", "3,564 (GBP 2,955)", "7,150"
), market_value = c("", "", "$3,505 (EUR 3,578)", "3,450(GBP 3,090)", 
""), market_value_2 = c("", "", "$3,505 (EUR 3,578)", "3,450(GBP 3,090)", 
"6,955")), row.names = c(NA, 5L), class = "data.frame")
                                                                  ​               ​        ​

I would like to remove the values in parentheses, ie (EUR 3,381) and (GBP 2,955) and then use the parse_number method to keep only the numbers from these specific columns.

Desired Output

company  interest_rate   adjusted_cost  adjusted_cost_2 market_value  market_value_2
    <chr>   <chr>             <dbl>        <dbl>          <dbl>           <dbl>
1   Investments                 
2   Debt                    
3   Cash    7.46                            3586           3505          3505
4           8.44                            3564           3450          3450
5                                                          7150          6955

Any suggestions would be appreciated. Thanks!

2 Answers 2

2

You could do this entirely in base R with a combination of lapply and gsub, or alternatively if you want to stick to readr::parse_number(), using lapply and readr::parse_number:

# columns to change
changecols <- c('adjusted_cost', 'adjusted_cost_2', 'market_value', 'market_value_2')

#' Alternative if your column names of interest change but 
#' ALWAYS have "cost" and "value" and no other columns have 
#' those words

changecols <- names(df)[grep("cost|value", names(df))]
# [1] "adjusted_cost"   "adjusted_cost_2" "market_value"    "market_value_2"

Base R

df2[changecols] <- lapply(df2[changecols], function(x)
    as.numeric(gsub("\\s*\\([^\\)]+\\)|,|\\$", "", x))
  )

If you wanted the space instead of NA for reporting purposes, just remove the as.numeric() wrapper.

Using readr::parse_number()

df[changecols] <- lapply(df[changecols], readr::parse_number)

In both cases the output is identical:

#       company interest_rate adjusted_cost adjusted_cost_2 market_value market_value_2
# 1 Investments                          NA              NA           NA             NA
# 2        Debt                          NA              NA           NA             NA
# 3        Cash          7.46            NA            3586         3505           3505
# 4                      8.44            NA            3564         3450           3450
# 5                                      NA            7150           NA           6955

If you want to replace NA with a blank for reporting purposes with the parse_number approach, simply:

df[is.na(df)] <- ""

#       company interest_rate adjusted_cost adjusted_cost_2 market_value market_value_2
# 1 Investments                                                                        
# 2        Debt                                                                        
# 3        Cash          7.46                          3586         3505           3505
# 4                      8.44                          3564         3450           3450
# 5                                                    7150                        6955
Sign up to request clarification or add additional context in comments.

Comments

2

We can use parse_number after looping across the columns of interest

library(dplyr)
df1 %>% 
   mutate(across(adjusted_cost:market_value_2, ~ readr::parse_number(.x)))

-output

  company interest_rate adjusted_cost adjusted_cost_2 market_value market_value_2
1 Investments                          NA              NA           NA             NA
2        Debt                          NA              NA           NA             NA
3        Cash          7.46            NA            3586         3505           3505
4                      8.44            NA            3564         3450           3450
5                                      NA            7150           NA           6955

Or using base R

df1[4:ncol(df1)] <- lapply(df1[4:ncol(df1)], \(x) 
    as.numeric(sub("\\s*\\(.*", "", gsub("[$,]", "", x))))

-output

> df1
      company interest_rate adjusted_cost adjusted_cost_2 market_value market_value_2
1 Investments                                          NA           NA             NA
2        Debt                                          NA           NA             NA
3        Cash          7.46             $            3586           NA             NA
4                      8.44                          3564         3450           3450
5                                                    7150           NA           6955

NOTE: We just need to use only parse_number to get the output desired. In addition, blanks ("") are considered as character, so it is better to return NA for numeric columns

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.