I have data in the following format:

test1 <- data.frame(value = c('25.5 (5%);  39.65 (23%)', '28.15(5%) and 55.66 (34%) and 33.26   (14%)', '45   56.9565', '95.6666 (55%)  89.2343(90%)   51.56 (28%)'))
test2 <- data.frame(value = c('36.5', '55.658', '47.8', '51.562'))

I need to split the values in the 'value' column of test1 into three columns (col1, col2 and col3), and then compare and highlight the value in test2 whenever it is within +/- 0.1 of the value in any of the three columns, as shown below.

Please suggest how to proceed with this.

col1    col2    col3    test2
25.5    39.65           36.5
28.15   55.66   33.26   **55.658**
45      56.9565         47.8
95.6666 89.2343 51.56   **51.562**

string split (and cleaned) into columns

1 Answer

We can use gsub to strip the bracketed percentages and any words from the 'value' column, then read.table to read the remaining numbers into three columns:

df1 <- read.table(text=gsub("\\([^)]+\\)|[A-Za-z]+", "", test1$value), 
                    header=FALSE, fill=TRUE, col.names = paste0("col", 1:3))

and then cbind the result with 'test2':

df2 <- cbind(df1, test2)
df2
#    col1    col2  col3  value
#1 25.5000 39.6500    NA   36.5
#2 28.1500 55.6600 33.26 55.658
#3 45.0000 56.9565    NA   47.8
#4 95.6666 89.2343 51.56 51.562

Update

With the new data, where a semicolon or comma can follow the bracket, also remove those separators:

cbind(read.table(text=gsub("\\([^)]+\\)|[A-Za-z]+|[;,]\\s*", "", 
   test1$value), header=FALSE, fill=TRUE, col.names = paste0("col", 1:3)), test2)
#    col1    col2  col3  value
#1 25.5000 39.6500    NA   36.5
#2 28.1500 55.6600 33.26 55.658
#3 45.0000 56.9565    NA   47.8
#4 95.6666 89.2343 51.56 51.562
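
The question also asks which test2 values fall within +/- 0.1 of col1, col2 or col3, which the code above does not cover. A minimal sketch of that comparison follows, assuming a logical flag column is enough to drive the highlighting; the names `target`, `tol` and `is_match` are illustrative and not part of the original answer.

```r
test1 <- data.frame(value = c('25.5 (5%);  39.65 (23%)',
                              '28.15(5%) and 55.66 (34%) and 33.26   (14%)',
                              '45   56.9565',
                              '95.6666 (55%)  89.2343(90%)   51.56 (28%)'))
test2 <- data.frame(value = c('36.5', '55.658', '47.8', '51.562'))

# Same cleaning step as above: drop "(..)" groups, words, and ;/, separators
cleaned <- gsub("\\([^)]+\\)|[A-Za-z]+|[;,]\\s*", "", test1$value)
df1 <- read.table(text = cleaned, header = FALSE, fill = TRUE,
                  col.names = paste0("col", 1:3))

# test2$value is text (a factor in R < 4.0), so go through as.character
target <- as.numeric(as.character(test2$value))

tol <- 0.1  # assumed tolerance, taken from the "+/- 0.1" in the question

# Subtracting a length-nrow vector from the matrix recycles down the columns,
# so row i of every column is compared with target[i]
diffs <- abs(as.matrix(df1) - target)

# TRUE when at least one non-missing column is within the tolerance
is_match <- rowSums(diffs <= tol, na.rm = TRUE) > 0

df2 <- cbind(df1, test2 = target, match = is_match)
df2
```

The `match` column can then be used to style the rows in whatever output format is needed; how to render the highlight itself depends on the reporting tool and is left open here.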

3 Comments

Thanks @akrun. I have been racking my brain for the last 3 days to solve this and you solved it in 3 seconds. Great! I very much appreciate your skill and your kindness.
What if there was a semicolon or a comma after the bracket, for example: test1 <- data.frame(value = c('25.5 (5%); 39.65 (23%)', '28.15(5%); and 55.66 (34%) and 33.26 (14%)', '45 56.9565', '95.6666 (55%) 89.2343(90%) 51.56 (28%)'))? When I parse this using the method you suggested, the semicolon goes into a separate column. What am I missing here?
@RanonKahn In that case, cbind(read.table(text=gsub("\\([^)]+\\)|[A-Za-z]+|[;,]\\s*", "", test1$value), header=FALSE, fill=TRUE, col.names = paste0("col", 1:3)), test2)
