Split and clean multiple length strings in a column to multiple columns using R script

Question

I have data in the following format:

test1 <- data.frame(value = c('25.5 (5%);  39.65 (23%)', '28.15(5%) and 55.66 (34%) and 33.26   (14%)', '45   56.9565', '95.6666 (55%)  89.2343(90%)   51.56 (28%)'))
test2 <- data.frame(value = c('36.5', '55.658', '47.8', '51.562'))

I need to split the strings in the value column of test1 into three columns (col1, col2 and col3), then compare each value in test2 with those columns and highlight it when it is within +/- 0.1 of the value in any one of col1, col2 or col3, as shown below.

Please suggest how to proceed.

col1    col2    col3    test2
25.5    39.65           36.5
28.15   55.66   33.26   **55.658**
45      56.9565         47.8
95.6666 89.2343 51.56   **51.562**

Expected result: strings split (and cleaned) into columns, with the matching test2 values highlighted in bold.

akrun · Accepted Answer · 2017-03-15 01:42:15Z


We can use gsub to remove the bracketed percentages and words such as 'and', then read.table to split the cleaned 'value' strings into three columns

df1 <- read.table(text = gsub("\\([^)]+\\)|[A-Za-z]+", "", test1$value),  # drop "(..%)" and words
                  header = FALSE, fill = TRUE,        # fill = TRUE pads short rows with NA
                  col.names = paste0("col", 1:3))

and cbind the result with 'test2'

df2 <- cbind(df1, test2)
df2
#    col1    col2  col3  value
#1 25.5000 39.6500    NA   36.5
#2 28.1500 55.6600 33.26 55.658
#3 45.0000 56.9565    NA   47.8
#4 95.6666 89.2343 51.56 51.562
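The answer stops at the split, while the question also asks to flag the test2 values that lie within +/- 0.1 of any of col1, col2 or col3. Below is a minimal sketch of that step, not part of the original answer. It rebuilds df1 with the pattern from the Update (the first string in the question contains a ';'), converts test2$value to numbers, and marks matches with ** as in the question's table. The names target, tol, hit and result are introduced here for illustration only.

```r
# Data from the question
test1 <- data.frame(value = c('25.5 (5%);  39.65 (23%)',
                              '28.15(5%) and 55.66 (34%) and 33.26   (14%)',
                              '45   56.9565',
                              '95.6666 (55%)  89.2343(90%)   51.56 (28%)'))
test2 <- data.frame(value = c('36.5', '55.658', '47.8', '51.562'))

# Split/clean step with the pattern from the Update (handles the ";" in row 1)
df1 <- read.table(text = gsub("\\([^)]+\\)|[A-Za-z]+|[;,]\\s*", "", test1$value),
                  header = FALSE, fill = TRUE, col.names = paste0("col", 1:3))

# test2$value is stored as text (a factor before R 4.0), so convert it first
target <- as.numeric(as.character(test2$value))

# Subtracting a length-4 vector from the 4 x 3 matrix recycles down each column,
# so row i is compared with target[i]; a row is a hit if any column is within tol.
# na.rm = TRUE skips the NA cells that fill = TRUE created for short rows.
tol <- 0.1
hit <- apply(abs(as.matrix(df1) - target) <= tol, 1, any, na.rm = TRUE)

# "Highlight" matches the same way the question's table does, with ** around them
result <- df1
result$test2 <- ifelse(hit, paste0("**", test2$value, "**"), as.character(test2$value))
result
```

For the question's data, hit is FALSE, TRUE, FALSE, TRUE, which corresponds to the two bold rows. Because of floating-point rounding, a difference of exactly 0.1 may fall on either side of the threshold; if that boundary matters, a small epsilon could be added to tol.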

Update

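With the new data, where a ';' or ',' can follow a bracket (as in the first string of test1), the separators must be removed as well; otherwise read.table puts them in a column of their own: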

# same as above, but the pattern also removes ";" or "," and any whitespace after it
cbind(read.table(text = gsub("\\([^)]+\\)|[A-Za-z]+|[;,]\\s*", "", test1$value),
                 header = FALSE, fill = TRUE, col.names = paste0("col", 1:3)),
      test2)
#    col1    col2  col3  value
#1 25.5000 39.6500    NA   36.5
#2 28.1500 55.6600 33.26 55.658
#3 45.0000 56.9565    NA   47.8
#4 95.6666 89.2343 51.56 51.562
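A different technique, not used in this thread, avoids enumerating every separator to delete: strip the bracketed percentages, then extract whatever numbers remain with regmatches() and gregexpr(). This is a sketch that assumes every wanted value is a plain unsigned decimal such as 55.66 (no signs, exponents or thousands separators). The names no_pct, nums and n_col are illustrative.

```r
test1 <- data.frame(value = c('25.5 (5%);  39.65 (23%)',
                              '28.15(5%) and 55.66 (34%) and 33.26   (14%)',
                              '45   56.9565',
                              '95.6666 (55%)  89.2343(90%)   51.56 (28%)'))

# Drop the bracketed percentages first so the "5" in "(5%)" is not picked up
no_pct <- gsub("\\([^)]*\\)", "", test1$value)

# Extract every unsigned decimal number left in each string (";", "and", etc. are ignored)
nums <- regmatches(no_pct, gregexpr("[0-9]+(\\.[0-9]+)?", no_pct))

# Pad each row to n_col values (indexing past the end gives NA), then bind into columns
n_col <- 3
mat <- t(vapply(nums, function(v) as.numeric(v)[seq_len(n_col)], numeric(n_col)))
df1 <- setNames(as.data.frame(mat), paste0("col", seq_len(n_col)))
df1
```

Rows with more than n_col numbers are silently truncated by this sketch, so n_col would need to be raised if longer strings can occur.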

edited Mar 15, 2017 at 1:42

answered Nov 10, 2016 at 3:13

akrun



3 Comments

RanonKahn Over a year ago

Thanks @akrun. I had been breaking my head over this for the last 3 days and you solved it in 3 seconds. Great! I very much appreciate your skill and your kindness.

RanonKahn Over a year ago

What if there is a semicolon or a comma after the bracket, for example: test1 <- data.frame(value = c('25.5 (5%); 39.65 (23%)', '28.15(5%); and 55.66 (34%) and 33.26 (14%)', '45 56.9565', '95.6666 (55%) 89.2343(90%) 51.56 (28%)'))? When I parse this using the method you suggested, the semicolon goes into a separate column. What am I missing here?

akrun Over a year ago

@RanonKahn In that case,

cbind(read.table(text=gsub("\\([^)]+\\)|[A-Za-z]+|[;,]\\s*", "", test1$value),  header=FALSE, fill=TRUE, col.names = paste0("col", 1:3)), test2)
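The semicolon ends up in its own column because the first pattern only deletes bracketed groups and runs of letters: a ';' directly after '(5%)' survives as a whitespace-separated token, and read.table reads it as a field. The check below illustrates this; fields() is a hypothetical helper written only for this illustration, which splits the cleaned string on whitespace, roughly as read.table does with its default separator.

```r
# Pattern from the first part of the answer: "(...)" groups and runs of letters
pattern_v1 <- "\\([^)]+\\)|[A-Za-z]+"
# Pattern from the Update and the comment: also drops ";" or "," plus following spaces
pattern_v2 <- "\\([^)]+\\)|[A-Za-z]+|[;,]\\s*"

# Hypothetical helper: clean one string, then split it on runs of whitespace
fields <- function(x, pattern) strsplit(trimws(gsub(pattern, "", x)), "\\s+")[[1]]

fields("28.15(5%) and 55.66 (34%) and 33.26   (14%)", pattern_v1)  # "28.15" "55.66" "33.26"
fields("25.5 (5%);  39.65 (23%)", pattern_v1)                      # "25.5" ";" "39.65"
fields("25.5 (5%);  39.65 (23%)", pattern_v2)                      # "25.5" "39.65"
```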
