
I have a column in a data.table in R that looks like this:

[1] "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_RESULT\",\"SK190400\",
[2] "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\",
[3] "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\",
[4] "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"OEE_DATA\",
[5] "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"PING\",\"SK190400\",

But the only thing I care about is whether it contains "UNIT_RESULT", "UNIT_CHECKIN", "OEE_DATA" or "PING", so I would like to replace each row with just that string ("UNIT_RESULT" etc.).

The result should look like this:

[1] "UNIT_RESULT"
[2] "UNIT_CHECKIN"
[3] "UNIT_CHECKIN"
[4] "OEE_DATA"
[5] "PING"

I have spent many hours trying to find out how to replace a string with a part of itself, but none of these gave me a useful result:

Replace specific characters within strings

Reference - What does this regex mean?

Test if characters in string in R

At first the function substring(x, 53, 63) looked like a solution, but it only picks characters at fixed positions, so it is useless unless the keyword sits at the same position in every row.

Any hints?

4
  • I have edited the post. Have a look now, it should make sense. Commented Jun 7, 2018 at 14:14
  • What should happen in the case of no match? For instance, what if the fifth string did not contain PING? Commented Jun 7, 2018 at 14:20
  • The list of possible strings is fixed. There are, let's say, 15 possible results, so there is no chance of a non-match. Commented Jun 7, 2018 at 18:51
  • @makoLP - Does my post below meet your needs or did I miss something? Let me know...glad to help. Commented Jun 7, 2018 at 19:35

4 Answers

1

The str_match_all function applies a regex to each element of a character vector and returns only the matched text (as a list with one character matrix per input string). So we can put all the terms we want to extract into a vector and use paste0 to join them with the | (OR) operator, giving a single regular expression that matches any of the 4 desired terms.

Then we just run the str_match_all function and unlist the resulting list into a character vector.

strings <- c("<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_RESULT\",\"SK190400\"",
             "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\"",
             "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\"",
             "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"OEE_DATA\"",
             "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"PING\",\"SK190400\""
)

items <- c('UNIT_RESULT', 'UNIT_CHECKIN', 'OEE_DATA', 'PING')

library(stringr)
unlist(str_match_all(strings, paste0(items, collapse = '|')))
# [1] "UNIT_RESULT"  "UNIT_CHECKIN" "UNIT_CHECKIN" "OEE_DATA"     "PING"
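To make the two steps above concrete, here is a small base R sketch of what the joined pattern looks like and why the result is a list before unlist flattens it. It uses regmatches with gregexpr as a stand-in for str_match_all, and the shortened input strings are made up for illustration; treat it as a sketch, not as how stringr works internally.

```r
items <- c("UNIT_RESULT", "UNIT_CHECKIN", "OEE_DATA", "PING")

# Join the terms with the regex OR operator.
pattern <- paste0(items, collapse = "|")
pattern

# Hypothetical shortened inputs: the second one contains two of the terms.
x <- c("[\"UNIT_RESULT\",\"SK190400\"",
       "[\"PING\",\"OEE_DATA\"")

# One list element per input string, holding every match found in it.
per_string <- regmatches(x, gregexpr(pattern, x))
lengths(per_string)

# unlist() flattens the list, so the result can be longer (or shorter)
# than the input when a string has several matches or none.
all_matches <- unlist(per_string)
all_matches
```

With exactly one keyword per string, as in the question, the flattened vector lines up with the rows one to one.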

2 Comments

Interesting, I didn't know of the extra functionalities of str_match compared to str_extract.
Perfect! Thank you.
0

An alternative is to use str_extract. You pass your strings as the 'string' argument and the alternatives you gave as the 'pattern' argument, and for each string it returns whichever of the alternatives appears first (or NA if none of them appears).

library(stringr)

DT[, newstring := str_extract(string_column, "UNIT_RESULT|UNIT_CHECKIN|OEE_DATA|PING")]
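For anyone who wants to run this, here is a self-contained sketch. The sample rows are made up (two taken from the question plus one without a keyword), and DT and string_column are just placeholder names; it assumes the data.table and stringr packages are installed.

```r
library(data.table)
library(stringr)

# Hypothetical sample data; string_column stands in for the real column name.
DT <- data.table(string_column = c(
  "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_RESULT\",\"SK190400\"",
  "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"OEE_DATA\"",
  "a row without any of the keywords"
))

# Add the extracted keyword as a new column, by reference.
DT[, newstring := str_extract(string_column, "UNIT_RESULT|UNIT_CHECKIN|OEE_DATA|PING")]

# The third row has no keyword, so it gets NA.
DT$newstring
```

Because str_extract returns one value per input string, the new column always has the same length as the table.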


0

I suggest using gsub with a capture group and replacing the whole string with the captured keyword:

gsub("^.*?(UNIT_RESULT|UNIT_CHECKIN|OEE_DATA|PING).*","\\1",strings,perl=TRUE)
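A short sketch of how this pattern behaves, with made-up inputs: the lazy `^.*?` consumes everything up to the first keyword, the parentheses capture the keyword, the trailing `.*` consumes the rest, and `\\1` puts back only the captured part. Since the pattern is anchored and can match at most once per string, sub gives the same result as gsub here. One thing to be aware of is that a string without any keyword is returned unchanged rather than as NA.

```r
pattern <- "^.*?(UNIT_RESULT|UNIT_CHECKIN|OEE_DATA|PING).*"

# Hypothetical inputs: one row from the question and one without a keyword.
x <- c("<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"PING\",\"SK190400\"",
       "no keyword here")

# Replace the whole string with the captured keyword.
res <- sub(pattern, "\\1", x, perl = TRUE)
res

# Optional: flag rows where nothing was replaced.
unmatched <- !grepl("UNIT_RESULT|UNIT_CHECKIN|OEE_DATA|PING", x)
unmatched
```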


0

If you do not have a finite list of strings to search for, I would recommend a more general regex pattern. Here is one that works on the examples you provided:

# Code to create example data.table
library(data.table)

dt <- data.table(f1 =  c("<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_RESULT\",\"SK190400\"",
                     "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\"",
                     "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\"",
                     "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"OEE_DATA\"",
                     "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"PING\",\"SK190400\""
))

# Start of code to parse out values:
rex_pattern <- "(?<=(\"))[A-Z]{2,}_*[A-Z]+(?=(\"))"

dt[, .(parsed_val = regmatches(f1, regexpr(pattern = rex_pattern, f1, perl = TRUE)))]   

This gives you:

     parsed_val
1:  UNIT_RESULT
2: UNIT_CHECKIN
3: UNIT_CHECKIN
4:     OEE_DATA
5:         PING 

If you really want to "overwrite" the original field f1 with the new substring, you can use the following:

dt[, `:=`(f1 = regmatches(f1, regexpr(pattern = rex_pattern, f1, perl = TRUE)))]
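One caveat worth checking before overwriting a column this way: regmatches combined with regexpr silently drops elements that do not match, so the result can be shorter than the column. Below is a minimal base R sketch of that behaviour and of one possible way to pad the result with NA; the shortened input strings and the variable names are made up for illustration.

```r
rex_pattern <- "(?<=(\"))[A-Z]{2,}_*[A-Z]+(?=(\"))"

# Hypothetical inputs: the second one has no upper-case keyword in quotes.
x <- c("[\"UNIT_RESULT\",\"SK190400\"",
       "[\"sk190400\"",
       "[\"PING\"")

m <- regexpr(rex_pattern, x, perl = TRUE)

# Only the matching elements come back, so the length differs from length(x).
found <- regmatches(x, m)
length(found)

# Pad with NA so the result has one value per input string.
parsed <- rep(NA_character_, length(x))
parsed[m != -1] <- found
parsed
```

With the data from the question every row matches, so the original code works as posted; the padding only matters if some rows might not contain a keyword.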

3 Comments

The list of strings that can occur is finite. I tried your solution but it doesn't work: Error in .(parsed_val = regmatches(f1, regexpr(pattern = rex_pattern, : could not find function "."
Did you run library(data.table) on the first line of my first block of code? I can recreate your could not find function "." error if I do not load the data.table library.
@makoLP see comment.
