How to extract substrings dynamically

Question

From the string

s <- "|tree| Lorem ipsum dolor sit amet, |house| consectetur adipiscing elit, 
|street| sed do eiusmod tempor incididunt ut labore et |car| dolore magna aliqua."

I want to extract the text that follows each word enclosed in | symbols.

My approach:

words <- list("tree","house","street","car")

for(word in words){
   expression <- paste0("^.*\\|",word,"\\|\\s*(.+?)\\s*\\|.*$")
   print(sub(expression, "\\1", s))
}

This works fine for all but the last word, car, for which it returns the entire string s. How can I modify the regex so that, for the last element of words, it prints dolore magna aliqua.?

Edit: Previously the markers were a, b, c, d. Solutions to that specific problem do not generalize well.

For getting the regex right, I'd recommend taking a look at regex.inginf.units.it if you're not very comfortable with it.
– mhovd, Commented Aug 17, 2020 at 11:50
I always find using sub in these cases confusing, since you have to specify what you DON'T want to keep instead of (the more natural) what you DO want to keep. I'd advise using stringi::stri_extract_all, for example: stringi::stri_extract_all(regex = "(?<=\\|[abcd]\\| )([^\\|]+)", s). This uses a lookbehind to match the |a|, |b|, |c| and |d| without capturing it.
– Bas, Commented Aug 17, 2020 at 11:54
Thanks. Suppose the expressions I am looking for are not a, b, c, d but instead tree, house, street, car. How would I do it?
– volfi, Commented Aug 17, 2020 at 12:06

daniellga · Accepted Answer · 2020-08-17 11:53:32Z

Try this:

library(stringi)

s <- '|a| Lorem ipsum dolor sit amet, |b| consectetur adipiscing elit, 
|c| sed do eiusmod tempor incididunt ut labore et |d| dolore magna aliqua.'

stri_split_regex(s, '\\|[:alpha:]\\|')

[[1]]
[1] ""                                                " Lorem ipsum dolor sit amet, "                  
[3] " consectetur adipiscing elit, \n"                " sed do eiusmod tempor incididunt ut labore et "
[5] " dolore magna aliqua."

or stringr::str_split(s, pattern = "\\|[a-z]\\| ")
– Wimpel, Commented over a year ago
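The split above targets the single-letter markers from the earlier version of the question. A minimal sketch of how the same idea might carry over to the word markers (tree, house, street, car), assuming the markers consist only of letters; the names parts and texts are illustrative:

```r
library(stringi)

s <- "|tree| Lorem ipsum dolor sit amet, |house| consectetur adipiscing elit, 
|street| sed do eiusmod tempor incididunt ut labore et |car| dolore magna aliqua."

# Split on any |word| marker; the "+" allows markers longer than one letter.
parts <- stri_split_regex(s, "\\|[[:alpha:]]+\\|")[[1]]

# The first piece is the empty text before the first marker, so drop it,
# then trim surrounding whitespace (including the embedded newline).
texts <- stri_trim_both(parts[-1])
texts
```

This keeps the pieces in order of appearance but does not record which marker each piece belongs to.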

Mike V · Accepted Answer · 2020-08-17 23:47:18Z

You can try this pattern

library(stringr)
s <- "|tree| Lorem ipsum dolor sit amet, |house| consectetur adipiscing elit, 
|street| sed do eiusmod tempor incididunt ut labore et |car| dolore magna aliqua."

str_extract_all(s, regex("(?<=\\|)\\w+(?=\\|)"))
#[[1]]
#[1] "tree"   "house"  "street" "car"

(?<=\\|): lookbehind, a position preceded by |; \\| is an escaped (literal) |
\\w+: one or more word characters
(?=\\|): lookahead, a position followed by |
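The pattern above returns the marker words themselves rather than the text that follows them. One possible way to pair the two, sketched under the assumption that every marker is made of word characters, is to split the string on the markers and name the pieces with the extracted words; keys, values and lookup are illustrative names:

```r
library(stringr)

s <- "|tree| Lorem ipsum dolor sit amet, |house| consectetur adipiscing elit, 
|street| sed do eiusmod tempor incididunt ut labore et |car| dolore magna aliqua."

# Marker words, found with the lookbehind/lookahead pattern.
keys <- str_extract_all(s, "(?<=\\|)\\w+(?=\\|)")[[1]]

# Text between markers; the first piece (before |tree|) is empty, so drop it.
values <- str_split(s, "\\|\\w+\\|")[[1]][-1]

# Named character vector: marker word -> trimmed text that follows it.
lookup <- setNames(str_trim(values), keys)
lookup[["car"]]
```

With this, the text for any marker can be looked up by name, for example lookup[["house"]].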

Wiktor Stribiżew · Accepted Answer · 2020-08-17 12:19:45Z

I suggest extracting all the words with corresponding values using stringr::str_match_all:

s <- "|tree| Lorem ipsum dolor sit amet, |house| consectetur adipiscing elit, 
|street| sed do eiusmod tempor incididunt ut labore et |car| dolore magna aliqua."
words1 <- list("tree","house","street","car")
library(stringr)
expression <- paste0("\\|(", paste(words1, collapse="|"),")\\|\\s*([^|]*)")
result <- str_match_all(s, expression)
lapply(result, function(x) x[,-1])

Output:

[[1]]
     [,1]     [,2]                                            
[1,] "tree"   "Lorem ipsum dolor sit amet, "                  
[2,] "house"  "consectetur adipiscing elit, \n"               
[3,] "street" "sed do eiusmod tempor incididunt ut labore et "
[4,] "car"    "dolore magna aliqua."

The regex is

\|(tree|house|street|car)\|\s*([^|]*)

Regex details:

\| - a | char
(tree|house|street|car) - Group 1: one of the words
\| - a | char
\s* - 0 or more whitespace chars
([^|]*) - Group 2: any 0 or more chars other than |.
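Because [^|]* stops at the next | or at the end of the string, it does not need a trailing |, which is what the original loop required and why car failed there. A base R sketch of the per-word lookup built on that idea, assuming each word occurs in s and contains no regex metacharacters; extract_after is a hypothetical helper, not part of any package:

```r
s <- "|tree| Lorem ipsum dolor sit amet, |house| consectetur adipiscing elit, 
|street| sed do eiusmod tempor incididunt ut labore et |car| dolore magna aliqua."
words <- c("tree", "house", "street", "car")

# Hypothetical helper: return the text that follows |word| in `text`.
extract_after <- function(word, text) {
  # \\K drops the |word| prefix from the reported match (requires perl = TRUE);
  # [^|]* then runs up to the next "|" or the end of the string.
  pattern <- paste0("\\|", word, "\\|\\s*\\K[^|]*")
  trimws(regmatches(text, regexpr(pattern, text, perl = TRUE)))
}

# vapply names the result by the input words.
result <- vapply(words, extract_after, character(1), text = s)
result[["car"]]
```

If a word is missing from the string, regmatches returns an empty vector and vapply stops with an error, so a real implementation would likely want to handle that case explicitly.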
