
I want to extract the text between two words (start and end) in a text file, but the extraction should begin after the 2nd occurrence of the start word and run until the end word.

For example, my text is

test.text <- c("During the year new factories at Haridwar for LV apparatus and at Bangalore for LV electric motors commenced production. Further increases in range and LV switchgear capacity augmentation are planned for  motors, HT motors, Drives and .")

I need to extract the text that follows the second "LV" (ignoring any "LV" that comes later), matched case-insensitively, up to "capacity".

Output should be like:

electric motors commenced production. Further increases in range and
  • Hi, welcome to SO. Can you please share the code you have tried so far? Commented Oct 6, 2017 at 5:21
  • You said "ignore the one which comes later", but your expected output stops at the LV "that comes later". Shouldn't it be electric motors commenced production. Further increases in range and LV switchgear? Commented Oct 6, 2017 at 13:25
  • Oh, sorry. I want the output to run up to "LV switchgear", ending just before "capacity", like this: "electric motors commenced production. Further increases in range and LV switchgear". Any "LV" after the 2nd occurrence should simply be ignored; it should not cut the output short. Commented Oct 7, 2017 at 10:36
  • Consider accepting the answer that helped you the most by clicking on the grey check mark under the downvote button. Commented Oct 9, 2017 at 12:38

2 Answers


We could locate the start and end positions and then extract the text with substr:

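library(stringr)
# End position of the second "LV", plus 2 to skip the space that follows it
i1 <- str_locate_all(test.text, "LV")[[1]][2, 2] + 2
# Start position of "capacity", minus 2 to drop the space that precedes it
i2 <- str_locate(test.text, "capacity")[1, "start"] - 2
# Cut off anything from a later " LV" onwards
sub("\\sLV.*", "", substr(test.text, i1, i2))
#[1] "electric motors commenced production. Further increases in range and"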
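The same position-based idea can be sketched in base R, without stringr, and made case insensitive as the question asks. This is only a sketch, not the answerer's code: the variable names (hits, after_lv, rest, stop_at, result) are illustrative, and it keeps the later "LV switchgear", which is the output the asker settled on in the comments.

```r
test.text <- "During the year new factories at Haridwar for LV apparatus and at Bangalore for LV electric motors commenced production. Further increases in range and LV switchgear capacity augmentation are planned for  motors, HT motors, Drives and ."

# Start positions of every whole-word "LV", ignoring case
hits <- gregexpr("\\bLV\\b", test.text, ignore.case = TRUE, perl = TRUE)[[1]]

# Position of the first character after the second "LV"
after_lv <- hits[2] + attr(hits, "match.length")[2]

# Everything from that point on, then the first "capacity" inside it
rest <- substr(test.text, after_lv, nchar(test.text))
stop_at <- regexpr("capacity", rest, ignore.case = TRUE)

# Keep the text before "capacity" and trim the surrounding spaces
result <- trimws(substr(rest, 1, stop_at - 1))
result
```

Searching for "capacity" only in the text after the second "LV" avoids picking up an end word that happens to occur earlier in the string.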

3 Comments

Thank You very much... It helped in big way :)
Please explain the 2nd line of code, especially the \\s and ?! notation. I also want to start extraction from the 2nd occurrence of any one of several starting words and stop at the 2nd occurrence of any one of several ending words. For example, given cities <- c("Sydney Banglore Mumbai Newyork banglore LA LS banglore London Chicago mumbai Miami"), extraction should start after the 2nd occurrence of Banglore, chennai, or New South Wales (only one of them will be present) and stop at the 2nd occurrence of mumbai, Michigan, or New Delhi (only one of them will be present). The output should be "LA LS banglore London Chicago". Please help.
@JainArihant The \\s matches a whitespace character. In an R string literal the backslash itself has to be escaped, so the regex \s is written as "\\s".
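For the follow-up requirement in the comment above (several possible start and end words, each counted to its 2nd occurrence), one possible base R sketch follows. The helper name extract_between_nth and its arguments are hypothetical, not from any package, and it assumes the words contain no regex metacharacters.

```r
# Hypothetical helper: text between the n_start-th start word and the n_end-th end word
extract_between_nth <- function(text, start_words, end_words, n_start = 2, n_end = 2) {
  # Whole-word alternations such as \b(?:Banglore|chennai|New South Wales)\b
  start_pat <- paste0("\\b(?:", paste(start_words, collapse = "|"), ")\\b")
  end_pat <- paste0("\\b(?:", paste(end_words, collapse = "|"), ")\\b")

  s <- gregexpr(start_pat, text, ignore.case = TRUE, perl = TRUE)[[1]]
  e <- gregexpr(end_pat, text, ignore.case = TRUE, perl = TRUE)[[1]]

  # Too few occurrences of either marker: nothing to extract
  if (s[1] == -1 || e[1] == -1 || length(s) < n_start || length(e) < n_end) {
    return(NA_character_)
  }

  from <- s[n_start] + attr(s, "match.length")[n_start]  # just after the start word
  to <- e[n_end] - 1                                     # just before the end word
  trimws(substr(text, from, to))
}

cities <- "Sydney Banglore Mumbai Newyork banglore LA LS banglore London Chicago mumbai Miami"
res <- extract_between_nth(cities,
                           c("Banglore", "chennai", "New South Wales"),
                           c("mumbai", "Michigan", "New Delhi"))
res
```

Here the end word is counted from the beginning of the string, which matches the example in the comment; counting it only after the start position would be a different design choice.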

A solution with strsplit:

strsplit(test.text, "\\sLV\\s")[[1]][3]    
# [1] "electric motors commenced production. Further increases in range and"

strsplit(test.text, "\\s(LV(?!\\sswitchgear)|capacity)\\s", perl = TRUE)[[1]][3]
# [1] "electric motors commenced production. Further increases in range and LV switchgear"

The first line gives OP's expected output. The second line gives what I think OP really meant.
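To see why index 3 is the right piece, and to cover the case-insensitive part of the question, here is a small sketch on a made-up mixed-case string (txt and parts are illustrative names). strsplit has no ignore.case argument, so the sketch assumes the inline (?i) flag together with perl = TRUE.

```r
# Made-up input with "LV" in three different capitalisations
txt <- "for lv apparatus and for Lv electric motors and LV switchgear capacity planned"

# (?i) turns on case-insensitive matching for the whole pattern
parts <- strsplit(txt, "(?i)\\s(LV(?!\\sswitchgear)|capacity)\\s", perl = TRUE)[[1]]

parts       # one piece per gap between the matched separators
parts[3]    # the piece after the second "LV" and before "capacity"
```

The separators are the first two "LV"s and "capacity", so the string falls into four pieces, and the third one is the text between the second "LV" and "capacity".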

2 Comments

Please explain the 2nd line of code, especially the \\s and ?! notation. I also want to start extraction from the 2nd occurrence of any one of several starting words and stop at the 2nd occurrence of any one of several ending words. For example, given cities <- c("Sydney Banglore Mumbai Newyork banglore LA LS banglore London Chicago mumbai Miami"), extraction should start after the 2nd occurrence of Banglore, chennai, or New South Wales (only one of them will be present) and stop at the 2nd occurrence of mumbai, Michigan, or New Delhi (only one of them will be present). The output should be "LA LS banglore London Chicago". Please help.
@JainArihant \\s matches a whitespace character. (?!\\sswitchgear) is a negative lookahead, meaning "not followed by a space and switchgear", so LV(?!\\sswitchgear) matches every "LV" that is not immediately followed by a space and "switchgear". For the new specification, either edit your question or ask a new question. Adding requirements in the comments is generally discouraged.
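The effect of the negative lookahead can be checked by counting matches with and without it. This is a small sketch on a made-up string; x, all_lv and guarded_lv are illustrative names.

```r
# Made-up input with three "LV"s, the last one followed by " switchgear"
x <- "for LV apparatus, for LV motors, and LV switchgear capacity"

# Every "LV"
all_lv <- gregexpr("LV", x)[[1]]

# Only the "LV"s that are not followed by a space and "switchgear";
# lookaheads need perl = TRUE in base R
guarded_lv <- gregexpr("LV(?!\\sswitchgear)", x, perl = TRUE)[[1]]

length(all_lv)      # number of plain matches
length(guarded_lv)  # number of matches that pass the lookahead
```

The lookahead only inspects the text after "LV" and consumes nothing, so the matches that remain start at the same positions as the plain ones.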
