I have a lot of strings that all look similar, e.g.:

x1= "Aaaa_11111_AA_Whatiwant.txt"
x2= "Bbbb_11111_BBBB_Whatiwanttoo.txt"
x3= "Ccc_22222_CC_Whatiwa.txt"

I would like to extract Whatiwant, Whatiwanttoo, and Whatiwa in R.

I started with substring(x1,15,23), but I don't know how to generalize it. How can I always extract the part between the last _ and the .txt?

Thank you!

  • Hint: regular expressions. Commented Feb 26, 2015 at 16:34
  • Add the regex tag and you'll get answers in the next 2 minutes. Commented Feb 26, 2015 at 16:38

2 Answers

You can use regexp capture groups:

gsub(".*_([^_]*)\\.txt","\\1",x1)
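As a minimal sketch of how the same capture-group idea applies to all the strings at once: the vector name x and the trailing $ anchor are additions for illustration, not part of the call above. Since only one match per string is needed, sub is enough here.

```r
# The three example strings from the question, collected in one vector
# (the name `x` is an assumption made for this sketch)
x <- c("Aaaa_11111_AA_Whatiwant.txt",
       "Bbbb_11111_BBBB_Whatiwanttoo.txt",
       "Ccc_22222_CC_Whatiwa.txt")

# .*_       greedy: everything up to and including the last underscore
# ([^_]*)   capture group 1: the non-underscore characters that follow
# \\.txt$   a literal ".txt" at the very end of the string
result <- sub(".*_([^_]*)\\.txt$", "\\1", x)
print(result)
## [1] "Whatiwant"    "Whatiwanttoo" "Whatiwa"
```

Note that a string that does not match the pattern is returned unchanged, so it may be worth checking the input first if some names might not end in .txt.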

[Image: flowchart diagram of the regular expression]

4 Comments

How did you plot this flowchart?
Using http://www.regexplained.co.uk/ (it expects JavaScript-style regex, so the result can differ); plenty of other sites do the same.
Thanks. May I ask why you only used .*_([^_]*)\.txt to get the flowchart? If I use the entire ".*_([^_]*)\\.txt","\\1", I get something different :-p
Because the website takes JavaScript-style regexps.

You can also use the stringr library, with functions like str_extract (and many others), if you would rather not get deep into regular expressions. It is extremely easy to use:

x1= "Aaaa_11111_AA_Whatiwant.txt"
x2= "Bbbb_11111_BBBB_Whatiwanttoo.txt"
x3= "Ccc_22222_CC_Whatiwa.txt"
library(stringr)
patron <- "(What)[a-z]+"
str_extract(x1, patron)
## [1] "Whatiwant"
str_extract(x2, patron)
## [1] "Whatiwanttoo"
str_extract(x3, patron)
## [1] "Whatiwa"
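The pattern above relies on every target starting with "What". If that cannot be assumed, one possible regex-free alternative in base R is to drop the extension and keep the piece after the last underscore. This is only a sketch: the names x, stem, parts and last_part are made up for illustration, and it assumes the wanted part itself contains no underscore.

```r
# The three example strings, collected in one vector (name `x` is illustrative)
x <- c("Aaaa_11111_AA_Whatiwant.txt",
       "Bbbb_11111_BBBB_Whatiwanttoo.txt",
       "Ccc_22222_CC_Whatiwa.txt")

# Remove the file extension (".txt") from each name
stem <- tools::file_path_sans_ext(x)

# Split each name on literal underscores; fixed = TRUE means no regex is involved
parts <- strsplit(stem, "_", fixed = TRUE)

# Keep the last piece of each split
last_part <- vapply(parts, function(p) p[length(p)], character(1))
print(last_part)
## [1] "Whatiwant"    "Whatiwanttoo" "Whatiwa"
```

With stringr, a comparable call would be str_extract(x, "[^_]+(?=\\.txt$)"), which uses a lookahead to match the text just before the extension.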
