I have a problem with a regular expression in Ruby. Here is the input text:

"NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
Abonează-te
---- Here is some usefull text --- 
Abonează-te
× Citeşte mai mult »
Adauga un comentariu"

I need a regular expression that extracts only the useful text between the two occurrences of the word "Abonează-te".

I tried result = result.gsub(/^[.]{*}\nAbonează-te/, '') to remove everything from the start of the string up to the word 'Abonează-te', but it does not work. I have no idea how to solve this. Can you help me?

3 Answers

2

Instead of using a regular expression, you can use String#split and take the second part:

s = "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
Abonează-te
---- Here is some usefull text --- 
Abonează-te
× Citeşte mai mult »
Adauga un comentariu"
s.split('Abonează-te', 3)[1].strip  # 3: at most 3 parts
# => "---- Here is some usefull text ---"
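One caveat with the one-liner above: if the marker occurs fewer than twice, `[1]` is either `nil` or the trailing text rather than a delimited section. A minimal sketch of a guard, assuming you want `nil` in that case (`text_between` is a hypothetical helper name, not part of Ruby):

```ruby
# Hypothetical helper: returns the text between the first two occurrences
# of marker, or nil when the marker appears fewer than twice.
def text_between(text, marker)
  parts = text.split(marker, 3) # at most 3 parts: before, between, after
  return nil if parts.size < 3  # fewer than two markers found

  parts[1].strip
end

sample = "header\nAbonează-te\n---- Here is some usefull text --- \nAbonează-te\nfooter"
puts text_between(sample, "Abonează-te").inspect
puts text_between("only one Abonează-te here", "Abonează-te").inspect
```

The `limit` of 3 keeps everything after the second marker in one piece, so later occurrences do not affect the result.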

UPDATE

If you want to get multiple matches:

s = "NU
Abonează-te
-- Here's some
Abonează-te
text --
Abonează-te
comentariu"
s.split('Abonează-te')[1..-2].map(&:strip)
# => ["-- Here's some", "text --"]
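If the marker should only count as a delimiter when it sits on a line of its own, one possible variant is to split on a line-anchored regular expression instead of a plain string. This is a sketch under that assumption; the original question does not say whether inline occurrences should count.

```ruby
# In Ruby, ^ and $ match at line boundaries by default, so this pattern
# only matches the marker when it fills a whole line.
MARKER_LINE = /^Abonează-te$/

multi = "NU\nAbonează-te\n-- Here's some\nAbonează-te\ntext --\nAbonează-te\ncomentariu"
sections = multi.split(MARKER_LINE)[1..-2].map(&:strip)

# Here the marker only appears inside longer lines, so nothing is split.
inline = "NU Abonează-te\n-- Here's some Abonează-te text --\nAbonează-te comentariu"
inline_sections = inline.split(MARKER_LINE)[1..-2].map(&:strip)

puts sections.inspect
puts inline_sections.inspect
```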

2 Comments

@kitz This is not an alternative. This is the right way to go. Other answers using scan or gsub are strategically wrong for this purpose.
What if s = "NU Abonează-te\n-- Here's some Abonează-te text --\nAbonează-te comentariu"?
2

You could use String#scan. There is no need for String#gsub when you only want to extract a particular piece of text.

> s = "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
" Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
" Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
" Abonează-te
" ---- Here is some usefull text --- 
" Abonează-te
" × Citeşte mai mult »
" Adauga un comentariu"
=> "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”\nPublicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35\nAdresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla\nAbonează-te\n---- Here is some usefull text --- \nAbonează-te\n× Citeşte mai mult »\nAdauga un comentariu"
irb(main):010:0> s.scan(/(?<=Abonează-te\n)[\s\S]*?(?=\nAbonează-te)/)
=> ["---- Here is some usefull text --- "]

Remove the newline character \n from the lookarounds if the markers are not on lines of their own. [\s\S]*? is a non-greedy match of any character, whitespace or not (so newlines are included), zero or more times.
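To illustrate why the lookarounds matter when there are several sections, here is a small sketch on a made-up three-marker input. It uses .*? with the /m flag, which in Ruby lets the dot match newlines and so behaves like [\s\S]*? here.

```ruby
# Made-up input with three markers, so two sections share the middle marker.
s = "NU\nAbonează-te\n-- Here's some\nAbonează-te\ntext --\nAbonează-te\ncomentariu"

# Lookarounds assert the markers without consuming them, so the middle
# marker can close one match and open the next.
with_lookarounds = s.scan(/(?<=Abonează-te\n).*?(?=\nAbonează-te)/m)

# Consuming the closing marker instead means it cannot open the next
# section, so the second section is skipped.
with_capture = s.scan(/Abonează-te\n(.*?)\nAbonează-te/m).flatten

puts with_lookarounds.inspect
puts with_capture.inspect
```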

3 Comments

Good, but could you not strengthen it by adding a capture group and replacing the lookarounds with non-capture groups that included anchors? (Readers: Ruby's lookbehinds cannot contain variable-length matches, which are needed to use anchors if the entire text before and after the juicy bits is not to be included.) A small request: could you please remove the IRB prompts? They offend my sensibilities.
You mean this: s.scan(/Abonează-te.*\n([\s\S]*?)\nAbonează-te/)[0]? Ah, I forgot that. @CarySwoveland, please check whether my edit is right.
For s = "NU Abonează-te\n-- Here's some useful Abonează-te text --\nAbonează-te comentariu", s[/(?:^.*?Abonează-te\n)(.*?)(?:\nAbonează-te.*$)/,1] #=> "-- Here's some useful Abonează-te text --".
1

Your regex syntax is incorrect: a . inside a character class matches a literal dot, and {*} matches an opening curly brace "zero or more" times followed by a closing curly brace.
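A quick sketch of the character-class point, plus one possible way to write what the original gsub was aiming for. `strip_through_first_marker` is a hypothetical helper name, and it assumes the marker is followed by a newline.

```ruby
# Inside a character class, "." is a literal dot, not "any character".
literal_dot = /[.]/
puts literal_dot.match?("abc") # no dot in the string
puts literal_dot.match?("a.c") # contains a literal dot

# Hypothetical helper: drop everything from the start of the text through
# the first marker line. /m lets "." match newlines; .*? stops at the
# first marker rather than the last.
def strip_through_first_marker(text, marker = "Abonează-te")
  text.sub(/\A.*?#{Regexp.escape(marker)}\n/m, "")
end

puts strip_through_first_marker("intro\nAbonează-te\nbody\nAbonează-te\nend").inspect
```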

You can match instead of replacing here.

s.match(/Abonează-te(.*?)Abonează-te/m)[1].strip()
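Note that the line above raises NoMethodError when there is no match, because match returns nil. A hedged sketch of a nil-safe variant using String#[] with a capture index and the safe navigation operator (`between_markers` is a hypothetical name):

```ruby
# Hypothetical helper: returns the stripped text between the first two
# markers, or nil when the pattern does not match.
def between_markers(text, marker = "Abonează-te")
  escaped = Regexp.escape(marker)
  # String#[] with a regexp and a group index returns nil on no match;
  # &. then skips the strip call instead of raising.
  text[/#{escaped}(.*?)#{escaped}/m, 1]&.strip
end

input = "Adresa de e-mail\nAbonează-te\n---- Here is some usefull text --- \nAbonează-te\n× Citeşte mai mult »"
puts between_markers(input).inspect
puts between_markers("nothing to see").inspect
```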
