Ruby remove parts of a string

Question

I have a problem with some regular expressions in Ruby. This is the situation: Input text:

"NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
Abonează-te
---- Here is some usefull text --- 
Abonează-te
× Citeşte mai mult »
Adauga un comentariu"

I need a regular expression which can extract only the useful text between the two occurrences of the word "Abonează-te".

I tried result = result.gsub(/^[.]{*}\nAbonează-te/, '') to remove the text from the start of the string up to the word 'Abonează-te', but this does not work. I have no idea how to solve this. Can you help me?

falsetru · Accepted Answer · 2015-02-11 17:19:37Z

2

Instead of using a regular expression, you can use String#split and take the second part:

s = "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
Abonează-te
---- Here is some usefull text --- 
Abonează-te
× Citeşte mai mult »
Adauga un comentariu"
s.split('Abonează-te', 3)[1].strip  # 3: at most 3 parts
# => "---- Here is some usefull text ---"

UPDATE

If you want to get multiple matches:

s = "NU
Abonează-te
-- Here's some
Abonează-te
text --
Abonează-te
comentariu"
s.split('Abonează-te')[1..-2].map(&:strip)
# => ["-- Here's some", "text --"]
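One caveat with indexing the result of split directly: if the marker is missing, [1] is nil and .strip raises NoMethodError. A minimal sketch of a defensive wrapper follows; the method name between_markers is hypothetical and not part of the answer above.

```ruby
# Hypothetical helper: returns every stripped segment that sits between
# two consecutive occurrences of `marker`, or [] if there are fewer than two.
def between_markers(text, marker)
  # A limit of -1 keeps trailing empty strings, so a marker at the very
  # end of the text still counts as a closing marker.
  parts = text.split(marker, -1)
  return [] if parts.length < 3

  # Drop the text before the first marker and after the last one.
  parts[1..-2].map(&:strip)
end

s = "NU\nAbonează-te\n-- Here's some\nAbonează-te\ntext --\nAbonează-te\ncomentariu"
segments = between_markers(s, 'Abonează-te')
# => ["-- Here's some", "text --"]

missing = between_markers('no marker here', 'Abonează-te')
# => []
```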

edited Feb 11, 2015 at 17:19

answered Feb 10, 2015 at 16:09

falsetru


2 Comments

sawa Over a year ago

@kitz This is not an alternative. This is the right way to go. Other answers using scan or gsub are strategically wrong for this purpose.

Cary Swoveland Over a year ago

What if s = "NU Abonează-te\n-- Here's some Abonează-te text --\nAbonează-te comentariu"?

Avinash Raj · Accepted Answer · 2015-02-10 17:10:49Z

2

You could use String#scan. There is no need for String#gsub when you only want to extract a particular piece of text.

> s = "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
" Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
" Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
" Abonează-te
" ---- Here is some usefull text --- 
" Abonează-te
" × Citeşte mai mult »
" Adauga un comentariu"
=> "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”\nPublicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35\nAdresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla\nAbonează-te\n---- Here is some usefull text --- \nAbonează-te\n× Citeşte mai mult »\nAdauga un comentariu"
irb(main):010:0> s.scan(/(?<=Abonează-te\n)[\s\S]*?(?=\nAbonează-te)/)
=> ["---- Here is some usefull text --- "]

Remove the newline \n character present inside the lookarounds if necessary. [\s\S]*? will do a non-greedy match of space or non-space characters zero or more times.
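Because lookarounds do not consume the marker, one marker can close one segment and open the next. A rough sketch of the difference, using a shortened sample string that is an assumption and not the asker's input:

```ruby
s = "NU\nAbonează-te\n-- Here's some\nAbonează-te\ntext --\nAbonează-te\ncomentariu"

# Lookarounds assert the markers without consuming them, so the middle
# marker ends the first match and also begins the second.
with_lookarounds = s.scan(/(?<=Abonează-te\n)[\s\S]*?(?=\nAbonează-te)/)
# => ["-- Here's some", "text --"]

# Here the markers are part of the match itself. The middle marker is
# consumed by the first match, so the second segment is never found.
# With a capture group, scan returns an array of arrays, hence flatten.
with_capture = s.scan(/Abonează-te\n(.*?)\nAbonează-te/m).flatten
# => ["-- Here's some"]
```

For a single pair of markers, as in the question, both forms return the same one-element result.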


edited Feb 10, 2015 at 17:10

answered Feb 10, 2015 at 16:07

Avinash Raj


3 Comments

Cary Swoveland Over a year ago

Good, but could you not strengthen it by adding a capture group and replacing the lookarounds with non-capture groups that included anchors? (Readers: Ruby's lookbehinds cannot contain variable-length matches, which are needed to use anchors if the entire text before and after the juicy bits is not to be included.) A small request: could you please remove the IRB prompts? They offend my sensibilities.

Avinash Raj Over a year ago

You mean this: s.scan(/Abonează-te.*\n([\s\S]*?)\nAbonează-te/)[0]. Ah, I forgot that. @CarySwoveland, please check whether my edit is right.

Cary Swoveland Over a year ago

For s = "NU Abonează-te\n-- Here's some useful Abonează-te text --\nAbonează-te comentariu", s[/(?:^.*?Abonează-te\n)(.*?)(?:\nAbonează-te.*$)/,1] #=> "-- Here's some useful Abonează-te text --".

Tom Lord · Accepted Answer · 2016-07-15 14:12:12Z

1

Your regex syntax is incorrect: a . inside a character class matches a literal dot, and {*} matches an opening curly brace "zero or more" times, followed by a closing curly brace.

You can match instead of replacing here.

s.match(/Abonează-te(.*?)Abonează-te/m)[1].strip()
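If you do want to remove the surrounding text, as the question first attempted, a minimal sketch with String#sub is below. The sample string is shortened and is an assumption, not the original input.

```ruby
s = "header line\nAbonează-te\n---- useful text ----\nAbonează-te\nfooter line"

# \A anchors at the start of the string and /m lets . match newlines.
# The lazy .*? stops at the first marker.
without_head = s.sub(/\A.*?Abonează-te/m, '')

# Remove everything from the next marker to the end of the string (\z),
# then trim the leftover newlines.
result = without_head.sub(/Abonează-te.*\z/m, '').strip
# => "---- useful text ----"
```

sub is used instead of gsub because each pattern should apply only once.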

edited Jul 15, 2016 at 14:12

Tom Lord


answered Feb 10, 2015 at 16:18

hwnd

