3636<br >
3737
3838* Examples in this chapter will deal with *ASCII* characters only unless otherwise specified
39- * Some features are not documented on [ruby-doc: Regexp](https://ruby-doc.org/core-2.5.0/Regexp.html), see [Onigmo regular expressions library](https://github.com/k-takata/Onigmo/blob/master/doc/RE) for such cases
39+ * Ruby Regexp is based on [Onigmo regular expressions library](https://github.com/k-takata/Onigmo/blob/master/doc/RE)
40+ * An edited and expanded version of this chapter is available as [free e-book on leanpub](https://leanpub.com/rubyregexp)
4041
4142<br >
4243
@@ -521,8 +522,8 @@ a baz
521522>> 'a || b'.gsub(/\|/, '&')
522523=> "a && b"
523524
524- >> '\foo\bar\baz'.gsub(/\\/, '/')
525- => "/foo/bar/baz"
525+ >> '\learn\by\example'.gsub(/\\/, '/')
526+ => "/learn/by/example"
526527```
527528
528529* use `Regexp.escape` to let Ruby handle escaping all the metacharacters present in a string
@@ -551,6 +552,7 @@ a baz
551552```
552553
553554* use `%r` percent string to use any other delimiter than the default `/`
555+ * Note: no need to worry about unescaped delimiter inside `#{}` interpolation
554556
555557``` ruby
556558>> '/foo/bar/baz/123'.match?('o/bar/baz/1')
@@ -605,8 +607,8 @@ a baz
605607=> "X spare X party"
606608
607609# same as: /\b(re.d|red)\b/
608- >> 'red read ready re;d redo reed'.gsub(/\bre.?d\b/, 'X')
609- => "X X ready X redo X"
610+ >> %w[red read ready re;d redo reed].grep(/\bre.?d\b/)
611+ => ["red", "read", "re;d", "reed"]
610612
611613# same as: /part|parrot/
612614>> 'par part parrot parent'.gsub(/par(ro)?t/, 'X')
@@ -775,6 +777,18 @@ blah \< foo \< bar \< blah \< baz
775777=> "feat ft feaeat"
776778```
777779
780+ * possessive quantifier can also be expressed using **atomic grouping** with `(?>` special group
781+
782+ ``` ruby
783+ # same as: /(b|o)++/
784+ >> 'abbbc foooooot'.gsub(/(?>(b|o)+)/, 'X')
785+ => "aXc fXt"
786+
787+ # same as: /f(a|e)*+at/
788+ >> 'feat ft feaeat'.gsub(/f(?>(a|e)*)at/, 'X')
789+ => "feat ft feaeat"
790+ ```
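To see what the atomic group changes, it can help to put the plain greedy quantifier next to the possessive and atomic forms. The following is only a sketch; `quantifier_variants` is a made-up helper name for this illustration, not something from Ruby or from the chapter.

```ruby
# Compare greedy, possessive and atomic-group versions of the same pattern.
# `quantifier_variants` is a hypothetical helper used only for this example.
def quantifier_variants(str)
  {
    greedy:     str.gsub(/f(a|e)*at/, 'X'),     # backtracks so that 'at' can match
    possessive: str.gsub(/f(a|e)*+at/, 'X'),    # never gives characters back
    atomic:     str.gsub(/f(?>(a|e)*)at/, 'X')  # behaves like the possessive form
  }
end

p quantifier_variants('feat ft feaeat')
```

The greedy version replaces `feat` and `feaeat`, while the other two leave the string unchanged because `(a|e)*` has already consumed the `a` that `at` would need.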
791+
778792<br >
779793
780794## <a name="match-scan-and-globals"></a>match, scan and globals
@@ -815,6 +829,12 @@ blah \< foo \< bar \< blah \< baz
815829
816830>> s[/ab{2,}c/]
817831=> "abbbc"
832+
833+ # same as: s.sub!(/b.*b/, 'X')
834+ >> s[/b.*b/] = 'X'
835+ => "X"
836+ >> s
837+ => "aXc"
818838```
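One difference from `sub!` is worth keeping in mind: when the regexp does not match, `sub!` returns `nil`, whereas assignment through `String#[]=` raises `IndexError`. A minimal sketch follows, where `replace_first!` is a hypothetical wrapper written only for this illustration.

```ruby
# `replace_first!` is a hypothetical helper wrapping String#[]= with a regexp.
# It rescues the IndexError raised on no match and returns nil, like sub! does.
def replace_first!(str, pattern, replacement)
  str[pattern] = replacement
  str
rescue IndexError
  nil
end

s = 'abbbc'.dup                     # dup gives a mutable copy even with frozen literals
p replace_first!(s, /b.*b/, 'X')    # modifies s in place
p replace_first!(s, /z+/, 'X')      # no match, s is left as it was
p s
```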
819839
820840* `scan` method returns all the matched strings as an array
@@ -842,7 +862,7 @@ ABBBC
842862* global variables hold information related to matched data
843863 * as noted before, `match?` method won't affect these variables
844864* `$~` contains `MatchData`
845- * <code>$`</code> contains string before the matched string
865+ * `` $` `` contains string before the matched string
846866* `$&` contains matched string
847867* `$'` contains string after the matched string
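The globals listed above can be tried out with a short sketch like the one below. `match_parts` is a hypothetical helper name used only here; it returns the text before, inside and after the first match.

```ruby
# `match_parts` is a hypothetical helper for this illustration.
# It returns [pre-match, match, post-match], or nil when nothing matches.
def match_parts(str, pattern)
  return nil unless str =~ pattern   # =~ sets $~ and the related globals
  [$`, $&, $']
end

p match_parts('one two three', /t\w+/)

'one two three' =~ /two/
'xyz'.match?(/y/)   # match? leaves the globals untouched
p $&                # still refers to the earlier =~ match
```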
848868
@@ -948,9 +968,9 @@ ABBBC
948968* quantifiers can be applied to character classes as well
949969
950970``` ruby
951- # same as: /c(o|u)t/
952- >> 'cut cat cot coat'.gsub(/c[ou]t/, 'X')
953- => "X cat X coat"
971+ # same as: /cot|cut/ or /c(o|u)t/
972+ >> %w[cute cat cot coat cost scuttle].grep(/c[ou]t/)
973+ => ["cute", "cot", "scuttle"]
954974
955975# same as: /(a|o)+t/
956976>> 'oat ft boa foot'.gsub(/[ao]+t/, 'X')
@@ -1125,6 +1145,9 @@ ba\bab
11251145# remove all punctuation characters
11261146>> 'hi there! how are you?? all fine here.'.gsub(/[[:punct:]]+/, '')
11271147=> "hi there how are you all fine here"
1148+ # remove all punctuation characters except . and !
1149+ >> 'hi there! how are you?? all fine here.'.gsub(/[[^.!]&&[:punct:]]+/, '')
1150+ => "hi there! how are you all fine here."
11281151```
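The `&&` used in the last example is set intersection inside a character class. A couple of additional illustrative cases, with input strings made up for this sketch:

```ruby
# && inside a character class keeps only characters present in both sets.

# lowercase letters that are not vowels
p 'hello world'.scan(/[a-z&&[^aeiou]]+/)

# punctuation except . and ! (operands written in the opposite order)
p '"Hi", there! Fine.'.gsub(/[[:punct:]&&[^.!]]+/, '')
```

The first call should give the consonant runs `h`, `ll`, `w` and `rld`; the second should drop the quotes and comma but keep `!` and `.`.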
11291152
11301153<br >
@@ -1135,7 +1158,8 @@ ba\bab
11351158* the string value that is matched by such groups can be referred to outside the regexp using global variables `$1`, `$2`, etc
11361159* they can be referred to within the regexp itself using backreferences as `\1`, `\2`, etc
11371160 * `\1`, `\2` up to `\9` can be used in replacement sections of `sub/gsub` when block form is not needed
1138- * `\0` would refer to entire matched string, equivalent to `$&`
1161+ * `\0` or `\&` would refer to entire matched string, equivalent to `$&`
1162+ * `` \` `` and `\'` are equivalents for `` $` `` and `$'` respectively
11391163* *Note* that the matched string is referenced, not the regexp itself
11401164 * for ex: if `([0-9][a-f])` matches `3b`, then backreferencing will be `3b` not any other valid match of the regular expression like `8f`, `0a` etc
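A short sketch of these replacement-string references, using a made-up input. Note that `\'` inside a single-quoted Ruby string is an escaped quote, so a double-quoted string with `\\'` is used for that case.

```ruby
s = 'hello'
p s.sub(/ll/, '[\0]')     # whole match
p s.sub(/ll/, '[\&]')     # same as \0
p s.sub(/ll/, '[\`]')     # text before the match
p s.sub(/ll/, "[\\']")    # text after the match

# backreference inside the regexp itself: a character followed by itself
p 'effort flee facade'.gsub(/(\w)\1/, '<\0>')
```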
11411165
@@ -1218,8 +1242,8 @@ ba\bab
12181242>> '1,2,3,4,5,6,7'.sub(/^((?:[^,]+,){3})([^,]+)/, '\1(\2)')
12191243=> "1,2,3,(4),5,6,7"
12201244
1221- >> 'foo:-:abc34baz25tar:-:par'.split(/(?:\d+|:-:)/)
1222- => ["foo", "abc", "baz", "tar", "par"]
1245+ >> '123hand42handy777handful500'.split(/hand(?:y|ful)?/)
1246+ => ["123", "42", "777", "500"]
12231247```
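The reason non-capturing groups matter for `split` and `scan` is that both methods change their output when capture groups are present. A small comparison, with inputs chosen only for illustration:

```ruby
# split includes the captured text in the result when capture groups are used
p 'foo:-:abc34baz'.split(/(\d+|:-:)/)
p 'foo:-:abc34baz'.split(/(?:\d+|:-:)/)

# scan returns the captures instead of the whole match when groups are present
p 'a1b22c'.scan(/[a-z](\d+)/)
p 'a1b22c'.scan(/[a-z](?:\d+)/)
```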
12241248
12251249* but if regexp itself needs backreference, capture group cannot be avoided
@@ -1306,8 +1330,8 @@ ba\bab
13061330=> ":cat --boat ;X"
13071331
13081332# change 'foo' only if it is not followed by a digit character
1309- >> 'foo _food 1foo32 foot5'.gsub(/foo(?!\d)/, 'baz')
1310- => "baz _bazd 1foo32 bazt5"
1333+ >> 'hey food! foo42 foot5 foofoo'.gsub(/foo(?!\d)/, 'baz')
1334+ => "hey bazd! foo42 bazt5 bazbaz"
13111335
13121336# words not surrounded by punctuation marks
13131337>> ':cat top nice; cool. mad'.scan(/(?<![[:punct:]])\b\w+\b(?![[:punct:]])/)
@@ -1340,6 +1364,22 @@ ba\bab
13401364=> "NA,NA,1,NA,NA,2,NA,3,NA,NA"
13411365```
13421366
1367+ * lookarounds can be used to construct an AND conditional
1368+
1369+ ``` ruby
1370+ >> words = %w[sequoia subtle questionable exhibit equation]
1371+ => ["sequoia", "subtle", "questionable", "exhibit", "equation"]
1372+
1373+ # words containing 'b' and 'e' and 't' in any order
1374+ # same as: /b.*e.*t|b.*t.*e|e.*b.*t|e.*t.*b|t.*b.*e|t.*e.*b/
1375+ >> words.grep(/(?=.*b)(?=.*e).*t/)
1376+ => ["subtle", "questionable", "exhibit"]
1377+
1378+ # words containing all vowels in any order
1379+ >> words.grep(/(?=.*a)(?=.*e)(?=.*i)(?=.*o).*u/)
1380+ => ["sequoia", "questionable", "equation"]
1381+ ```
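When the list of required characters is not fixed, the same AND condition could be built programmatically. The sketch below assumes one lookahead per required string; `all_of_regexp` is a hypothetical helper name, not a Ruby method.

```ruby
# `all_of_regexp` is a hypothetical helper: it builds a regexp made only of
# lookaheads, one per required string, so a match means all are present.
def all_of_regexp(parts)
  Regexp.new(parts.map { |s| "(?=.*#{Regexp.escape(s)})" }.join)
end

words = %w[sequoia subtle questionable exhibit equation]
p words.grep(all_of_regexp(%w[b e t]))
p words.grep(all_of_regexp(%w[a e i o u]))
```

Since `.` does not match a newline by default, this sketch is meant for single-line strings such as the words above.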
1382+
13431383* even though lookarounds are not part of matched string, capture groups can be used inside them
13441384
13451385``` ruby
@@ -1505,13 +1545,13 @@ SyntaxError ((irb):4: invalid pattern in look-behind: /(?<!baz.*)123/)
15051545* See also [ruby-doc: Encoding](https://ruby-doc.org/core-2.5.0/Encoding.html) for details on handling different string encodings
15061546
15071547``` ruby
1508- >> s = 'foo - baz'
1509- >> s.gsub(/\w+/n, '(\0)')
1548+ # example with ASCII characters only
1549+ >> 'foo - baz'.gsub(/\w+/n, '(\0)')
15101550=> "(foo) - (baz)"
15111551
1512- >> s = 'foo — baz'
1513- >> s.gsub(/\w+/n, '(\0)')
1514- (irb):4: warning: historical binary regexp match /.../n against UTF-8 string
1552+ # example with non-ASCII characters as well
1553+ >> 'foo — baz'.gsub(/\w+/n, '(\0)')
1554+ (irb):2: warning: historical binary regexp match /.../n against UTF-8 string
15151555=> "(foo) — (baz)"
15161556```
15171557
@@ -1535,10 +1575,10 @@ SyntaxError ((irb):4: invalid pattern in look-behind: /(?<!baz.*)123/)
15351575>> 'Cat SCatTeR CATER cAts'.scan(/Cat(?i)[a-z]*\b/)
15361576=> ["Cat", "CatTeR"]
15371577
1538- >> Regexp.union(/foo/i, 'bar')
1539- => /(?i-mx:foo)|bar/
1540- >> Regexp.union(/foo/, 'a^b', /c.t\b/im)
1541- => /(?-mix:foo)|a\^b|(?mi-x:c.t\b)/
1578+ >> Regexp.union(/^cat/i, '123')
1579+ => /(?i-mx:^cat)|123/
1580+ >> Regexp.union(/cat/, 'a^b', /the.*ice/im)
1581+ => /(?-mix:cat)|a\^b|(?mi-x:the.*ice)/
15421582```
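As the output above suggests, `Regexp.union` keeps each regexp's own flags and escapes plain strings. A brief sketch of how the combined pattern behaves, with test strings made up for this example:

```ruby
pat = Regexp.union(/^cat/i, 'a^b')
p pat.source             # shows the scoped flags and the escaped ^
p pat.match?('CATER')    # the i flag applies to the first alternative
p pat.match?('concat')   # ^ still anchors 'cat' to the start
p pat.match?('x a^b')    # the string argument is matched literally

# an array of strings is accepted too; metacharacters are escaped
literal = Regexp.union(%w[c.t a+b])
p literal.match?('cat')
p literal.match?('c.t')
```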
15431583
15441584<br >
@@ -1565,14 +1605,20 @@ SyntaxError ((irb):4: invalid pattern in look-behind: /(?<!baz.*)123/)
15651605=> "φοοβτfoo"
15661606```
15671607
1568- * for character class ranges, use codepoints defined by `\u{}`
1608+ * for generic Unicode character ranges, specify codepoints using `\u{}` construct.
15691609
15701610``` ruby
1571- >> 'hi 😆😇'.codepoints.map { |i| '%x' % i }
1572- => ["68", "69", "20", "1f606", "1f607"]
1611+ # to get codepoints from string
1612+ >> 'fox:αλεπού'.codepoints.map { |i| '%x' % i }
1613+ => ["66", "6f", "78", "3a", "3b1", "3bb", "3b5", "3c0", "3bf", "3cd"]
1614+ # one or more codepoints can be specified inside \u{}
1615+ >> puts "\u{66 6f 78 3a 3b1 3bb 3b5 3c0 3bf 3cd}"
1616+ fox:αλεπού
15731617
1574- >> puts "\u{68}\u{69}\u{20}\u{1f606}\u{1f607}"
1575- hi 😆😇
1618+ # character range example using \u{}
1619+ # all English lowercase letters
1620+ >> 'fox:αλεπού,eagle:αετός'.scan(/[\u{61}-\u{7a}]+/)
1621+ => ["fox", "eagle"]
15761622```
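The same construct works for ranges outside ASCII. The sketch below assumes the Greek and Coptic block (U+0370 to U+03FF) is the range of interest; the input string is built from the codepoints shown earlier.

```ruby
# input built from codepoints so the example does not depend on how
# the source file stores accented characters
s = "fox:\u{3b1 3bb 3b5 3c0 3bf 3cd},eagle"

# characters in the Greek and Coptic block
p s.scan(/[\u{370}-\u{3ff}]+/)

# ASCII lowercase letters, as in the example above
p s.scan(/[\u{61}-\u{7a}]+/)
```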
15771623
15781624**Further Reading**
@@ -1733,11 +1779,11 @@ See the below image for illustration (courtesy [regexper](https://regexper.com/)
17331779>> puts "#{s} apples" if s.sub!(/\d+/) { $&.to_i ** 2 }
1734178016 apples
17351781
1736- >> s, c = ['coffining', 0]
1782+ >> word, cnt = ['coffining', 0]
17371783=> ["coffining", 0]
1738- >> c += 1 while s.sub!(/.in/, '')
1784+ >> cnt += 1 while word.sub!(/fin/, '')
17391785=> nil
1740- >> [s, c]
1786+ >> [word, cnt]
17411787=> ["cog", 2]
17421788
17431789>> s = '421,foo,2425,42,5,foo,6,6,42'
@@ -1822,10 +1868,14 @@ See the below image for illustration (courtesy [regexper](https://regexper.com/)
18221868
18231869Note that most of these resources are not specific to Ruby, so use them with caution and check if they apply to Ruby's syntax and features
18241870
1871+ * An edited and expanded version of this chapter is available as [free e-book on leanpub](https://leanpub.com/rubyregexp)
18251872* [rubular](http://rubular.com/) - Ruby regular expression editor
18261873* [stackoverflow: ruby regexp](https://stackoverflow.com/questions/tagged/ruby+regex?sort=votes&pageSize=15)
1874+ * [regexp-examples](https://github.com/tom-lord/regexp-examples) - Generate strings that match a given Ruby regular expression
18271875* [stackoverflow: regex FAQ](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean)
1876+ * [stackoverflow: regex tag](https://stackoverflow.com/questions/tagged/regex) is a good source of exercise questions
18281877* [rexegg](https://www.rexegg.com/) - comprehensive regular expression tutorials, tricks and more
1878+ * [regular-expressions](https://www.regular-expressions.info/) - tutorials and tools
18291879* [regexcrossword](https://regexcrossword.com/) - tutorials and puzzles
18301880* [regexper](https://regexper.com/) - for visualization
18311881* [swtch](https://swtch.com/~rsc/regexp/regexp1.html) - stuff about regular expression implementation engines