How do you capture part of a regex to a variable in Ruby?

Question

I know about "string"[/regex/], which returns the part of the string that matches. But what if I want to return only the captured part(s) of a string?

I have the string "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3". I want to store in the variable title the text The_Case_of_the_Gold_Ring.

I can capture this part with the regex /\d_(?!.*\d_)(.*).mp3$/i. But writing the Ruby "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"[/\d_(?!.*\d_)(.*).mp3$/i] returns 0_The_Case_of_the_Gold_Ring.mp3 which isn't what I want.

I can get what I want by writing

"1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3" =~ /\d_(?!.*\d_)(.*).mp3$/i
title = $~.captures[0]

But this seems sloppy. Surely there's a proper way to do this?

(I'm aware that someone can probably write a simpler regex that targets the text I want and lets the "string"[/regex/] method work, but this is just an example to illustrate the problem; the specific regex isn't the issue.)

Roman Kiselenko · Accepted Answer · 2014-11-12 08:25:52Z

5

You can pass the index of the capture group as a second argument to String#[], as in string[/regexp/, index]:

string = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
string[/\d_(?!.*\d_)(.*).mp3$/i, 1]
# => "The_Case_of_the_Gold_Ring"
string[/\d_(?!.*\d_)(.*).mp3$/i, 0]
# => "0_The_Case_of_the_Gold_Ring.mp3"
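
The second argument can also be the name of a named group rather than a number. The following is a minimal sketch, not part of the original answer: the constant TITLE_PATTERN, the helper title_of, and the file name notes.txt are made up for illustration, and the dot before mp3 is escaped so that it only matches a literal period.

```ruby
# Hypothetical names (TITLE_PATTERN, title_of) used only for illustration.
# The group is named "title" so it can be requested by name.
TITLE_PATTERN = /\d_(?!.*\d_)(?<title>.*)\.mp3$/i

def title_of(filename)
  # String#[] takes a group index or a group name as its second argument
  # and returns nil when the pattern does not match at all.
  filename[TITLE_PATTERN, "title"]
end

puts title_of("1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3")
p title_of("notes.txt")
```

Because a failed match yields nil rather than raising, the result can be assigned straight to a variable and checked afterwards.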

spickermann · Accepted Answer · 2014-11-12 07:47:17Z

2

Have a look at the match method:

string = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
regexp = /\d_(?!.*\d_)(.*).mp3$/i

matches = regexp.match(string)
matches[1]
#=> "The_Case_of_the_Gold_Ring"

Here matches[0] returns the whole match, and matches[1] onward return the capture groups in order:

matches.to_a    
#=> ["0_The_Case_of_the_Gold_Ring.mp3", "The_Case_of_the_Gold_Ring"]

Read more examples: http://ruby-doc.org/core-2.1.4/MatchData.html#method-i-5B-5D
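
One thing to watch for: match returns nil when the pattern does not match, so calling [1] on the result would raise NoMethodError. A small sketch of a guard, assuming a hypothetical helper named first_capture and a made-up non-matching file name:

```ruby
# Hypothetical helper: returns the first capture group, or nil if no match.
def first_capture(string, regexp)
  matches = regexp.match(string) # MatchData on success, nil on failure
  matches && matches[1]
end

regexp = /\d_(?!.*\d_)(.*).mp3$/i

puts first_capture("1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3", regexp)
p first_capture("readme.txt", regexp)
```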

Aurélien Bottazini · Accepted Answer · 2014-11-12 07:48:06Z

1

You can use named captures

"1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3" =~ /\d_(?!.*\d_)(?<title>.*).mp3$/i

and $~[:title] will give you what you want.
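
A related variation: when a regexp literal with named groups is on the left-hand side of =~ (and contains no interpolation), Ruby assigns each named group to a local variable of the same name. The sketch below wraps this in a hypothetical method, extract_title, and escapes the dot before mp3; neither detail comes from the answer above.

```ruby
# Hypothetical wrapper method, for illustration only.
def extract_title(filename)
  # The regexp literal must be on the LEFT of =~ for the local variable
  # `title` to be created; with the string on the left it is not assigned.
  if /\d_(?!.*\d_)(?<title>.*)\.mp3$/i =~ filename
    title
  end
end

puts extract_title("1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3")
```

The method returns nil when the file name does not match, since the if expression has no else branch.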

the Tin Man · Accepted Answer · 2014-11-12 19:00:05Z

Meditate on this:

Here's the source string to be parsed:

str = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"

Patterns can be defined as strings:

DATE_REGEX = '\d{4}-[A-Z]{3}-\d{2}'
SERIAL_REGEX = '\d{2}'
TITLE_REGEX = '.+'

Then interpolated into a regexp:

regex = /^(#{ DATE_REGEX })_(#{ SERIAL_REGEX })_(#{ TITLE_REGEX })/
# => /^(\d{4}-[A-Z]{3}-\d{2})_(\d{2})_(.+)/

The advantage is that the pattern is easier to maintain, because it is built from several smaller ones.

str.match(regex) # => #<MatchData "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3" 1:"1952-FEB-21" 2:"70" 3:"The_Case_of_the_Gold_Ring.mp3">
regex.match(str) # => #<MatchData "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3" 1:"1952-FEB-21" 2:"70" 3:"The_Case_of_the_Gold_Ring.mp3">

are equivalent because both Regexp and String implement match.

We can retrieve what was captured as an array:

regex.match(str).captures # => ["1952-FEB-21", "70", "The_Case_of_the_Gold_Ring.mp3"]
regex.match(str).captures.last # => "The_Case_of_the_Gold_Ring.mp3"
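
Since captures is an ordinary array, it can be unpacked into several variables at once with multiple assignment. A minimal sketch, assuming a hypothetical helper parse_filename that returns an empty array when nothing matches:

```ruby
# Hypothetical helper: returns [date, serial, title], or [] on no match.
def parse_filename(str)
  match = /^(\d{4}-[A-Z]{3}-\d{2})_(\d{2})_(.+)/.match(str)
  match ? match.captures : []
end

# Multiple assignment spreads the array across the three variables.
date, serial, title = parse_filename("1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3")
puts date
puts serial
puts title
```

With an empty array on the right-hand side, all three variables end up nil, which keeps the caller's nil check simple.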

We can also name the captures and access them like we would a hash:

regex = /^(?<date>#{ DATE_REGEX })_(?<serial>#{ SERIAL_REGEX })_(?<title>#{ TITLE_REGEX })/
matches = regex.match(str)
matches[:date] # => "1952-FEB-21"
matches[:serial] # => "70"
matches[:title] # => "The_Case_of_the_Gold_Ring.mp3"

Of course, it's not necessary to mess with that rigamarole at all. We can split the string on underscores ('_'):

str = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"
str.split('_') # => ["1952-FEB-21", "70", "The", "Case", "of", "the", "Gold", "Ring.mp3"]

split can take a limit parameter saying how many times it should split the string. Passing in 3 gives us:

str.split('_', 3) # => ["1952-FEB-21", "70", "The_Case_of_the_Gold_Ring.mp3"]

Grabbing the last element returns:

str.split('_', 3).last # => "The_Case_of_the_Gold_Ring.mp3"
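
The result still carries the .mp3 extension. One way to drop it, sketched here with a hypothetical helper named title_from, is to strip the extension with File.basename before splitting:

```ruby
# Hypothetical helper combining File.basename with the split shown above.
def title_from(filename)
  # Passing ".*" as the suffix removes whatever the last extension is.
  stem = File.basename(filename, ".*")
  # Split into at most three pieces so underscores in the title survive.
  stem.split("_", 3).last
end

puts title_from("1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3")
```

This assumes the name always has the date and serial fields in front; with fewer than three fields, last simply returns the final piece.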

Cary Swoveland · Accepted Answer · 2014-11-15 05:05:36Z

I believe it would be easiest to use a capture group here, but I'd like to present some possibilities that do not, for illustrative purposes. All employ the same positive lookahead ((?=\.mp3$)). All but one use a positive lookbehind; the remaining one uses \K to "forget" everything matched up to the start of the desired text. Some permit the matched string to contain digits (.+); others do not ([^\d]+).

str = "1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3"

1 # match follows last digit followed by underscore, cannot contain digits 
str[/(?<=\d_)[^\d]+(?=\.mp3$)/]    
  #=> "The_Case_of_the_Gold_Ring"

2 # same as 1, as `\K` disregards match to that point
str[/\d_\K[^\d]+(?=\.mp3$)/]
  #=> "The_Case_of_the_Gold_Ring"

3 # match follows underscore, two digits, underscore, may contain digits 
str[/(?<=_\d\d_).+(?=\.mp3$)/]
  #=> "The_Case_of_the_Gold_Ring"

4 # match follows string having specific pattern, may contain digits
str[/(?<=\d{4}-[A-Z]{3}-\d{2}_\d{2}_).+(?=\.mp3$)/]
  #=> "The_Case_of_the_Gold_Ring"

5 # match follows digit, any 12 characters, another digit and underscore,
  # may contain digits
str[/(?<=\d.{12}\d_).+(?=\.mp3$)/]
  #=> "The_Case_of_the_Gold_Ring"
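
To make the difference between the [^\d]+ and .+ variants concrete, here is a small sketch that runs patterns 1 and 3 against a title containing digits. The constant names, the compare helper, and the second file name are made up for illustration.

```ruby
# Hypothetical names; the two patterns are variants 1 and 3 from above.
NO_DIGITS = /(?<=\d_)[^\d]+(?=\.mp3$)/
ANY_CHARS = /(?<=_\d\d_).+(?=\.mp3$)/

# Returns the result of each pattern as a two-element array.
def compare(filename)
  [filename[NO_DIGITS], filename[ANY_CHARS]]
end

p compare("1952-FEB-21_70_The_Case_of_the_Gold_Ring.mp3")
p compare("1952-FEB-21_70_Route_66.mp3") # made-up title with digits
```

For the made-up second name, the digit-free variant finds nothing and returns nil, while the .+ variant returns "Route_66".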
