I've been trying to find a regex in Ruby to match a PHP comment block:

/**
 * @file
 * lorum ipsum
 * 
 * @author  ME <me@localhost>
 * @version 00:00 00-00-0000
 */

Could anyone help? I've searched a lot, and although some of the regexes I found work in a regex tester, they don't work when I use them in my Ruby file.

This is the most successful bit of regex I have found:

 (/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)

This is the output from my script:

file is ./test/123.rb so regex is ((^\s*#\s)+(.*?))+
i = 0
found: my first ruby comment
file is ./test/abc.php so regex is (/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)
i = 0
found: * 
i = 1
found: *

Here is the code I have to do this:

  def self.extract_comments f
    if @regex[File.extname(f)]
      puts "file is " + f + " so regex is " + @regex[File.extname(f)]
      cur_rgx = Regexp.new @regex[File.extname(f)]
      matches = IO.read( f ).scan( cur_rgx )
      content = ""
      if ! matches.empty?
        # content = "== " + f + " ==\n"
        content += f + "\n"
        for i in 0...f.length
          content += "="
        end
        content += "\n"
        for i in 0...matches.length
          puts "i = " + i.to_s
          puts "found: " + matches[i][2].to_s
          content << matches[i][2].to_s + "\n"
        end
        content << "\n"
      end
    end
    content || '' # return something
  end
  • Is there a problem with this regex? If so, please elaborate. Commented Oct 19, 2012 at 11:02
  • The one thing that I found interesting in your question is that the regex works in a regex tester and not in Ruby code. It would be useful to have more information about the exact problem you have and the Ruby code you're using. Commented Oct 19, 2012 at 11:08
  • Hi, I've updated the post to reflect your comment. Commented Oct 19, 2012 at 11:22
  • I think the problem is not the regex, but the way you're handling the matches array. The entire comment should be the first group that's matched; have you tried inspecting matches[0][0]? Commented Oct 19, 2012 at 11:56
  • Thanks @AlbertoMoriconi, the value I was looking for was stored in [i][0], and my original regex was fine. Commented Oct 19, 2012 at 14:22
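
The resolution in the comments follows from how String#scan treats capture groups: when the pattern contains groups, each result is an array of those groups rather than the whole match, so index 0 is the outermost group. Here is a minimal sketch of that, assuming a shortened sample comment (the `source` string and the variable names are illustrative, not taken from the original script):

```ruby
# Sample input: a shortened version of the comment block from the question.
source = "/**\n * @file\n * lorum ipsum\n */\necho 'hi';\n"

# The regex from the question, built from a string as the script does.
pattern = Regexp.new('(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)')

# Because the pattern has capture groups, scan returns one array per match,
# holding the groups in order. The outermost group (index 0) is the whole comment.
matches = source.scan(pattern)
matches.each do |groups|
  puts "whole comment: #{groups[0].inspect}"
  puts "third group:   #{groups[2].inspect}" # what matches[i][2] was reading
end

# Alternative: ignore the groups and collect the full match text directly.
whole = []
source.scan(pattern) { whole << Regexp.last_match(0) }
```

Either approach should work here. Reading `Regexp.last_match(0)` has the advantage of not depending on how the groups happen to be numbered.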

2 Answers

1

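It seems like /\/\*.*?\*\//m should do: the m flag lets . match newlines, so the lazy .*? can span the whole block and stop at the first */. Also, that's really a C-style comment block.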
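
To illustrate the effect of the m flag, here is a small sketch that runs the same pattern with and without it. The `php` sample string is made up for the example:

```ruby
# Hypothetical PHP source with one multi-line and one single-line comment.
php = "<?php\n/**\n * @file\n */\n$x = 1; /* inline */\n"

# With /m, "." also matches newlines, so the lazy .*? can cross lines.
with_m = php.scan(/\/\*.*?\*\//m)

# Without /m, "." stops at a newline, so only the one-line comment is found.
without_m = php.scan(/\/\*.*?\*\//)

puts with_m.inspect
puts without_m.inspect
```

Since this pattern has no capture groups, scan returns plain strings here rather than arrays of groups.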


0

Unless it is important that each line inside the comment block begins with an asterisk, you may want to try this regex:

/\/\*(?:[^*]+|\*+(?!\/))*\*\//
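
As a quick check of how this pattern behaves, the sketch below runs it over a made-up string (`text` is an assumption for illustration). Because [^*] also matches newlines, no m flag is needed, and the negative lookahead keeps stray asterisks inside the comment from ending it early:

```ruby
# The pattern from above; all groups are non-capturing, so scan yields strings.
unrolled = /\/\*(?:[^*]+|\*+(?!\/))*\*\//

# Hypothetical input: asterisks inside the comments and one multi-line block.
text = "a /* one * two ** three */ b /** multi\n * line **/ c"

found = text.scan(unrolled)
found.each { |comment| puts comment.inspect }
```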

EDIT: And here's a stricter version, which will only match comments that are formatted exactly like your example:

/^( *)\/\*\*\n(?:\1 \*(?:[^*\n]|\*(?!\/))*\n)+\1 \*\//

This version only matches a comment that has /** and */ on separate lines. The /** line can be indented by an arbitrary number of spaces (but no other whitespace characters), while every other line must be indented by exactly one more space than the /** line.
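
A small sketch of what the stricter version accepts and rejects, using made-up inputs (`aligned`, `misaligned` and `one_line` are illustrative names). The pattern contains a capture group for the indentation, so the example uses String#[] to get the whole match instead of scan:

```ruby
strict = /^( *)\/\*\*\n(?:\1 \*(?:[^*\n]|\*(?!\/))*\n)+\1 \*\//

aligned    = "  /**\n   * @file\n   * lorum ipsum\n   */\n" # stars one space past /**
misaligned = "/**\n* @file\n*/\n"                            # stars not indented
one_line   = "/** @file */\n"                                # no line break after /**

# String#[] with a regex returns the matched text, or nil when nothing matches.
puts aligned[strict].inspect
puts misaligned[strict].inspect
puts one_line[strict].inspect
```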

EDIT 2: Here is another version:

/^([ \t]*)\/\*\*.*?\n(?:^\1 .*?\n)+^\1 \*\//

It allows a mixture of tabs and spaces (ew) for indentation, but still requires all lines to conform to the indentation of the /** one (plus a single space).
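
To compare the last two versions side by side, here is a sketch with two made-up inputs (`tabbed` and `no_stars` are assumptions for illustration). It also shows that the last pattern, as written, only requires the extra space on the inner lines, not a leading asterisk:

```ruby
strict  = /^( *)\/\*\*\n(?:\1 \*(?:[^*\n]|\*(?!\/))*\n)+\1 \*\//
relaxed = /^([ \t]*)\/\*\*.*?\n(?:^\1 .*?\n)+^\1 \*\//

tabbed   = "\t/**\n\t * @file\n\t */\n"     # tab-indented block
no_stars = "/**\n no star here\n */\n"      # inner line lacks an asterisk

puts tabbed[strict].inspect    # spaces-only version rejects the tab
puts tabbed[relaxed].inspect   # tab-aware version accepts it
puts no_stars[strict].inspect
puts no_stars[relaxed].inspect
```

If the asterisk on each inner line matters, the stricter pattern is probably the safer choice.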
