2

How can I generate string by given regex in ruby?

Class MyRegex
   def self.generate_string(reg)
    #return generated string.
   end
end

When I call

MyRegex.generate_string(/a*/) #it will return random string.

expecting output:

aaa
aa
aaaaa

and so on

2
  • 6
    This might help: github.com/repeatedly/ruby-string-random Commented Mar 7, 2014 at 13:28
  • Ugly solution rand(10).times.map { |a| 'a' }.join ... 10 can be any number... :) Commented Mar 7, 2014 at 13:34

3 Answers 3

3

A bit late to the party, but I have created a powerful ruby gem which solves the original problem:

https://github.com/tom-lord/regexp-examples

/this|is|awesome/.examples #=> ['this', 'is', 'awesome']
/https?:\/\/(www\.)?github\.com/.examples #=> ['http://github.com', 'http://www.github.com', 'https://github.com', 'https://www.github.com']
Sign up to request clarification or add additional context in comments.

Comments

1

The short answer is that you can't, as some of the strings could be infinite if you allow the *, + or repetitions which are open on the right eg. {4,}.

If you want to do it any way, then you have two strategies, both of which starts with parsing the regex, and building a state machine representing it.

Then you can either generate a random run through it of max length 'n'. This will give you a random string of at most length n. Or you can add an empty transition to all the states in your state machine to a terminal state, and simply do a random walk until you hit a terminal state. This will give you a completely random string, which the regex accepts, where the length has an arbitrary length, but longer strings are less probable. But please not that there is still an, albeit very very small, chance that this method will never end, as the string size grows, then the probability of outputting a new character falls, but just as the string length never hits infinite, neither does the probability of a new character hit zero.

That is almost exactly what the code posted in the comments by @neil-slater https://github.com/repeatedly/ruby-string-random

Edit

The OP asks if it is possible to generate a random string, which a given regular expression matches.

A regular expression is a string representation of a regular language. A finite automaton is a decider which encodes a given regular language, and can determine if a given string is part of that regular language. So basically the way regular expression matching works, is by compiling the regular expression to a finite automaton, and use that to see if it accepts the string. That's matching.

Now lets look at generation. You can use the same finite automaton to generate strings, however as a finite automata, as @sawa correctly pointed out, only works on finite strings, then you have to make sure that you only generate a finite string. One way of doing this is randomly deciding a maximum length, and then do a random walk of at most that length in the fintite automaton. One way of not doing this is the way both @sawa and I suggested of taking a transition with some probability, or simply stopping. As this potentially doesn't terminate, because the product of any non-zero probabilities, only approaches zero, but new reaches it.

8 Comments

A matching string can be of an arbitrary length for some regexes, but cannot be infinite. And even if such thing as "infinite strings" existed, it is not clear how that leads to the conclusion that you cannot. You have several logical flaws.
No, it cannot be infinite. It is not even clear how you can match a given random string of an infinite length against a regular expression (and lead to a halt). The physical realization is irrelevant. I am surprised that there are so many people on this site like you, who should know more or less about programming, but cannot even distinguish something of an arbitrary length and something of an infinite length.
Then how does it know whether a string matches or not?
@sawa The question is about generating, not matching, strings. If you have to generate a string from the regexp a+, how does the generator know when to stop?
@jbr: This debate hinges on the definition of "random" in OP. If you take it to be some equally-likely distribution from all possible matching strings, then it is not really feasible, because they cannot be enumerated in order to make a random choice. But OP did not ask for that. A generative grammar with limits on how it processes open-ended matches should be possible to create from most regex. Although it is possible to create regex that match nothing using look-ahead or look-behind, and those might be tricky
|
0

This answer is not intended to fully answer you question. Its purpose is twofold: (i) to show that it is not impossible, so that jbr's answer is entirely wrong, and (ii) to suggest you that, nevertheless, it is not trivial, and you have to work out the complete code by yourself.

Since to fully answer your question is would probably not fit the space for a single answer in a Q and A site like this, I will show a code that generates all the possible strings that matches a fixed regex:

/a*/

The code is like this:

class Regexp
  def self.generate_string
    rand > 0.5 ? "" : "a#{generate_string}"
  end
end

Each time you run Regexp.generate_string, a random string that matches /a*/ will be generated. The string would be of an arbitrary length, and the longer the string is, it will be generated with less possibility.

7 Comments

This method does not guarantee that it stops. It could, albeit with very very low probability, never stop. As the product of all the probabilities goes towards infinity, then the probability goes towards zero, but it never hits zero, thus this method has a non-zero probability of never halting.
You cannot prove that it will never stop. Or, if you say you have a proof, then show it.
@jbr: With Ruby's default rand() it provably stops, because there is guaranteed to be a value below 0.5 within a certain number of steps
@NeilSlater that is true it will work in the real world, as 0.5 * 0.5 * ..., eventually yield zero when using floating points. But the "algorithm" still has a non-zero probability that it doesn't terminate, and thus the method is not correct.
@NeilSlater, again I totally agree :-) I'd still just go with the simple solution and predetermine a maximum length and take at most that many step in my random walk. As that would assure a correct result, without relying on the limitations of float and PRNGs.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.