20

Duplicate:

Random string that matches a regexp

No, it isn't. I'm looking for an easy and universal method, one that I could actually implement. That's far more difficult than randomly generating passwords.


I want to create an application that takes a regular expression and shows 10 randomly generated strings that match that expression. It's supposed to help people better understand their regexps, and to decide, e.g., whether they're secure enough for validation purposes. Does anyone know of an easy way to do that?

One obvious solution would be to write (or steal) a regexp parser, but that seems really over my head.

I repeat, I'm looking for an easy and universal way to do that.

Edit: Brute force approach is out of the question. Assuming the random strings would just be [a-z0-9]{10} and 1 million iterations per second, it would take about 116 years to iterate through the space of all 10-char strings (36^10 is roughly 3.7 × 10^15).

5
  • I don't think there's going to be an easy way to do this... maybe the mechanical turk? :) Commented Apr 14, 2009 at 16:12
  • Do you have a particular regex in mind, or are you after a general solution for any regex variant? Because you're not going to find one that works for Perl as well as .NET unless you restrict yourself to truly regular expressions without any extensions. Commented Apr 14, 2009 at 16:24
  • Well, I would like a general solution for a single variant, most notably the one I use, Perl Regular Expressions implementation in PHP. Commented Apr 14, 2009 at 17:23
  • In general, the problem is #P-hard. researchgate.net/publication/… Commented Dec 13, 2015 at 18:49
  • See also Given a regular expression, how would I generate all strings that match it? Commented Apr 12, 2017 at 18:58

3 Answers

25

Parse your regular expression into a DFA, then traverse your DFA randomly until you end up in an accepting state, outputting a character for each transition. Each walk will yield a new string that matches the expression.

This doesn't work for "regular" expressions that aren't really regular, though, such as expressions with backreferences. It depends on what kind of expression you're after.
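A minimal sketch of the random walk, assuming the DFA is already available as a plain dictionary. The table below is written by hand for the pattern ab+c and is purely illustrative; the names TRANSITIONS, live_states and random_match are hypothetical, and no regex parser is involved. Dead states (ones from which no accepting state can be reached) are pruned first, so the walk never has to restart.

```python
import random
from collections import deque

# Hand-built DFA for ab+c (an assumption for illustration, not parser output).
# TRANSITIONS[state][char] gives the next state.
TRANSITIONS = {
    0: {"a": 1, "x": 9},  # "x" deliberately leads into a dead state
    1: {"b": 2},
    2: {"b": 2, "c": 3},
    3: {},
    9: {"x": 9},          # dead: no accepting state is reachable from here
}
START, ACCEPTING = 0, {3}


def live_states(transitions, accepting):
    """Return the states from which some accepting state is reachable."""
    reverse = {}
    for src, edges in transitions.items():
        for dst in edges.values():
            reverse.setdefault(dst, set()).add(src)
    live, queue = set(accepting), deque(accepting)
    while queue:  # backward breadth-first search from the accepting states
        state = queue.popleft()
        for prev in reverse.get(state, ()):
            if prev not in live:
                live.add(prev)
                queue.append(prev)
    return live


def random_match(transitions, start, accepting, stop_prob=0.5, rng=random):
    """Walk the DFA at random, emitting one character per transition."""
    live = live_states(transitions, accepting)
    if start not in live:
        raise ValueError("the DFA accepts no strings")
    state, out = start, []
    while True:
        # Only follow edges that can still lead to an accepting state.
        edges = [(ch, dst) for ch, dst in sorted(transitions[state].items())
                 if dst in live]
        # In an accepting state, stop if forced to or with probability stop_prob.
        if state in accepting and (not edges or rng.random() < stop_prob):
            return "".join(out)
        ch, state = rng.choice(edges)
        out.append(ch)


if __name__ == "__main__":
    for _ in range(10):
        print(random_match(TRANSITIONS, START, ACCEPTING))
```

Note that a walk like this does not sample matching strings uniformly: short strings come out far more often than long ones, and stop_prob is just one arbitrary way to decide when to stop in an accepting state that still has outgoing edges.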

12 Comments

@Richard E: Deterministic finite automaton
@Richard E: Deterministic Finite Automata: en.wikipedia.org/wiki/Deterministic_finite_state_machine Basically it's the implementation of a regular expression. When you compile a regex, a DFA is the result.
@Richard E., deterministic finite automata?
@DFA: If you end up in a non-accepting branch of the DFA from which no transitions end in accepting states, then you'll have to start over. Obviously if such a branch exists it would have to be trimmed out of the set of states somehow. It should be simple enough to use graph algorithms to find them.
@Pies: This is how regular expressions work. Even if you find a library that does it for you, this is probably how it works. It does exactly what you need of it: traverse the structure the regex represents, but in reverse; producing a string rather than consuming one.
7

Take a look at Perl's String::Random.

7 Comments

I don't suppose you know a similar thing for PHP?
Write it in Perl, compile it with some Perl-to-executable tool, then invoke it from PHP.
The internet is a series of tubes.
Yeah, I guess it's just easier to deploy if you use a single language :)
Perl's String::Random only supports a small subset of regexp, so I'll have to look for something better.
0

One rather ugly solution that may or may not be practical is to leverage an existing regex diagnostics option. Some regex libraries can report where the regex failed to match. In that case you could use what is in effect a form of brute force, but adding one character at a time and trying to get longer (and further-matching) strings until you get a full match. This is a very ugly solution. However, unlike standard brute force, a failure on a string like ab will also tell you whether there exists a string ab.* that will match: if not, stop and try ac; if so, try a longer string. This is probably not feasible with all regex libraries.

On the bright side, this kind of solution is probably pretty cool from a teaching perspective. In practice it's probably similar in effect to a DFA solution, but without the requirement to think about DFAs.

Note that you won't want to use random strings with this technique. However, you can use random characters to start with if you keep track of what you've tested in a tree, so the effect is the same.
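A rough sketch of the idea in Python, under a big assumption: Python's standard re module has no partial-match diagnostic, so the check below is faked with a second, hand-written regex (VIABLE_PREFIX) that accepts exactly the prefixes of strings matching ab+c. With a library that really reports partial matches, is_viable_prefix would call that feature instead. All names here are hypothetical.

```python
import random
import re

ALPHABET = "abc"
PATTERN = re.compile(r"ab+c")
# Stand-in for an engine's partial-match diagnostic (an assumption): written
# by hand for this one pattern, it matches "", "a", "ab...b" and "ab...bc".
VIABLE_PREFIX = re.compile(r"(a(b+c?)?)?")


def is_viable_prefix(s):
    """True if s can still be extended (or already is) a full match."""
    return VIABLE_PREFIX.fullmatch(s) is not None


def grow_match(max_len=12, rng=random):
    """Extend a prefix one character at a time until it fully matches."""
    prefix = ""
    while len(prefix) <= max_len:
        if PATTERN.fullmatch(prefix):
            return prefix
        candidates = list(ALPHABET)
        rng.shuffle(candidates)  # try the next character in random order
        for ch in candidates:
            if is_viable_prefix(prefix + ch):
                prefix += ch
                break
        else:
            return None  # guard: no character keeps the prefix viable
    return None  # gave up: prefix grew past max_len without a full match


if __name__ == "__main__":
    for _ in range(10):
        print(grow_match())
```

Because the sketch returns at the first full match, a pattern such as a|ab would only ever yield the shorter alternative; continuing past a full match with some probability would be one way around that.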

1 Comment

Interesting idea, I'll check it out.