Suppose I have an arbitrary regular expression. How can I calculate the minimum length of a string required for a match?

Examples (regex => minimum length of matchable string):

  1. [0-9]{3},[0-9]{2} => 6
  2. [0-9]{4},[0-9]{2} => 7
  3. [0-9]{2}.[0-9]{3}.[0-9]{3}/[0-9]{4}-[0-9]{2} => 18
  4. [0-9]{3}.[0-9]{3}.[0-9]{3}-[0-9]{2} => 14
  5. [0-9]{2}/[A-Z]{2}/[0-9]{4} => 10

I also need a function that takes a regex and an integer between 1 and the size calculated by the function above (something like position(regex, number)) and returns the type of the character at that position (number, letter or symbol).

Examples:

  • Example 1: Position 3 is a "number"
  • Example 2: Position 5 is a "symbol"
  • Example 5: Position 4 is a "letter"

UPDATE

The objective here is to implement this:

function size_of(regex) {
    //
}

function type_of(regex, posicao) {
    //
}

function generate_string(tamanho) {
    //
}

// Delegated handler: the event name comes first, then the selector.
$(document).on('focus', '.valida', function(){
    var regex = $(this).attr('pattern');

    var counter = 0;
    var tam = size_of(regex);
    var str = generate_string(tam);

    $(this).val(str);
    $(this).keypress(function(e){
        var tecla = e.which;

        if(typeof tecla == type_of(regex, counter)){
            str = str + tecla;
            counter++;
        }

        $(this).val(str);
    });
});

UPDATE 2

Some examples that would be useful:

1. Calculating the length: http://js.do/code/38693 (it just needs to be more generic).

UPDATE 3 - FINAL CODE

The final code for the script above is:

jsfiddle

http://jsfiddle.net/klebermo/f8U4c/78/

code

function parse(regexString){
    var regex = /((?!\[|\{).(?!\]|\}))|(?:\[([^\]]+)\]\{(\d+)\})/g,
        match,
        model = [];
    while (match = regex.exec(regexString)) {
        if(typeof match[1] == 'undefined'){
            for(var i=0;i<match[3];i++){
                model.push(match[2]);
            }
        }else{
            model.push(match[1]);
        }
    }
    return model;
}

function replaceAt(s, n, t) {
    return s.substring(0, n) + t + s.substring(n + 1);
}

function size_of(regex) {
    var parsedRegexp = parse(regex);
    return parsedRegexp.length;
}

function type_of(regex, posicao) {
    var parsedRegexp = parse(regex);
    var pos = parsedRegexp[posicao];

    if(pos == '0-9')
        return 'number';

    if(pos == 'A-Z' || pos == 'a-z')
        return 'string';

    return pos;
}

function generate_string(regex, tamanho) {
    var str = '';

    for(var i=0; i<tamanho; i++) {
        var type = type_of(regex, i);
        if(type == 'number' || type == 'string')
            str = str + '_';
        else
            str = str + type;
    }

    return str;
}

var counter;
var tam;
var str;
var regex;

$('.valida').each(function(){

    $(this).on('focus', function(e){
        regex = $(this).attr('pattern');

        counter = 0;
        tam = size_of(regex);
        str = generate_string(regex, tam);

        $(this).val(str);
    });

    $(this).on('keypress', function(e){
        e.preventDefault();

        var tecla = e.which;
        var tecla2;

        // Digits become numbers so that typeof yields 'number'; anything else stays a string.
        if(tecla >= 48 && tecla <= 57)
            tecla2 = tecla - 48;
        else
            tecla2 = String.fromCharCode(tecla);

        var result = $("<div>");
        result.append( "tecla = "+tecla+"<br>" );

        var t = type_of(regex, counter);

        if(counter < tam) {
            if(t != 'number' && t != 'string') {
                str = replaceAt(str, counter, t);
                counter++;
            }

            t = type_of(regex, counter);

            if(typeof tecla2 == t) {
                result.append( "tecla2 = "+tecla2+"<br>" );
                str = replaceAt(str, counter, tecla2);
                counter++;
            }
        }

        result.append( "counter = "+counter+"<br>" );
        $("#result").empty().append(result);

        $(this).val(str);
    });

});
Comments

  • I failed to understand where the length value comes from. Commented Jun 1, 2014 at 1:31
  • I want to calculate that based on the pattern determined by the regex. Commented Jun 1, 2014 at 1:35
  • So basically you want a regex parser that parses the pattern itself, and generates information about it … What’s the actual use case here? Commented Jun 1, 2014 at 1:42
  • I think it would be easier to get the length of the string tested using the regex rather than calculate the length of the regex. Commented Jun 1, 2014 at 1:44
  • Maybe I understood this incorrectly, but if you have the regexes beforehand, wouldn't it be easier to just manually map each regex to the expected string length, unless there is an excessive number of regexes to keep track of? Commented Jun 1, 2014 at 1:47

3 Answers

I've made a little parser for simple regex like the ones you're using.

It builds an array with one entry per expected character: either the character class (0-9, A-Z) or the literal character itself.

function parse(regexString){
    var regex = /((?!\[|\{).(?!\]|\}))|(?:\[([^\]]+)\]\{(\d+)\})/g,
        match,
        model = [];
    while (match = regex.exec(regexString)) {
        if(typeof match[1] == 'undefined'){
            for(var i=0;i<match[3];i++){
                model.push(match[2]);
            }
        }else{
            model.push(match[1]);
        }
    }
    return model;
}

There is also a jsfiddle to demo it.

As for the regex used inside the parse method, a debuggex schema will explain it better than I could:

((?!\[|\{).(?!\]|\}))|(?:\[([^\]]+)\]\{(\d+)\})

Regular expression visualization

Also, you can get the total number of characters with:

myresult.length;

And the type of the n-th character (n is zero-based) with:

myresult[n];
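As a rough illustration of how the array returned by parse could back the two functions from the question, the sketch below repeats parse so that it runs on its own and adds a helper named classify. Both the name classify and its 1-based position argument are assumptions made for this example; they are not part of the answer above.

```javascript
// Same parser as above, repeated so the snippet is self-contained.
function parse(regexString) {
    var regex = /((?!\[|\{).(?!\]|\}))|(?:\[([^\]]+)\]\{(\d+)\})/g,
        match,
        model = [];
    while ((match = regex.exec(regexString)) !== null) {
        if (typeof match[1] === 'undefined') {
            // "[class]{n}" branch: push the class body n times.
            for (var i = 0; i < Number(match[3]); i++) {
                model.push(match[2]);
            }
        } else {
            // Literal character branch.
            model.push(match[1]);
        }
    }
    return model;
}

// Hypothetical helper (not in the original answer): maps the entry at a
// 1-based position to "number", "letter" or "symbol".
function classify(regexString, position) {
    var entry = parse(regexString)[position - 1];
    if (entry === undefined) return undefined;   // position out of range
    if (entry === '0-9') return 'number';
    if (entry === 'A-Z' || entry === 'a-z') return 'letter';
    return 'symbol';
}

var pattern = '[0-9]{2}/[A-Z]{2}/[0-9]{4}';
console.log(parse(pattern).length);   // 10
console.log(classify(pattern, 3));    // "symbol"
console.log(classify(pattern, 4));    // "letter"
```

Only the classes 0-9, A-Z and a-z are recognised here; any other class body, such as A-Za-z, would be reported as a symbol and would need its own rule.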

1 Comment

+1 nice answer. This should be chosen as the solution.

I believe a generic solution to this problem would involve implementing a function that generates a finite state automaton object corresponding to each regex.

This SO post seems related to the question at hand.

Also check out this link: (C# Code to generate strings that match a regex)
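A full automaton is more than fits in a short answer. For flat patterns with no groups or alternation, though, the minimum length can be computed with a simple left-to-right scan, which is a much lighter technique than the automaton suggested above. The sketch below is only that: the function name minMatchLength is made up for this example, and it assumes that every escape is a two-character sequence matching one character (such as \d or \.).

```javascript
// Hypothetical helper: minimum length of a string matching a flat pattern.
// Supported: literals, two-character escapes, ".", classes "[...]" and the
// quantifiers ?, *, +, {n}, {n,} and {n,m} (optionally followed by a lazy "?").
// Groups, alternation and anchors are rejected with an error.
function minMatchLength(pattern) {
    var total = 0;
    var i = 0;
    while (i < pattern.length) {
        var ch = pattern[i];
        if ('()|^$'.indexOf(ch) !== -1) {
            throw new Error('Unsupported syntax at index ' + i + ': ' + ch);
        }

        // Step 1: consume one atom; each atom matches exactly one character.
        if (ch === '\\') {
            i += 2;                                   // escape such as \d
        } else if (ch === '[') {
            i++;
            while (i < pattern.length && pattern[i] !== ']') {
                i += pattern[i] === '\\' ? 2 : 1;     // skip escaped chars in the class
            }
            if (i >= pattern.length) {
                throw new Error('Unterminated character class');
            }
            i++;                                      // skip the closing ]
        } else {
            i++;                                      // plain literal or "."
        }

        // Step 2: read an optional quantifier and take its lower bound.
        var min = 1;
        var q = /^(?:\{(\d+)(?:,\d*)?\}|[?*+])/.exec(pattern.slice(i));
        if (q) {
            if (q[1] !== undefined) {
                min = Number(q[1]);                   // {n}, {n,} or {n,m}
            } else {
                min = q[0] === '+' ? 1 : 0;           // + needs one, ? and * need none
            }
            i += q[0].length;
            if (pattern[i] === '?') i++;              // lazy modifier does not change the minimum
        }
        total += min;
    }
    return total;
}

console.log(minMatchLength('[0-9]{3},[0-9]{2}'));     // 6
console.log(minMatchLength('\\d+-?[a-z]*x{2,5}'));    // 3
```

Handling groups and alternation would need a real parser (the minimum of an alternation is the smallest branch, and a quantified group multiplies its own minimum), which is where the automaton or a syntax tree becomes worthwhile.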

4 Comments

I guess generating a random string will help to accomplish what I want. But in the link you pointed me to, all the examples are in PHP or Perl (which I know almost nothing about). Is there any way to do that in jQuery or JavaScript?
Could you post a list of all the regexes you need to match on? If there is a common pattern among them, that can be used to write a solution specifically for your use case.
I don't have a common pattern; the script should be as generic as possible. To implement this, I just need to know how to do what I ask in the question (length of string and type of character). Once I do, I can post the final code here for evaluation.
This example would be a good starting point; it just needs to be more generic: js.do/code/38693

In your specific case, you could try this code (demo):

var basicRegexLength = function(regex){
    var i;
    regex = regex.replace(/(\[0-9\]|\[A-Z\])/gi, '');
    for (i = 1; i < 10; i++) {
        regex = regex.replace( new RegExp('\\{' + i + '\\}', 'g'), Array(i+1).join('.') );
    }
    return regex.length;
};
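If the fixed upper bound of the loop is a concern, one possible variant replaces every {n} in a single pass using a replacement callback. This is a sketch under the same assumptions as the code above (only [0-9] and [A-Z] classes, only exact {n} counts); the name basicRegexLengthAnyCount is made up for this example.

```javascript
// Variant of basicRegexLength that accepts any repeat count, not just 1 to 9.
function basicRegexLengthAnyCount(regex) {
    return regex
        // Drop the character classes, leaving only "{n}" and literal characters.
        .replace(/\[0-9\]|\[A-Z\]/gi, '')
        // Expand each "{n}" into n placeholder characters.
        .replace(/\{(\d+)\}/g, function (whole, n) {
            return '.'.repeat(Number(n));
        })
        .length;
}

console.log(basicRegexLengthAnyCount('[0-9]{3},[0-9]{2}'));    // 6
console.log(basicRegexLengthAnyCount('[0-9]{12}-[A-Z]{2}'));   // 15
```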

2 Comments

Just one question: why did you choose 10 as the limit in the for loop?
It looks for repeat counts from {1} to {9} (the loop stops before 10)... if you need more or fewer, adjust it as desired.
