I am trying to write a regular expression for a string. Say there is a string RBY_YBR, where _ represents an empty slot, so we can repeatedly swap letters with _ and end up with RRBBYY_. A group can contain two or more identical letters, so something like RRR is also possible.

Conditions
1). Every letter must have the same letter to its left or to its right.
2). If there is no _, the letters must already be grouped, like RRBBYY, not RBRBYY or RBYRBY, etc.
3). There can be more than one underscore _.
With a regular expression, I want to find out whether a given string can be turned into a pattern of consecutive identical letters by swapping characters with _.
The regular expression I wrote is:

String regEx = "[A-ZA-Z_]";

But this regular expression fails for RBRB, since there is no _ to swap characters with, and RBRB is not already grouped.
How can I write an effective regular expression to solve this?

  • Your regex looks strange. It seems to match only one character. Did you try "[A-Z_]+"? Even that will not handle "left or right alphabet should be the same" (whatever that means). Commented Oct 11, 2016 at 14:00
  • @FlorianAlbrecht It means that after repeatedly swapping letters with _ in a string, the final result should look like RRBBYY_. I am just trying to write one regex that confirms that a pattern like RRBBYY_ can be formed. Commented Oct 11, 2016 at 14:03
  • "Alphabet" is what confuses me here. Do you mean "a character in the range A-Z"? Could you please provide a String consisting only of characters and underscores that shall NOT match your regex? Commented Oct 11, 2016 at 14:05
  • Oh, or do you mean that the count of every character used must be even? Commented Oct 11, 2016 at 14:07
  • @FlorianAlbrecht Yes, by "alphabet" I mean a character. For example, RBRBRB is a string with no _ to swap characters with, so it cannot be turned into RRRBBB. My intention is to write one regex that tells whether a pattern of consecutive characters can be formed or not. Commented Oct 11, 2016 at 14:08
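The first comment's point can be checked directly with `String.matches`, which succeeds only when the whole string matches. A minimal sketch; the class name `CharClassDemo` and the sample strings are illustrative only:

```java
class CharClassDemo {
    // Pattern from the question: a single character class (A-Z is listed
    // twice, which is harmless) plus '_'. It describes exactly ONE character.
    static final String ORIGINAL = "[A-ZA-Z_]";
    // Pattern suggested in the comments: one or more of A-Z or '_'.
    static final String SUGGESTED = "[A-Z_]+";

    public static void main(String[] args) {
        String[] samples = {"R", "RBRB", "RRBBYY_"};
        for (String s : samples) {
            // String.matches succeeds only if the WHOLE string matches.
            System.out.println(s + ": original=" + s.matches(ORIGINAL)
                    + ", suggested=" + s.matches(SUGGESTED));
        }
    }
}
```

The original pattern accepts only the one-character string R, while the suggested one accepts all three samples, including RBRB: a character class only constrains which characters may appear, not how they are arranged.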

2 Answers

OK, as I understand it, a matching string must either consist only of identical characters grouped together, or contain at least one underscore.

So, RRRBBR would be invalid, while RRRRBB, RRRBBR_, and RRRBB_R_ would all be valid.

After a comment from the question's author, there is an additional condition: every character must occur either 0 times or at least 2 times.
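This additional condition can be checked on its own by counting. A minimal sketch using a `HashMap`, so it is not limited to A-Z; the names `CountCheck` and `everyLetterRepeats` are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

class CountCheck {
    // True if every non-underscore character occurs 0 times or at least 2 times.
    static boolean everyLetterRepeats(String s) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char ch : s.toCharArray()) {
            if (ch != '_') {
                counts.merge(ch, 1, Integer::sum); // increment the count for ch
            }
        }
        // A count of exactly 1 means that character can never be paired up.
        for (int c : counts.values()) {
            if (c == 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(everyLetterRepeats("RBY_YBR")); // true
        System.out.println(everyLetterRepeats("X_Y__X"));  // false: Y occurs once
        System.out.println(everyLetterRepeats("__"));      // true: no letters at all
    }
}
```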

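As far as I know, this is not practical with regular expressions, since a regular expression is a finite-state machine without "storage". You would have to "store" each character found in the string to check that it won't appear again later. (For a fixed alphabet such as A-Z, an equivalent regular expression exists in theory, because there are only finitely many combinations of "already seen" letters, but it would be enormous.)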
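The "storage" mentioned above can be made explicit. The sketch below checks only the no-underscore condition (letters already grouped) by remembering which letters' runs have already ended. It assumes the input contains no `_`, and the names `GroupingCheck` and `isGrouped` are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

class GroupingCheck {
    // True if each letter's occurrences form one contiguous run,
    // e.g. RRBBYY or RRRRBB, but not RBRB or RRRBBR.
    static boolean isGrouped(String s) {
        Set<Character> closed = new HashSet<>(); // letters whose run has ended
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (closed.contains(ch)) {
                return false; // ch shows up again after its run already ended
            }
            if (i + 1 < s.length() && s.charAt(i + 1) != ch) {
                closed.add(ch); // the run of ch ends at position i
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"RRBBYY", "RRRRBB", "RBRB", "RRRBBR"}) {
            System.out.println(s + " -> " + isGrouped(s));
        }
    }
}
```

The set is exactly the memory a single left-to-right pass needs: one entry per letter whose group is already finished.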

I would suggest a very simple method for verifying such strings:

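public static boolean matchesMyPattern(String s) {
    // With at least one '_', letters can be moved around freely.
    boolean withUnderscore = s.contains("_");

    // found[k] counts occurrences of the letter 'A' + k.
    int[] found = new int[26];

    for (int i = 0; i < s.length(); i++) {
        char ch = s.charAt(i);
        // Only the uppercase letters A-Z and '_' are allowed.
        if (ch != '_' && (ch < 'A' || ch > 'Z')) {
            return false;
        }

        // Without '_', a letter that reappears after a different letter
        // breaks the grouping (e.g. RBRB), and nothing can be moved to fix it.
        if (ch != '_' && i > 0 && s.charAt(i - 1) != ch && found[ch - 'A'] > 0
                && !withUnderscore) {
            return false;
        }
        if (ch != '_') {
            found[ch - 'A']++;
        }
    }

    // A letter that occurs exactly once can never be paired.
    for (int i = 0; i < found.length; i++) {
        if (found[i] == 1) {
            return false;
        }
    }

    return true;
}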
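To try the method on the examples listed earlier in this answer, it can be wrapped in a small class. A minimal sketch: the method body is the same logic as above, and the class name `PatternCheckDemo` is hypothetical:

```java
class PatternCheckDemo {
    // Same logic as matchesMyPattern above, placed in a class so it can run.
    public static boolean matchesMyPattern(String s) {
        boolean withUnderscore = s.contains("_");
        int[] found = new int[26]; // occurrence count per letter A-Z

        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch != '_' && (ch < 'A' || ch > 'Z')) {
                return false; // only A-Z and '_' are allowed
            }
            // Without '_', a letter may not reappear after a different letter.
            if (ch != '_' && i > 0 && s.charAt(i - 1) != ch && found[ch - 'A'] > 0
                    && !withUnderscore) {
                return false;
            }
            if (ch != '_') {
                found[ch - 'A']++;
            }
        }
        for (int count : found) {
            if (count == 1) {
                return false; // a single letter can never be paired
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Examples from the answer text, plus RBRB from the question and "__".
        String[] samples = {"RRRBBR", "RRRRBB", "RRRBBR_", "RRRBB_R_", "RBRB", "__"};
        for (String s : samples) {
            System.out.println(s + " -> " + matchesMyPattern(s));
        }
    }
}
```

This prints false for RRRBBR and RBRB and true for the other four strings, which agrees with the valid/invalid split described at the start of the answer.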

Comments

This will fail for the test case x_y__x: even though we can move characters to form a pattern like xxy__, y is not forming a pair. At least one pair should be formed.
Hm? This will return true for x_y__x. Isn't that what you want? Every character must occur 0 or at least 2 times?
Yes, there should be at least a pair of consecutive characters. There may be more than 2, like RRR or RRRBBB, etc.
Yes, every character in a given string must occur at least two times.
No, it still fails.

Please take my answer with a grain of salt, since it's a bit of a "Fastest gun in the West" post.

It follows the same assumptions as Florian Albrecht's answer (thanks).

I believe that this will solve your problem:

(([A-Za-z])(\2|_)+)+

https://regex101.com/r/7TfSVc/1

It works by capturing a letter in the second capturing group and then requiring it to be followed by one or more copies of that same letter or underscores.

Known bug: it does not work if an underscore starts a string.
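To see what this pattern does in Java, it can be compiled with `java.util.regex.Pattern` and tested with `Matcher.matches()`, which requires the entire string to match (unlike `find()`, which accepts a match anywhere inside the string). A minimal sketch; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

class FirstRegexDemo {
    // First regex from this answer. Group 2 captures one letter; \2 then
    // requires that same letter (or '_') to follow at least once.
    static final Pattern FIRST = Pattern.compile("(([A-Za-z])(\\2|_)+)+");

    static boolean fullMatch(String s) {
        return FIRST.matcher(s).matches(); // whole-string match
    }

    public static void main(String[] args) {
        for (String s : new String[] {"RRBBYY", "R_", "RBRB", "_RR"}) {
            System.out.println(s + " -> " + fullMatch(s));
        }
    }
}
```

With this whole-string check, RRBBYY and R_ match while RBRB and _RR do not; _RR is the leading-underscore case noted above.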

EDIT

This one is better, though I forgot what I was doing by the end of it.

(([A-Za-z_])(\2|_)+|_+[A-Za-z]_*)+

https://regex101.com/r/7TfSVc/4
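The edited pattern can be tried the same way. In it, group 1 wraps the whole alternation, group 2 is `([A-Za-z_])` and group 3 is `(\2|_)`, so `\2` still refers to the captured character. A minimal sketch with hypothetical names, again using `Matcher.matches()` for a whole-string match:

```java
import java.util.regex.Pattern;

class SecondRegexDemo {
    // Edited regex from this answer: either a character (letter or '_')
    // followed by copies of itself or '_', or underscores around one letter.
    static final Pattern SECOND =
            Pattern.compile("(([A-Za-z_])(\\2|_)+|_+[A-Za-z]_*)+");

    static boolean fullMatch(String s) {
        return SECOND.matcher(s).matches(); // whole-string match
    }

    public static void main(String[] args) {
        for (String s : new String[] {"__", "RRBB", "R_B_", "RBRB"}) {
            System.out.println(s + " -> " + fullMatch(s));
        }
    }
}
```

Under `matches()`, this version accepts __, RRBB and R_B_, and still rejects RBRB.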

Comments

Good one, but this fails for the test case __.
Which case did it fail? I'd like to fix it for you, and improve my skills.
It fails for RRBB and also for __.
In RRBB there is no empty space, so no movement is possible, but RRBB already consists of consecutive pairs of characters, so it should print success. And __ is itself a pair.
Wait, so why should RRBBYY match, but RRBB shouldn't? (I might have misunderstood you here, sorry.)