Could you please help me write a pure regex to find the first letter that doesn't repeat in a string? I thought I might need negative lookahead and negative lookbehind, but I don't think JavaScript supports lookbehind.

For example:

'NUNUMUNN'        // expected output 'M'
'LOMING'          // expected output 'L'

I think it's possible to do this using general string operations, but my preference is really for pure regex.

My starting point is currently this:

/(a-zA-Z).*?(?!\1)/.match('thestring');

But it doesn't work.

1 Answer

Turn your logic around: First match all the letters in a word that do repeat, then match the next letter - that's the one you need to look at. Then there are some edge cases to consider.

/\b(?:(?:([a-z])(?=[a-z]*\1))+(?!\1))?([a-z])(?![a-z]*\2)/ig

Explanation:

\b          # Start of word
(?:         # Start of non-capturing group (optional, see below)
 (?:        # Start of non-capturing group that matches...
  ([a-z])   # ...and captures any ASCII letter
  (?=       # if it's followed by
   [a-z]*   #  zero or more letters 
   \1       #  and the same letter again.
  )         # (end of lookahead assertion)
 )+         # Repeat at least once
 (?!\1)     # Assert that this letter doesn't follow immediately to avoid matching "AAA"
)?          # Make that group optional (in case there are no repeated letters in the word)
([a-z])     # Then match a letter and capture it in group 2.
(?![a-z]*\2) # and make sure that letter doesn't occur again later in the word.
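
JavaScript has no free-spacing (x) flag, so an annotated layout like this cannot be pasted into a regex literal as-is. One way to keep the comments is to assemble the full pattern given before the explanation from commented string pieces. This is only a minimal sketch; the names `parts` and `firstUniqueLetter` are made up for illustration.

```javascript
// Hypothetical names; only the pattern pieces come from the answer.
const parts = [
  String.raw`\b`,                       // start of word
  String.raw`(?:`,                      // optional prefix of repeating letters
  String.raw`(?:([a-z])(?=[a-z]*\1))+`, // letters that occur again later in the word
  String.raw`(?!\1)`,                   // last repeating letter must not follow immediately
  String.raw`)?`,                       // the prefix is optional
  String.raw`([a-z])`,                  // candidate letter, captured in group 2
  String.raw`(?![a-z]*\2)`,             // candidate must not occur again later in the word
];

// String.raw keeps the backslashes, so no double escaping is needed.
const firstUniqueLetter = new RegExp(parts.join(''), 'i');

console.log('NUNUMUNN'.match(firstUniqueLetter)[2]); // 'M'
```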

Note that you need to look at group 2 of a match to get the result. Group 1 holds only the last of the repeating letters matched before it, or nothing if the word starts with the non-repeating letter.
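
As a rough illustration of reading group 2 in JavaScript, here is a small sketch. The function names are hypothetical, and it assumes an engine that supports `String.prototype.matchAll` (ES2020). With the g flag, `match()` returns only the full matches, so the single-word helper uses the pattern without g, and the multi-word helper uses `matchAll()` to keep the capture groups.

```javascript
// Pattern copied from the answer; the function names are hypothetical.
const single = /\b(?:(?:([a-z])(?=[a-z]*\1))+(?!\1))?([a-z])(?![a-z]*\2)/i;
const global = /\b(?:(?:([a-z])(?=[a-z]*\1))+(?!\1))?([a-z])(?![a-z]*\2)/gi;

function firstNonRepeating(word) {
  // Without the g flag, match() returns the capture groups of the first match.
  const m = word.match(single);
  return m ? m[2] : null;
}

function firstNonRepeatingPerWord(text) {
  // matchAll() requires the g flag and yields one match object per matching word.
  return Array.from(text.matchAll(global), (m) => m[2]);
}

console.log(firstNonRepeating('NUNUMUNN'));               // 'M'
console.log(firstNonRepeating('LOMING'));                 // 'L'
console.log(firstNonRepeatingPerWord('NUNUMUNN LOMING')); // [ 'M', 'L' ]
```

A word for which the pattern finds no match gives `null` from the first helper and is simply absent from the array returned by the second.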

Test it live on regex101.com.

6 Comments

Hmm. When testing this on regex101.com/r/dR3jQ6/1, it works with the default PCRE flavor but fails in the JavaScript tag. Same thing with RegexBuddy; apparently JS treats the empty backreference within a lookahead assertion differently...need to look at this a bit more...
This does not seem to work for me. "ZAZ".match(r) is giving me ["ZA"], what am I missing?
@torazaburo: The regex matches all the repeating letters (Z in this case) and then the first non-repeating letter (A). Look at group 2 for the actual result. However, there are other issues with this regex that I'm addressing at the moment - will post an edit soon.
Oh my god, I could never have come up with this! Thank you so much! ^^ Is there any place you'd recommend for me to go to for learning regex at this level?
edit: It doesn't seem to return 'A' from the string 'XXXXXA', and it does return 'A' from the string 'AXXXXA'?