Below are the sample words that I will use at file import time:

  • East Chesterton (Cambridge)
  • New york (USA)
  • child (parent)

So here are the business rules:

  1. The first word should be at least 3 characters long (e.g. child)
  2. Spaces are allowed, but the value is invalid if it consists only of spaces (e.g. East Chesterton)
  3. The second part of the word is enclosed in parentheses: ( someword )
  4. The ( someword ) part is optional
  5. If ( someword ) is present, its minimum length is 3, and spaces are also allowed.

I have partly achieved this using the following expression:

^[a-zA-Z ]{1,}\([a-zA-Z ]{1,}\)$

  1. I want to make sure this expression is correct. Is there an automated way to check multiple combinations of input to verify it?

  2. How can I make the second part optional (point 4)? That is, whether or not I pass (somedata), the first part should still be checked.

Also, to extract the data inside '( )', I use:

\((.*?)\)
  • Test it here: regex101.com Commented Dec 31, 2015 at 13:28
  • {1,} matches a single occurrence or more, but you said you need at least 3 characters, so that should be {3,}. For optional parts, use ?. Note that (...) indicates a group. To match parentheses, you need to escape them: \( and \). Also note that you can use character classes such as \w (word characters) and \s (whitespace characters) instead of explicit ranges. You can also allow (optional) whitespace in-between the first and the second part with \s* (0 or more whitespace characters). Commented Dec 31, 2015 at 13:34
  • Although you can use regex101 for this task, mind that that site does not support .NET regex syntax. Use regexhero.net or regexstorm.net. Now, your requirements are not that clear: what is the min. length of 3? The first word or all the subparts? To test the regex, you should think of possible input string types yourself, there is no way to create test strings automatically. Content drives regex, not vice versa. Try this regex: ^(?=[a-z]{3,})[a-z ]+(?:\p{Zs}\((?=[a-z]{3,})[a-z ]+\))?\r?$ Commented Dec 31, 2015 at 13:48
  • One change to @stribizhev 's regex: the requirement is to have minimum length of 3 (in two places). (?=[a-z]{3,}) does that, but it is also followed up with [a-z ]+, effectively making it a minimum of 4. The second should use * instead of +. Commented Dec 31, 2015 at 15:59
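
A lookahead such as (?=[a-z]{3,}) is zero-width: it inspects the upcoming characters without consuming them. So the minimum length a pattern accepts is easier to check than to reason about. The following is a minimal sketch, assuming Python's re module as a stand-in for .NET (lookaheads behave the same way in both). The names with_plus and with_star are made up for illustration, and only the first half of the suggested pattern is exercised.

```python
import re

# First half of the suggested pattern, once with + and once with *
# after the lookahead. Assumption: Python's `re` stands in for .NET.
with_plus = re.compile(r"^(?=[a-z]{3,})[a-z ]+$")
with_star = re.compile(r"^(?=[a-z]{3,})[a-z ]*$")

for candidate in ["ab", "abc", "abcd"]:
    # Print whether each variant accepts the candidate string.
    print(repr(candidate),
          bool(with_plus.match(candidate)),
          bool(with_star.match(candidate)))
```

Both variants reject "ab" and accept "abc", so in this sketch the + form already gives a minimum of 3.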

2 Answers

I think you are almost there. I gave it a try. Does this comply with all your requirements?

^[a-zA-Z\s]{3,}(\([a-zA-Z\s]{3,}\))?$

https://regex101.com/r/yE9lB0/2

I made the second part optional by wrapping it in parentheses and adding a question mark after the group: (myoptionalexpression)?
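
To illustrate how the optional group behaves, here is a minimal sketch using the expression above. It assumes Python's re module as a stand-in for .NET; the variable names are hypothetical.

```python
import re

# The expression from this answer. Assumption: Python's `re` stands in for .NET.
pattern = re.compile(r"^[a-zA-Z\s]{3,}(\([a-zA-Z\s]{3,}\))?$")

with_part = pattern.match("child (parent)")
without_part = pattern.match("child")

# Group 1 is the optional "(...)" part; it is None when that part is absent.
print(with_part.group(1))     # (parent)
print(without_part.group(1))  # None
```

Both inputs match; the only difference is whether group 1 captured anything.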

I've taken a look at the answer posted here.

 ^[a-zA-Z\s]{3,}(\([a-zA-Z\s]{3,}\))?$

This would clash with

  • Allow space, but it's invalid if there's only space (e.g. East Chesterton)

A string consisting only of spaces would already match, because \s is part of the character class.
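
This can be confirmed with a quick check. It is a sketch only, assuming Python's re module in place of .NET:

```python
import re

# The expression from the other answer. Assumption: Python's `re` stands in for .NET.
pattern = re.compile(r"^[a-zA-Z\s]{3,}(\([a-zA-Z\s]{3,}\))?$")

# Three spaces satisfy [a-zA-Z\s]{3,}, so whitespace-only input is accepted,
# both as the first part and inside the parentheses.
print(bool(pattern.match("   ")))       # True
print(bool(pattern.match("abc(   )")))  # True
```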

Besides that, the description 'characters' might be a bit vague. I have therefore assumed word characters (\w) are what you mean. (In C#, \w should include Unicode characters like ü as well. Think of Münster (Germany) as an example.)

The new regex would look like this:

^\s*(?:\w{3,}\s*)+(?:\(\s*(?:\w{3,}\s*)+\))?\s*$

Examples here: https://regex101.com/r/gS7kG8/3

Note that the regex101 page works with PHP, Python and JS regex flavours, so it will not give exact results for C# (\w in PHP apparently doesn't match Unicode, for instance).
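
As for checking many combinations automatically, one option is a small table of inputs with the expected result for each. The sketch below assumes Python's re module rather than .NET, so Unicode details may differ. The named group inner, the CASES table and the check helper are hypothetical additions for illustration; the rest of the pattern is the regex above.

```python
import re

# Regex from this answer, with one hypothetical change: a named group "inner"
# around the text inside the parentheses, so it can be extracted in the same pass.
PATTERN = re.compile(
    r"^\s*(?:\w{3,}\s*)+(?:\(\s*(?P<inner>(?:\w{3,}\s*)+)\))?\s*$"
)

# (input, should_match) pairs; illustrative, not exhaustive.
CASES = [
    ("East Chesterton (Cambridge)", True),
    ("New york (USA)", True),
    ("child (parent)", True),
    ("child", True),               # the "(someword)" part is optional
    ("Münster (Germany)", True),   # \w is Unicode-aware in Python 3
    ("   ", False),                # only spaces
    ("ab", False),                 # first part shorter than 3
    ("child (pa)", False),         # second part shorter than 3
    ("child ()", False),           # empty parentheses
]

def check(cases):
    """Return the inputs whose match result differs from the expectation."""
    return [text for text, expected in cases
            if bool(PATTERN.match(text)) != expected]

print(check(CASES))  # an empty list means every case behaved as expected

match = PATTERN.match("East Chesterton (Cambridge)")
print(match.group("inner"))  # Cambridge
```

Under this pattern every space-separated word needs at least 3 characters, so an input such as "St Ives" would be rejected; whether that is wanted depends on how the rules are read.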
