
I have the following strings as input:

"abcol"  
"ab_col"  
"cold"  
"col_ab"  
"col.ab"  

I want to search these for the string col. I'm currently matching with this regex call:

Match matchResults = Regex.Match(input, "col", RegexOptions.IgnoreCase);

I want to match only strings that fit this pattern: [any special character or nothing] + col + [any special character or nothing]

From the inputs above, I want to return only ab_col, col_ab, and col.ab.

Any help is highly appreciated.
Thanks

[Any special character] = [^A-Za-z0-9]

  • What defines a "special character"? Commented Dec 3, 2012 at 19:37
  • A non-alphanumeric character, [^A-Za-z0-9] (anything that's not a letter or a digit) Commented Dec 3, 2012 at 19:38
  • You have not specified what you have tried, and you are not asking a proper question, but rather saying "solve this for me". I understand that this can be tricky to get right, but this question could probably be solved if you searched around the Internet before asking. Commented Dec 3, 2012 at 19:43
  • Using [^A-Za-z0-9] I was able to get partially what I want. I'm familiar with Regex and I tried searching the internet. I know I have to use the "|" but not sure exactly how. That's why I posted here. Thanks for your feedback. Commented Dec 3, 2012 at 19:54

2 Answers


You can use this regex:

(?:^.*[^a-zA-Z0-9]|^)col(?:[^a-zA-Z0-9].*$|$)

Explanation:

(?:   // non-capturing
  ^   // match at start of the string
  .*[^a-zA-Z0-9]  // match anything followed by a non-alphanumeric before `col`
    |     // or
  ^       // match the start itself (means nothing before col)
)
  col  // match col
(?:   // non-capturing
  [^a-zA-Z0-9].*  // match a non-alphanumeric after `col` followed by anything
   $     // match end of string
   |     // or
   $     // just match the end itself (nothing after col)
)
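
To check the pattern above against the five inputs from the question, here is a minimal C# sketch. The class name ColMatcher and the method name IsStandalone are made up for illustration; only the pattern itself and RegexOptions.IgnoreCase come from this thread.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class ColMatcher
{
    // Pattern from the answer: "col" must have a non-alphanumeric character
    // or the start of the string before it, and a non-alphanumeric character
    // or the end of the string after it.
    private static readonly Regex Standalone = new Regex(
        @"(?:^.*[^a-zA-Z0-9]|^)col(?:[^a-zA-Z0-9].*$|$)",
        RegexOptions.IgnoreCase);

    public static bool IsStandalone(string input)
    {
        return Standalone.IsMatch(input);
    }

    public static void Main()
    {
        // The five sample inputs from the question.
        string[] inputs = { "abcol", "ab_col", "cold", "col_ab", "col.ab" };
        foreach (string input in inputs)
        {
            Console.WriteLine(input + ": " + IsStandalone(input));
        }
    }
}
```

With these inputs, only ab_col, col_ab, and col.ab should report True; abcol and cold are rejected because a letter sits directly next to col.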

Use this pattern: @"(^|.*[\W_])col([\W_].*|$)". \w matches a word character (letters, digits, and the underscore) and \W matches any character that is not a word character. ^ matches the start of the input, $ matches the end, and | means "or". So (^|.*[\W_]) means: either the start of the input, or any characters followed by one non-alphanumeric character.

EDIT:

Yes, the underscore counts as a word character too, so you should write [\W_] (a non-word character or an underscore) instead of \W.
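
The difference between \W and [\W_] can be seen in a short C# sketch. The class name UnderscoreDemo and the two field names are hypothetical; the first pattern is an assumed reconstruction of the pre-edit version using plain \W, and the second is the pattern given in this answer. One caveat worth keeping in mind: in .NET, \w is Unicode-aware by default, so [\W_] is not exactly the same class as [^a-zA-Z0-9] once non-ASCII letters are involved.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, named for illustration only.
public static class UnderscoreDemo
{
    // Plain \W: the underscore is a word character, so it is NOT accepted
    // as a separator next to "col".
    public static readonly Regex WithoutUnderscore =
        new Regex(@"(^|.*\W)col(\W.*|$)", RegexOptions.IgnoreCase);

    // [\W_]: a non-word character or an underscore is accepted as a separator.
    public static readonly Regex WithUnderscore =
        new Regex(@"(^|.*[\W_])col([\W_].*|$)", RegexOptions.IgnoreCase);

    public static void Main()
    {
        // Inputs from the question, plus "ab.col" from the comments.
        string[] inputs = { "abcol", "ab_col", "cold", "col_ab", "col.ab", "ab.col" };
        foreach (string input in inputs)
        {
            Console.WriteLine(
                input + ": \\W=" + WithoutUnderscore.IsMatch(input) +
                ", [\\W_]=" + WithUnderscore.IsMatch(input));
        }
    }
}
```

With plain \W, ab_col and col_ab are rejected while col.ab and ab.col are accepted; with [\W_], all four are accepted, and abcol and cold are rejected by both patterns.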

4 Comments

This will not satisfy the requirement that "ab_col" be a match or "col_ab", because '_' is included in the '\w' class.
Just ran your edit again, and it still fails. It will match "ab.col", but not "ab_col". It's due to the use of the '\W' class, which says "don't match '_'".
\W skips underscores as well which I don't want
@user1178492 - If you group it (which is what I added when I edited the answer) so that instead of \W you have [\W_], it will match the underscore. I like this one over [^a-zA-Z0-9] just because it is more compact, but some may find the full listing more readable in the long run.
