EDIT: I have no access to a "replace" function, to any code, or to the regex matches. All I can do is provide a regex string to the API, which strips out whatever was matched (everything that is not part of an email) and leaves the rest (only the emails).
I am working with an API that reads data from an OCR document. I have no control over the API; however, I do have access to a function in the API that strips out whatever is matched by a provided regex. I am trying to strip out everything that is NOT an email address, leaving only the email addresses behind, separated by spaces if there is more than one. I know regex isn't the best tool for matching emails, but I have no other choice here.
Because the text comes from OCR, there are often characters next to an email that should not be part of it. For example (a simple case), the text could be User Email:[email protected]*required field, and I would like to end up with just [email protected] by stripping out the rest.
- I can't define or use regex replace or any other functions. All I can do is define a regex for what to strip off (basically I need to invert an email match).
- I certainly don't expect this to work for all RFC-compliant email addresses, just reasonably most use-cases.
- In case it matters, I happen to know that the API is written in C#.
Here is what I tried in order to invert the email match. It does not work: it doesn't match anything.
^(?![A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}(?!.)
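A quick way to see why this attempt strips nothing is to run it against a sample string. The sketch below uses Python only as a stand-in for the C# engine, and it rests on several assumptions: the outer lookahead is closed with a `)` (as pasted, the pattern has an unclosed group), matching is case-insensitive, `strip_matches` is a hypothetical imitation of the API's strip function, and `user@example.com` is a made-up address.

```python
import re

EMAIL = r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}"

# Assumption: the attempt above with its outer lookahead closed.
ATTEMPT = r"^(?!" + EMAIL + r"(?!.))"


def strip_matches(pattern, text):
    """Hypothetical stand-in for the API: delete whatever the pattern matches."""
    return re.sub(pattern, "", text, flags=re.IGNORECASE)


# Hypothetical sample text with a made-up address.
sample = "User Email:user@example.com*required field"

match = re.search(ATTEMPT, sample, flags=re.IGNORECASE)
print(match.span())                              # (0, 0): a zero-width match
print(strip_matches(ATTEMPT, sample) == sample)  # True: nothing was removed
```

The pattern consists only of an anchor and a negative lookahead, so it never consumes a character. Whatever it matches is an empty string at the start of the text, which leaves nothing for the API to strip.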
I also searched SO and found this link but it was inconclusive.
([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})|. => $1 (or \1), or $1\n (\1\n).
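To show why that suggestion does not fit my situation: it depends on a replacement string ($1), and the API only deletes matches. Here is a minimal sketch, again in Python as a stand-in for the C# engine. The helper names and the address `user@example.com` are hypothetical, and case-insensitive matching is assumed.

```python
import re

EMAIL = r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}"

# The suggested pattern: capture an email, or else match any single character.
CAPTURE_OR_ANY = "(" + EMAIL + ")|."

# Hypothetical sample text with a made-up address.
sample = "User Email:user@example.com*required field"


def replace_with_group(text):
    """What the suggestion needs: replace every match with group 1 (or nothing)."""
    return re.sub(CAPTURE_OR_ANY, lambda m: m.group(1) or "", text,
                  flags=re.IGNORECASE)


def strip_only(text):
    """What the API offers (as I understand it): delete every match."""
    return re.sub(CAPTURE_OR_ANY, "", text, flags=re.IGNORECASE)


print(replace_with_group(sample))  # user@example.com
print(repr(strip_only(sample)))    # '' (the email is stripped along with the rest)
```

With a replacement available, the pattern does what I want. With strip-only semantics, the email itself is part of a match, so it is removed too. That is why I need a pattern that matches only the non-email text.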