I am trying to clean public company names by removing certain character patterns from the end of the name. Sometimes a company will look like this: Some-Random Company Incorporated Inc. I get rid of special characters and any instances of Inc and Incorporated that appear at the end of the name:

MDS_DB.mdq.RegexReplace(MDS_DB.mdq.RegexReplace(COMPANY,
    '[^a-zA-Z0-9 ]', '', 1), 'incorporated$|inc$', '',
    mds_db.mdq.RegexMask(0, 0, 1, 0, 0, 0, 0))

Notice that this is already a nested call, and it works correctly, resulting in:

SomeRandom Company Incorporated

Now I want to run the same replacement again to remove the Incorporated that is now at the end of the name due to the prior replacement:

MDS_DB.mdq.RegexReplace(
    MDS_DB.mdq.RegexReplace(
        MDS_DB.mdq.RegexReplace(COMPANY, '[^a-zA-Z0-9 ]', '', 1),
        'incorporated$|inc$', '',
        mds_db.mdq.RegexMask(0, 0, 1, 0, 0, 0, 0)),
    'incorporated$|inc$', '',
    mds_db.mdq.RegexMask(0, 0, 1, 0, 0, 0, 0))

This does not have the expected effect, and the name remains the same:

SomeRandom Company Incorporated

Why aren't the nested replaces working in this case?

1 Answer

I think the issue here is that the first suffix replacement leaves trailing whitespace at the end of the name. So this company name:

Some-Random Company Incorporated Inc

actually becomes this:

SomeRandom Company Incorporated[ ]      ([ ] indicates a single space)
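The effect can be reproduced outside SQL Server. The following is a minimal sketch in Python, used here only as a stand-in: it assumes that the `re` module treats these simple patterns the same way the regex engine behind `mdq.RegexReplace` does, and the helper names `strip_special` and `strip_suffix_once` are made up for illustration.

```python
import re

# Same suffix pattern as in the question, matched case-insensitively.
SUFFIX = re.compile(r'incorporated$|inc$', re.IGNORECASE)

def strip_special(name):
    """Remove everything except letters, digits and spaces."""
    return re.sub(r'[^a-zA-Z0-9 ]', '', name)

def strip_suffix_once(name):
    """Remove one trailing 'inc' or 'incorporated', leaving any space before it."""
    return SUFFIX.sub('', name)

first = strip_suffix_once(strip_special('Some-Random Company Incorporated Inc.'))
second = strip_suffix_once(first)

print(repr(first))   # the name now ends with a space
print(repr(second))  # unchanged: '$' no longer sits right after 'Incorporated'
```

Printing with `repr` makes the trailing space visible, which is easy to miss in a query result grid.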

Try also removing the whitespace in front of the suffix:

MDS_DB.mdq.RegexReplace(MDS_DB.mdq.RegexReplace(COMPANY,
    '[^a-zA-Z0-9 ]', '', 1), '[ ]+(?:incorporated|inc)$', '',
    mds_db.mdq.RegexMask(0, 0, 1, 0, 0, 0, 0))

Note that you could remove any number of trailing company suffixes in one go, e.g. try this:

MDS_DB.mdq.RegexReplace(MDS_DB.mdq.RegexReplace(COMPANY,
    '[^a-zA-Z0-9 ]', '', 1), '(?:[ ]+(?:incorporated|inc))+$', '',
    mds_db.mdq.RegexMask(0, 0, 1, 0, 0, 0, 0))
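As a rough check of the repeated-group pattern, here is a small Python sketch. Python's `re` is again an assumed stand-in for the MDS regex functions, and `clean_company` is a hypothetical helper name.

```python
import re

# One or more "<spaces><suffix>" groups, anchored at the end of the string.
CLOSINGS = re.compile(r'(?:[ ]+(?:incorporated|inc))+$', re.IGNORECASE)

def clean_company(name):
    """Strip special characters, then every trailing inc/incorporated suffix."""
    no_special = re.sub(r'[^a-zA-Z0-9 ]', '', name)
    return CLOSINGS.sub('', no_special)

print(clean_company('Some-Random Company Incorporated Inc.'))
print(clean_company('Zinc Inc'))
print(clean_company('Incorporated Holdings'))
```

Because the pattern requires at least one space before each suffix, a name such as Zinc keeps its final "inc", and a suffix word that is not at the end of the name is left alone.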

Untested, but you might be able to do just a single replacement:

MDS_DB.mdq.RegexReplace(COMPANY,
    '[^a-zA-Z0-9 ]|(?:[ ]+(?:incorporated|inc))+$',
    '',
    mds_db.mdq.RegexMask(0, 0, 1, 0, 0, 0, 0))
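
One caveat with the single replacement, sketched below in Python under the same assumption that `re` behaves like the MDS regex engine for this pattern (`clean_single_pass` is a hypothetical name): a single pass does not re-scan text it has already moved past, so if the original name ends in a special character such as a period, the suffix is not at the end of the string when the engine reaches it and it survives.

```python
import re

# Special characters OR a run of trailing suffixes, in a single alternation.
COMBINED = re.compile(
    r'[^a-zA-Z0-9 ]|(?:[ ]+(?:incorporated|inc))+$',
    re.IGNORECASE,
)

def clean_single_pass(name):
    """Apply the combined pattern once, left to right."""
    return COMBINED.sub('', name)

# No trailing punctuation: the suffix run is matched at the end of the string.
print(clean_single_pass('Some-Random Company Incorporated Inc'))

# Trailing period: ' Incorporated Inc' is followed by '.', so '$' fails there
# and only the special characters are removed.
print(clean_single_pass('Some-Random Company Incorporated Inc.'))
```

If names can end in punctuation, the two-step version (strip special characters first, then the suffixes) is the safer choice.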

3 Comments

that is fantastic - thank you very much. do you mind explaining the regex in the last example? I'm not well versed in regex. would there be a way to combine the replacement of special characters with the other replacements as well?
@CameronTaylor Here is a demo for the last regex. It basically just matches one or more spaces followed by inc or incorporated, with that entire group repeated one or more times at the end of the string.
I know this is old but I just used this - fantastic stuff. Bravo.
