12

Regex experts, please help me see whether this problem can be solved with a regex:

Given string 1 is any string

And string 2 is any string containing all parts of string 1 (but not a simple match -- I will give example)

How can I use a regex to replace every part of string 1 found in string 2 with an empty string, so that what remains is the part of string 2 that is not in string 1?

For example: str1 = "test xyz"; str2 = "test ab xyz"

I want " ab" or "ab " back. What is the regex I can write so that when I run a replace function on str2, it will return " ab"?

Here is some non-regex code:

            function findStringDiff(str1, str2) {
                var compareString = function(str1, str2) {
                    var a1 = str1.split("");
                    var a2 = str2.split("");
                    var idx2 = 0;
                    a1.forEach(function(val) {
                        if (a2[idx2] === val) {
                          a2.splice(idx2,1);
                        } else {
                            idx2 += 1;
                        }
                    });
                    if (idx2 > 0) {
                        a2.splice(idx2,a2.length);
                    }
                    return a2.join("");
                }

                if (str1.length < str2.length) {
                    return compareString(str1, str2);
                } else {
                    return compareString(str2, str1);
                }
            }

            console.log(findStringDiff("test xyz","test ab xyz"));
28
  • 13
    I don't see how regular expressions would be helpful here at all. Commented Apr 11, 2015 at 3:21
  • 3
    Btw, the algorithm you have shown here would make it seem that there are no differences between '$1.00' and '00.1$'. Commented Apr 11, 2015 at 3:24
  • 1
    The code above even thinks that "ab" and "cd" are the same. Commented Apr 11, 2015 at 3:34
  • 1
    What is the expected output? Commented Apr 11, 2015 at 4:14
  • 2
    Can you give multiple examples with more than just a one character difference? It's still unclear what you want. Commented Apr 11, 2015 at 4:33

4 Answers

22

Regexes only recognize if a string matches a certain pattern. They're not flexible enough to do comparisons like you're asking for. You would have to take the first string and build a regular language based on it to recognize the second string, and then use match groups to grab the other parts of the second string and concatenate them together. Here's something that does what I think you want in a readable way.

//assuming "b" contains a subsequence containing 
//all of the letters in "a" in the same order
function getDifference(a, b)
{
    var i = 0;
    var j = 0;
    var result = "";

    while (j < b.length)
    {
        if (a[i] != b[j] || i == a.length)
            result += b[j];
        else
            i++;
        j++;
    }
    return result;
}

console.log(getDifference("test fly", "test xy flry"));

Here's a jsfiddle for it: http://jsfiddle.net/d4rcuxw9/1/
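
The function above assumes that "a" is a subsequence of "b". As a minimal sketch of how that assumption could be checked first, the variant below returns null when it does not hold. The names isSubsequence and getDifferenceChecked are hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: true if every character of `a` appears in `b`
// in the same order (not necessarily adjacent).
function isSubsequence(a, b) {
    var i = 0;
    for (var j = 0; j < b.length && i < a.length; j++) {
        if (a[i] === b[j]) i++;
    }
    return i === a.length;
}

// Same two-index walk as getDifference, but returns null
// when the subsequence assumption is not met.
function getDifferenceChecked(a, b) {
    if (!isSubsequence(a, b)) return null;
    var i = 0;
    var result = "";
    for (var j = 0; j < b.length; j++) {
        if (i < a.length && a[i] === b[j]) {
            i++;            // character belongs to `a`, skip it
        } else {
            result += b[j]; // character is extra, keep it
        }
    }
    return result;
}

console.log(getDifferenceChecked("test xyz", "test ab xyz")); // "ab "
console.log(getDifferenceChecked("ab", "cd"));                // null
```

Returning null rather than an empty string keeps "no difference" and "not comparable" apart, which is one possible design choice among several.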

Sign up to request clarification or add additional context in comments.

3 Comments

I see. j is the index for b, and i for a. You are looping through the longer string and storing the "not found/different" char in result. I like it. Since regex is not possible, I'll mark this as my preferred answer. Thanks Millie!
I know that I'm extremely late and this question is closed, but just in case someone wants to find the difference between two strings regardless of the order of the characters: jsfiddle.net/c8xchkxq
Nice and simple solution, thanks! I needed the same on word level, and wanted to also receive the positions of the added words. If someone else is interested, see: jsfiddle.net/409doc37
1

I find this question really interesting. Even though I'm a little late, I would like to share my solution on how to accomplish this with regex. The solution is concise but not very readable.

While I like it for its conciseness, I probably would not use it in my code, because its opacity reduces maintainability.

var str1 = "test xyz",
    str2 = "test ab xyz",
    replacement = '';
// Escape each character of str1, then join the characters with capture groups
var regex = new RegExp(str1.split('').map(function(char){
    return char.replace(/[.(){}+*?[|\]\\^$]/, '\\$&');
}).join('(.*)'));
if(regex.test(str2)){
    // Keep only the captured groups, i.e. the text between the characters of str1
    for(var i=1; i<str1.length; i++) replacement = replacement.concat('$' + i);
    var difference = str2.replace(regex, replacement);
} else {
    alert ('str2 does not contain str1');
}

The regular expression for "test xyz" is /t(.*)e(.*)s(.*)t(.*) (.*)x(.*)y(.*)z/ and replacement is "$1$2$3$4$5$6$7".

The code is no longer concise, but it works now even if str1 contains special characters.
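
As a rough sketch of the same idea wrapped in a reusable function, the variant below joins the captured groups directly instead of building a "$1$2..." replacement string. It differs from the code above in a few ways that are my own assumptions: the function names are hypothetical, the groups are lazy and use [\s\S] so that line breaks are also captured, and extra groups at the start and end pick up text before and after str1.

```javascript
// Escape a string so it can be used literally inside a RegExp.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical wrapper: returns the characters of str2 that are not
// part of str1, or null if str1 is not a subsequence of str2.
function regexDifference(str1, str2) {
    var pattern = str1.split('').map(escapeRegExp).join('([\\s\\S]*?)');
    var regex = new RegExp('^([\\s\\S]*?)' + pattern + '([\\s\\S]*)$');
    var match = regex.exec(str2);
    if (match === null) return null;
    // match[0] is the whole match; the rest are the gaps around str1's characters.
    return match.slice(1).join('');
}

console.log(regexDifference("test xyz", "test ab xyz")); // "ab "
console.log(regexDifference("$1.00", "$1..00"));         // "."
console.log(regexDifference("abc", "xyz"));              // null
```

Because the groups are lazy, this returns "ab " for the example in the question, whereas the greedy version above returns " ab". A pattern with this many wildcard groups can backtrack heavily on long inputs that do not match, so it is probably better suited to short strings.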

13 Comments

I first thought it was limited to 10 characters for str1. But I just learnt that JavaScript allows back references with numbers larger than 9.
This doesn't find the difference between test xyz vs test xy and test{2 spaces}xyz vs test xyz.
@LorenzMeyer I am pretty excited. I think you are on to something. But when I use var str1 = "$1.00", str2 = "$1..00", it's not finding the dot. I hope you can come up with a robust solution -- so you are dynamically constructing the regex based on the str1 input, interesting...
Yes, it does not find the dot, because a dot is a special character in regexes. It would not work for (){}+*\[] either. For a robust solution, we need to escape all of those special characters.
@LorenzMeyer Did you mean you updated the code to handle special characters like dot or dollar sign? I tried dot and dollar sign and the code is not working. jsfiddle.net/mnzhbz7o
-2

To find out if there are extra '.' like you are asking for, you can do this:

result = "$1...00".match(/\$1\.(\.*)?00/)[1];

result then holds the EXTRA '.'s found. You cannot compare two strings using regex alone. Perhaps use this, then compare the results.

You can also try this:

result = "$1...00".match(/(\$)(\d+)\.(\.*)?(\d+)/);
// Outputs: ["$1...00", "$", "1", "..", "00"]

Which will extract the various parts to compare.
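
One caveat with the snippets above: match returns null when the pattern does not match, so indexing the result directly throws. A minimal sketch of a guarded version follows; the name extraDots is hypothetical, and the optional "?" after the group is dropped so that the group is always an empty string rather than undefined when there are no extra dots.

```javascript
// Hypothetical helper: returns the extra dots between "$1." and "00",
// an empty string if there are none, or null if the format does not match.
function extraDots(amount) {
    var match = amount.match(/\$1\.(\.*)00/);
    return match === null ? null : match[1];
}

console.log(extraDots("$1...00")); // ".."
console.log(extraDots("$1.00"));   // ""
console.log(extraDots("1,00"));    // null
```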


-2

If you are only concerned with testing whether a given string contains two or more sequential dot '.' characters:

var string = '$1..00',
    regexp = /(\.\.+)/;

alert('Is this regular expression ' + regexp + ' found in this string ' + string + '?\n\n' + regexp.test(string) + '\n\n' + 'Match and captures: ' + regexp.exec(string));

If you need it to match the currency format:

var string = '$1..00',
    regexp = /\$\d*(\.\.+)(?:\d\d)+/;

alert('Is this regular expression ' + regexp + ' found in this string ' + string + '?\n\n' + regexp.test(string) + '\n\n' + 'Match and captures: ' + regexp.exec(string));

But I caution you that Regular Expressions aren't for comparing the differences between two strings; they are used for defining patterns to match against given strings.

So, while this may directly answer how to find the "multiple dots" pattern, it is useless for "finding the difference between two strings".

The StackOverflow tag wiki provides an excellent overview and basic reference for RegEx. See: https://stackoverflow.com/tags/regex/info

5 Comments

The question was about comparing two strings, not just removing a string.
@LorenzMeyer See above where I explained: 'But I caution you that Regular Expressions aren't for comparing the differences between two strings; they are used for defining patterns to match against given strings. So, while this may directly answer how to find the "multiple dots" pattern, it is useless for "finding the difference between two strings".'
@LorenzMeyer Also note my early comments on the OP's question above. The question was refined multiple times, during which an insistence on a RegEx solution specifically for the "multiple dots" pattern was conveyed. The question was later put on hold for being unclear.
@LorenzMeyer Lastly, see When should I vote down? Where one is instructed to "use your downvotes whenever you encounter an egregiously sloppy, no-effort-expended post, or an answer that is clearly and perhaps dangerously incorrect." Considering this is a good-faith effort to provide a working solution to a specifically asked for portion of the OP's unclear question with a clear explanation, I am surprised you found it to be egregiously sloppy, no-effort-expended and perhaps dangerously incorrect.
@gfullam I tried to vote it up but I don't have enough reputation :) However, my question has always been about using a regular expression to find the difference between two strings -- it's in the title. The first example I gave was about str1="$1.00" while str2="$1..00". So I think that's the confusion. Finding a double . is simple, but what I have really been interested in from day 1 is to somehow replace $1.00 WITHIN $1..00, so that only a . is left. I also got a down vote for asking a question -- I don't really think much of it :) And I know you are trying to help. Thanks!
