8

how can I find the matching strings in an array of strings in PowerShell:

example:

$Arr = "1 string first",
"2 string second",
"3 string third",
"4 string fourth"

Using this example, I want this returned:

" string "

I want to use this to find matching parts of file names and then remove that part of the file name (like removing the artist's name from a set of mp3 files for example), without having to specify which part of the file name should be replaced manually.

2
  • 1
    Sounds like you're looking for Longest Common Substring. A quick search and I don't see any PowerShell implementations of this algorithm online yet. Commented Nov 20, 2011 at 3:42
  • Thanks dude.. That put me in the right direction. I found a C# LongestCommonSubstring, which took only a few minutes to convert to PowerShell. Time to update the question. Commented Nov 20, 2011 at 5:26

4 Answers 4

6
$arr =  "qdfbsqds", "fbsqdt", "bsqda" 
$arr | %{

$substr = for ($s = 0; $s -lt $_.length; $s++) {
           for ($l = 1; $l -le ($_.length - $s); $l++) {
            $_.substring($s, $l);
           }
          } 
$substr | %{$_.toLower()} | select -unique

} | group | ?{$_.count -eq $arr.length} | sort {$_.name.length} | select -expand name -l 1
# returns bsqd
  • produce a list of all the unique substrings of the inputstrings
  • filter for substrings that occur #inputstrings times (i.e. in all input strings)
  • sort these filtered substrings based on the length of the substring
  • return the last (i.e. longest) of this list
Sign up to request clarification or add additional context in comments.

1 Comment

This can be simplified somewhat by 1) only generating substrings for the shortest string ($shortest = $arr | sort Length | select -first 1), and 2) generating the substrings in order of length: $length..1 | foreach { $l = $_; 0..($length - $l) | foreach { $shortest.Substring( $_, $l ) } } | where { @($arr -match [regex]::Escape( $_ )).Count -eq $arr.Count } | select -first 1
1

If it ( the artist name etc) is only going to be a single word:

$Arr = "1 string first", "2 string second", "3 string third", "4 string fourth"
$common = $Arr | %{ $_.split() } | group | sort -property count | select -last 1 | select -expand name
$common = " {0} " -f $common

Update:

Implementation that seems to work for multiple words ( finding the longest common substring of words):

$arr = "1 string a first", "2 string a second", "3 string a third", "4 string a fourth"
$common = $arr | %{
$words = $_.split()
$noOfWords = $words.length
for($i=0;$i -lt $noOfWords;$i++){
    for($j=$i;$j -lt $noOfWords;$j++){
        $words[$i..$j] -join " "
    }
}

} | group | sort -property count,name | select -last 1 | select -expand name

$common = " {0} " -f $common
$common

7 Comments

It can be anything, a set of words, might have underscores, etc.
@Winfred - Then it becomes more complex if you want a set of words...You will have to spend some time on getting the code...
Worst case I can try something like converting each string to a chararray and then start comparing the individual chars. But mufti-dimensional arrays hurt my head really bad. I'm hoping someone might have a piece of code that does this already.
@Winfred - Since this is a common enough problem, I wrote something which seems to work. Updated answer. Try it out / expand on it.
It does work on single words or multiple words, but it breaks down when the strings don't contain spaces: $arr = "Name_Artist_Lovely", "Name_Artist_Song", "Name_Artist_SoPretty" Should return: "Name_Artist_" But returns: " Name_Artist_SoPretty"
|
1

Here is a "Longest Common Substring" function for two strings in PowerShell (based on wikibooks C# example):

Function get-LongestCommonSubstring
{
Param(
[string]$String1, 
[string]$String2
)
    if((!$String1) -or (!$String2)){Break}
    # .Net Two dimensional Array:
    $Num = New-Object 'object[,]' $String1.Length, $String2.Length
    [int]$maxlen = 0
    [int]$lastSubsBegin = 0
    $sequenceBuilder = New-Object -TypeName "System.Text.StringBuilder"

    for ([int]$i = 0; $i -lt $String1.Length; $i++)
    {
        for ([int]$j = 0; $j -lt $String2.Length; $j++)
        {
            if ($String1[$i] -ne $String2[$j])
            {
                    $Num[$i, $j] = 0
            }else{
                if (($i -eq 0) -or ($j -eq 0))
                {
                        $Num[$i, $j] = 1
                }else{
                        $Num[$i, $j] = 1 + $Num[($i - 1), ($j - 1)]
                }
                if ($Num[$i, $j] -gt $maxlen)
                {
                    $maxlen = $Num[$i, $j]
                    [int]$thisSubsBegin = $i - $Num[$i, $j] + 1
                    if($lastSubsBegin -eq $thisSubsBegin)
                    {#if the current LCS is the same as the last time this block ran
                            [void]$sequenceBuilder.Append($String1[$i]);
                    }else{ #this block resets the string builder if a different LCS is found
                        $lastSubsBegin = $thisSubsBegin
                        $sequenceBuilder.Length = 0 #clear it
                        [void]$sequenceBuilder.Append($String1.Substring($lastSubsBegin, (($i + 1) - $lastSubsBegin)))
                    }
                }
            }
        }
    }
    return $sequenceBuilder.ToString()
}

To use this for more than two strings, use it like this:

Function get-LongestCommonSubstringArray
{
Param(
[Parameter(Position=0, Mandatory=$True)][Array]$Array
)
    $PreviousSubString = $Null
    $LongestCommonSubstring = $Null
    foreach($SubString in $Array)
    {
        if($LongestCommonSubstring)
        {
            $LongestCommonSubstring = get-LongestCommonSubstring $SubString $LongestCommonSubstring
            write-verbose "Consequtive diff: $LongestCommonSubstring"
        }else{
            if($PreviousSubString)
            {
                $LongestCommonSubstring = get-LongestCommonSubstring $SubString $PreviousSubString
                write-verbose "first one diff: $LongestCommonSubstring"
            }else{
                $PreviousSubString = $SubString
                write-verbose "No PreviousSubstring yet, setting it to: $PreviousSubString"
            }
        }
    }
    Return $LongestCommonSubstring
}


get-LongestCommonSubstringArray $Arr -verbose

3 Comments

Should be able to run it multiple times, combining the results. Unless I just haven't gotten enough sleep, longest common substring should be transitive, so something like this should work: LCS(LCS(LCS(1 2) 3) 4).
I don't think that LCS is transitive. Consider these strings: "some big string", "some other big string", "some thing". If you did LCS(3, LCS(1, 2)) then you would get "ing" instead of "some "
You're right. For my particular problem, this wasn't an issue, but I won't flag the question answered. I'll revisit the code some other time and try to figure out a better way to solve it.
0

If I understand your question:

$Arr = "1 string first", "2 string second", "3 string third", "4 string fourth"
$Arr -match " string " | foreach {$_ -replace " string ", " "}

2 Comments

OP is asking how to find that the " string " is the common one.
That's correct. I'm trying to find that " string " is the common one.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.