How can I do fuzzy string matching within PowerShell scripts?

I have several sets of people's names scraped from different sources, stored in an array. When I add a new name, I would like to compare it with the existing names and, if it fuzzily matches any of them, treat them as the same person. For example, with this data set:

@("George Herbert Walker Bush",
  "Barbara Pierce Bush",
  "George Walker Bush",
  "John Ellis (Jeb) Bush"  )

I would like to see the following outputs for the given inputs:

"Barbara Bush" -> @("Barbara Pierce Bush")
"George Takei" -> @()
"George Bush"  -> @("George Herbert Walker Bush","George Walker Bush")

At minimum, the matching should be case-insensitive, and ideally flexible enough to tolerate some misspelling.

As far as I can tell, the standard library does not provide this functionality. Is there an easy-to-install module that can accomplish this?

1 Answer

Searching the PowerShell Gallery for the term "fuzzy", I found this package: Communary.PASM.

It can be installed with:

PS> Install-Package Communary.PASM

The project is hosted on GitHub. I used its examples file for reference.

Here are my examples:

$colors = @("Red", "Orange", "Yellow", "Green", "Blue", "Violet", "Sky Blue" )

PS> $colors | Select-FuzzyString Red

Score Result
----- ------
  300 Red

This is a perfect match, scoring the maximum of 100 for each character.

PS> $colors | Select-FuzzyString gren

Score Result
----- ------
  295 Green

It tolerates a few missing characters.

PS> $colors | Select-FuzzyString blue

Score Result
----- ------
  400 Blue
  376 Sky Blue

Multiple values can be returned with different scores.

PS> $colors | Select-FuzzyString vioret

# No output

But it does not tolerate even a small misspelling. So I also tried Select-ApproximateString:

PS> $colors | Select-ApproximateString vioret
Violet

Its API is different: it returns only a single match or nothing. It may also return nothing in cases where Select-FuzzyString does return a match.
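
Since each cmdlet catches cases the other misses, one option is to try them in sequence. The following is only a sketch: Find-ClosestString is a hypothetical wrapper name, not part of Communary.PASM, and it assumes the module is already imported and that Select-FuzzyString emits objects with the Score and Result properties shown in the outputs above.

```powershell
# Hypothetical wrapper (not part of Communary.PASM): try Select-FuzzyString
# first, then fall back to Select-ApproximateString if nothing was returned.
function Find-ClosestString {
    param(
        [string[]]$InputStrings,
        [string]$Search
    )

    # Collect every fuzzy hit; @() guarantees an array even for zero or one result.
    $hits = @($InputStrings | Select-FuzzyString $Search)
    if ($hits.Count -gt 0) {
        # Return the matched strings, best score first.
        return $hits | Sort-Object Score -Descending | ForEach-Object { $_.Result }
    }

    # Fallback: returns a single approximate match, or nothing at all.
    $InputStrings | Select-ApproximateString $Search
}
```

With the $colors array above, Find-ClosestString -InputStrings $colors -Search vioret would fall through to the second cmdlet, assuming both behave as in the examples shown here.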

This was tested with PowerShell Core v6.0.0-beta.9 on macOS and Communary.PASM 1.0.43.
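
For the name-matching case in the question, a module-free alternative is to compare names word by word using Levenshtein edit distance. This is a minimal sketch of a different technique, not what Communary.PASM does internally. Both function names are hypothetical, and the default threshold of one edit per word is an assumption you would need to tune.

```powershell
# Hypothetical helper: case-insensitive Levenshtein distance between two strings.
function Get-LevenshteinDistance {
    param([string]$Source, [string]$Target)

    $s = $Source.ToLowerInvariant()
    $t = $Target.ToLowerInvariant()

    # $previous[$j] = distance between the first ($i - 1) chars of $s and the first $j chars of $t.
    $previous = @(0..$t.Length)
    for ($i = 1; $i -le $s.Length; $i++) {
        $current = New-Object 'int[]' ($t.Length + 1)
        $current[0] = $i
        for ($j = 1; $j -le $t.Length; $j++) {
            $cost = if ($s[$i - 1] -ceq $t[$j - 1]) { 0 } else { 1 }
            $current[$j] = [Math]::Min(
                [Math]::Min($current[$j - 1] + 1, $previous[$j] + 1),
                $previous[$j - 1] + $cost)
        }
        $previous = $current
    }
    return $previous[$t.Length]
}

# Hypothetical helper: emit each candidate in which every word of the query
# is within $MaxDistance edits of some word of the candidate.
function Select-SimilarName {
    param([string]$Query, [string[]]$Candidates, [int]$MaxDistance = 1)

    # Split on anything that is not a letter or digit, so "(Jeb)" becomes "Jeb".
    $queryTokens = @($Query -split '[^\p{L}\p{N}]+' | Where-Object { $_ })
    if ($queryTokens.Count -eq 0) { return }

    foreach ($candidate in $Candidates) {
        $candidateTokens = @($candidate -split '[^\p{L}\p{N}]+' | Where-Object { $_ })
        $allFound = $true
        foreach ($q in $queryTokens) {
            $hit = $false
            foreach ($c in $candidateTokens) {
                if ((Get-LevenshteinDistance -Source $q -Target $c) -le $MaxDistance) { $hit = $true; break }
            }
            if (-not $hit) { $allFound = $false; break }
        }
        if ($allFound) { $candidate }
    }
}

$names = @("George Herbert Walker Bush", "Barbara Pierce Bush", "George Walker Bush", "John Ellis (Jeb) Bush")
Select-SimilarName -Query "Barbara Bush" -Candidates $names   # Barbara Pierce Bush
Select-SimilarName -Query "George Takei" -Candidates $names   # (no output)
Select-SimilarName -Query "gorge bush" -Candidates $names     # George Herbert Walker Bush, George Walker Bush
```

A per-word threshold keeps "George Takei" from matching on the first name alone, but an allowance of one edit can over-match very short words, so a threshold that scales with word length may work better on real data.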

1 Comment

PowerShell 7 has some kind of fuzzy matching for misspelled commands. I wonder if there's a way to apply it generally.
