I have a list of over 500 strings I need to search for. (They're URLs, if that matters.) I have a web site with over 1,000 web pages. I want to search each of those web pages to find which URLs each links to.
Back when our web site was on a Unix box, I would've written a little shell script using find and grep to accomplish this, but now we're on a Windows machine, so that's not really an option. I've no experience with PowerShell at all, but I suspect this is what I need. However, I've no idea how to even start.
Ideally, what I would like to end up with is something like this:
<filename 1>
    <1st string found>
    <2nd string found>
    <3rd string found>
<filename 2>
    <1st string found>
    <2nd string found>
I don't need to know the line number; I just need to know which URLs are in which files. (We're going to be moving all 500+ target URLs to new locations, so we're going to have to manually update the links in the 1,000+ web pages. It will be a royal pain.)
Presumably the logic would be something like this:
for each file {
    print the filename
    for each string {
        if string found in file {
            print the string
        }
    }
}
We can't do a find/replace directly because the web pages are located in a content management system. All we can do is locate which pages need to be updated (using a static copy of the web pages on a local drive), then manually update the individual pages in the CMS.
I'm hoping this is easy to do, but my complete unfamiliarity with PowerShell means I've no idea where to start. Any help would be greatly appreciated!
Update
Thanks to Travis Plunk for the help! Based upon his answer, here is the final version of the code I'll be using.
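# Strings to search for
$strings = @(
    'http://www.ourwebsite.com/directory/somefile.pdf'
    'http://www.ourwebsite.com/otherdirectory/anotherfile.pdf'
    'http://www.otherwebsite.com/directory/otherfile.pdf'
)

# Directory containing web site files
Set-Location \OurWebDirectory

$results = @(foreach ($string in $strings)
{
    Write-Host "Searching files for $string"
    # Skip directories and anything under the images directory.
    # -Exclude matches item names and does not skip a directory's contents
    # when recursing, so filter on the full path instead.
    # -List returns only the first match per file, so each URL is listed once per file.
    Get-ChildItem . -Recurse |
        Where-Object { -not $_.PSIsContainer -and $_.FullName -notlike '*\imagedir\*' } |
        Select-String -SimpleMatch $string -List
}) | Sort-Object -Property Path

# Print each file name, followed by the strings found in that file
$results | Group-Object -Property Path | ForEach-Object {
    "File: $($_.Name)"
    $_.Group | ForEach-Object { "`t$($_.Pattern)" }
}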
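For comparison, here is a minimal sketch of the file-first logic from my pseudocode: each file is read once and every string is tested against its contents in memory. This is not from Travis's answer. The function names Find-StringsInFile and Get-LinkReport are hypothetical helpers I made up for illustration, and the case-insensitive comparison is an assumption chosen to mirror Select-String's default behavior.

```powershell
# Hypothetical helper: returns the subset of $Strings that occur in the file at $Path.
function Find-StringsInFile {
    param(
        [string]   $Path,
        [string[]] $Strings
    )
    # Read the whole file once, then test each string against it in memory
    $content = [System.IO.File]::ReadAllText($Path)
    foreach ($string in $Strings) {
        # Literal (non-regex), case-insensitive substring test
        if ($content.IndexOf($string, [System.StringComparison]::OrdinalIgnoreCase) -ge 0) {
            $string
        }
    }
}

# Hypothetical helper: prints each file under $Root that contains at least one
# of the strings, followed by a tab-indented list of the strings found in it.
function Get-LinkReport {
    param(
        [string]   $Root,
        [string[]] $Strings
    )
    Get-ChildItem -Path $Root -Recurse |
        # Skip directories and anything under the images directory
        Where-Object { -not $_.PSIsContainer -and $_.FullName -notlike '*\imagedir\*' } |
        ForEach-Object {
            $found = @(Find-StringsInFile -Path $_.FullName -Strings $Strings)
            if ($found.Count -gt 0) {
                "File: $($_.FullName)"
                $found | ForEach-Object { "`t$_" }
            }
        }
}

# Example call, assuming the $strings array and directory from the script above:
# Get-LinkReport -Root '\OurWebDirectory' -Strings $strings
```

Unlike my pseudocode, this prints a filename only when at least one string is found in that file. With 500+ strings and 1,000+ pages, reading each file once may also be quicker than rescanning the whole site for every string, though I haven't measured it.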
body only) or the full HTML contents itself? (EDIT: This matters, because we need to save the full HTML and search in all the href fields, for example.)