I have two sets of operations: in the first I look for files that contain a string, and in the second I use that list to extract the lines that contain another string and then edit them.

$List_Of_Files = Get-ChildItem "$outputfolder*.html" -recurse | 
  Select-String -pattern "https://www.youtube.com" | group path | 
    select name -ExpandProperty Name

$List_Of_Titles = @(Get-Content $List_Of_Files | Where-Object { $_.Contains("<title>") }) | 
  Foreach-Object {
    $content = $_ -replace "    <title>", "  <video:title>";
    $content -replace "</title>", "</video:title>"
  }

The code works as expected, but I need the first set of operations to write its results to a text file, and the second set to read that file and write its own results to another text file.

I have tried the following, but the second set doesn't create the file and doesn't give me any error either.

Get-ChildItem "$outputfolder*.html" -recurse | 
  Select-String -pattern "https://www.youtube.com" | group path | 
    select name -ExpandProperty Name | Set-Content "c:\List_Of_Files.txt"

@(Get-Content "c:\List_Of_Files.txt" | Where-Object { $_.Contains("<title>") }) |
 Foreach-Object {
    $content = $_ -replace "    <title>", "  <video:title>";
    $content -replace "</title>", "</video:title>"
 } | Set-Content "c:\list_of_titles.txt"

I have tried modifying it in different ways, but I can't figure out how to make it work.

  • Does the file c:\List_Of_Files.txt get created? Commented Jun 24, 2016 at 16:05
  • Yes it does get created Commented Jun 24, 2016 at 16:10
  • I just tested it with a dummy file containing some <title> tags and your last pipeline worked. Could you show us a digest of the content of List_Of_Files.txt? Commented Jun 24, 2016 at 16:19
  • Side note: Set-Content produces ASCII files by default, so any non-ASCII characters are replaced with literal ? characters; if that's a problem, use the -Encoding parameter to change that. Commented Jun 24, 2016 at 16:24
  • List_Of_Files.txt just contains the complete paths to the files, for example: c:\folder\another folder\some file name that contains a string.html, etc. Commented Jun 24, 2016 at 16:28

1 Answer

c:\List_Of_Files.txt contains a list of file paths and you're trying to filter that list by whether the path contains "<title>", which results in no matches.
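(Your 1st snippet worked because $List_Of_Files holds the paths in a variable that is passed to Get-Content as its -Path argument, so Get-Content reads the content of those files; Get-Content "c:\List_Of_Files.txt", by contrast, reads the list file itself and outputs the paths.)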

Your problem stems from confusion over what objects are being passed through the pipeline: you start with file paths (strings), then treat them as if they were the files' content.

Instead, I assume you meant to test the contents of each file identified by its path.
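To make the difference concrete, here is a small sketch of what flows through the pipeline in each case. The temp folder, the file page.html, and its contents are made up for illustration; only Get-Content, Set-Content, and Where-Object come from the code above.

```powershell
# Hypothetical demo files in a throwaway temp folder (names are illustrative).
$dir  = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$page = Join-Path $dir 'page.html'
$list = Join-Path $dir 'List_Of_Files.txt'
Set-Content -LiteralPath $page -Value '<html>', '    <title>Demo</title>', '</html>'
Set-Content -LiteralPath $list -Value $page

# Reading the list file emits its lines, i.e. the paths themselves.
$paths = @(Get-Content -LiteralPath $list)

# Passing those paths to Get-Content emits the lines of the listed files.
$lines = @(Get-Content -LiteralPath $paths)

# Filtering the paths finds nothing; filtering the file lines finds the title.
$pathHits = @($paths | Where-Object { $_.Contains('<title>') })
$lineHits = @($lines | Where-Object { $_.Contains('<title>') })
```

Here $pathHits stays empty because no path contains the tag, while $lineHits holds the title line, which is the behavior the 1st snippet in the question relied on.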

A quick fix would be:

Get-Content "c:\List_Of_Files.txt" | Where-Object { Select-String -Quiet '<title>' $_ }

Note, however, that you must also adapt the ForEach-Object command accordingly:

Foreach-Object {
    # Read the content of the file whose path was given in $_,
    # and modify it.
    # (If you don't want to save the modifications, omit the `Set-Content` call.)
    $content = ((Get-Content $_) -replace "    <title>", "  <video:title>");       
    $content = $content -replace "</title>", "</video:title>";
    # Save modifications back to the input file (if desired).
    Set-Content -Value $content -Path $_;
    # $content is the entire document, so to output only the title line(s) 
    # we need to match again:
    $content -match '<video:title>'
    # Note: This relies on the title HTML element to be on a *single* line
    #       *of its own*, which may not be the case; 
    #       if it isn't, you must use proper HTML parsing to extract it.
 }

To put it all together:

Get-Content "c:\List_Of_Files.txt" | Where-Object { Select-String -Quiet '<title>' $_ } | 
    Foreach-Object {
        $content = ((Get-Content $_) -replace "    <title>", "  <video:title>");
        $content = $content -replace "</title>", "</video:title>";
        Set-Content -Value $content -Path $_;
        $content -match '<video:title>'
     } | Set-Content "c:\list_of_titles.txt"

Note that you can make the whole command more efficient by removing the filtering step that uses Select-String and performing the filtering inside the ForEach-Object block.
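A minimal sketch of that variant, wrapped in a function so it can be tried in isolation. The function name Get-VideoTitleLine and its -ListPath parameter are hypothetical, and the sketch keeps the same assumption as above: each title element sits on a single line of its own.

```powershell
# Hypothetical helper: reads a list of file paths, rewrites the title tags in
# each listed file, and outputs only the converted title line(s).
function Get-VideoTitleLine {
    param(
        [Parameter(Mandatory)] [string] $ListPath
    )
    Get-Content -LiteralPath $ListPath | ForEach-Object {
        $path = $_
        # @() keeps $content an array even for a one-line or empty file.
        $content = @(Get-Content -LiteralPath $path)
        # Skip files without a title line; this replaces the Select-String filter,
        # so each file is read only once.
        if (-not ($content -match '<title>')) { return }
        $content = $content -replace '    <title>', '  <video:title>' -replace '</title>', '</video:title>'
        # Save modifications back to the input file (omit if not desired).
        Set-Content -LiteralPath $path -Value $content
        # Output only the converted title line(s).
        $content -match '<video:title>'
    }
}
```

It could then be called as Get-VideoTitleLine -ListPath "c:\List_Of_Files.txt" | Set-Content "c:\list_of_titles.txt". Inside a ForEach-Object block, return only skips the current input object, so the remaining paths are still processed.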

Also, the string replacement could be optimized or, preferably, handled with true HTML parsing.
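As one possible optimization, the two -replace calls can be folded into a single regex with a capture group. This is only a sketch: the sample line is made up, and unlike the original it accepts any amount of leading whitespace rather than exactly four spaces. It still assumes the whole title element is on one line and is no substitute for real HTML parsing.

```powershell
# Sample input line (made up for illustration).
$line = '    <title>Some video</title>'

# One regex pass: capture the title text and re-emit it inside the new tags.
# The replacement string is single-quoted so PowerShell does not expand $1.
$converted = $line -replace '^\s*<title>(.*?)</title>\s*$', '  <video:title>$1</video:title>'
$converted
```

Lines that don't match the pattern pass through -replace unchanged, so the same expression can be applied to a whole array of lines.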

4 Comments

Thanks. I don't know how to "finish" your code: in the last line of the ForEach-Object block I put the path to a text file like this: $_ c:\list_of_titles.txt, but I get an error saying: Unexpected token 'c:\list_of_titles.txt' in expression or statement. Could you let me know how to fix it, please? Sorry for the confusion; I obviously don't know much about PowerShell.
@PeterPavelka: It sounds like you were converting my code back to a one-liner and simply forgot a ; before $_ (in multiline code you don't need a ; at the end of a line). The more important question, however: what are you trying to save to "c:\list_of_titles.txt"? Is it the file paths or the actual title strings? If the latter, the code won't work.
No, I didn't do anything except add "c:\list_of_titles.txt" at the end as you suggested, and I need to save the actual titles, as I explained in the original question. The 1st snippet gives me a list of files that include the string https://www.youtube.com; I then use that list in the 2nd snippet to extract the lines with the string "</title>", replace the title tags in those lines with video title tags, and save the results to a file. So yes, I need the actual titles as output.
@PeterPavelka: Please see my update (just revised), and note the caveat re HTML parsing.