
I am trying to replace a multiline text with another multiline text. However no attempt seems to work. The searched text is not being found.

In the searched Directory are some .xml files that contain the searched text.

Here is some code:

Set-Location "D:\path\test"

$file_names = (Get-ChildItem).Name

$oldCode =  @"

       </LinearLayout>


    <TextView
        android:layout_weight="1"
        android:layout_width="wrap_content"
        android:layout_height="0dp"/>


</LinearLayout>
"@

$newCode =  @"
          ANYTHING NEW

       </LinearLayout>


    <TextView
        android:layout_weight="1"
        android:layout_width="wrap_content"
        android:layout_height="0dp"/>


</LinearLayout>
"@

foreach ($name in $file_names)
{
    # None of below code works.

    # Attempt 1: trying to find the code
    Write-Host $name
    $ret_string = (Get-Content -raw -Path $name | Select-String $oldCode -AllMatches | % { $_.matches}).count
    Write-Host $ret_string

    # Attempt 2: trying to actually replace the string
    $fileContent = Get-Content $name -Raw
    Write-Host $fileContent
    $newFileContent = $fileContent -replace $oldCode, $newCode
    Write-Host $newFileContent

    # Attempt 3: another try to replace the string
    ((Get-Content -Path $name -Raw) -replace $oldCode, $newCode) | Set-Content -Path $name
}

Write-Host "Press any key to continue..."
$Host.UI.RawUI.ReadKey("NoEcho,IncludeKeyDown")

(Thanks for all the help, here is a link to the working solution with test files: https://github.com/Sokrates1989/powershell-multiline-replace)

1
  • As an aside: If you use Get-Content -Raw to read a file as a whole, you should use -NoNewLine when writing the string back to a file with Set-Content, so as to prevent a newline from getting appended. Commented Oct 5, 2022 at 21:00
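A minimal sketch of the round trip this comment describes. The scratch file name is made up for illustration and the file lives in the temp directory:

```powershell
# Hypothetical scratch file, used only to illustrate the round trip.
$demoPath = Join-Path ([System.IO.Path]::GetTempPath()) 'nonewline-demo.xml'
$original = "<root>`n  <item/>`n</root>`n"
[System.IO.File]::WriteAllText($demoPath, $original)

# Read the whole file as a single string, then write it back unchanged.
$content = Get-Content -LiteralPath $demoPath -Raw
Set-Content -LiteralPath $demoPath -Value $content -NoNewline

# With -NoNewline the content round-trips as-is; without it,
# Set-Content would append one extra platform-native newline.
$roundTripped = Get-Content -LiteralPath $demoPath -Raw
$isUnchanged = $roundTripped -ceq $original

Remove-Item -LiteralPath $demoPath
```

$isUnchanged should be $true here, because nothing was appended on write.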

2 Answers

1

Select-String and -replace both take regular expressions for their search string. Because some common characters have special meaning in regular expressions, it's always a good idea to use the [Regex]::Escape() method whenever you want an exact match to a literal string. It offers the additional advantage of revealing the exact nature of whitespace (e.g. "\ " vs. "\t") and non-printing characters such as \r and \n.
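To illustrate why escaping matters, here is a small sketch with a made-up search string that contains regex metacharacters (parentheses and a dot); the strings are illustrative and not taken from the question:

```powershell
# Hypothetical sample strings: the search text contains ( ) and .
$searchText = 'price (USD): 1.50'
$fileText   = 'Total price (USD): 1.50 incl. tax'

# Unescaped, the parentheses form a capture group instead of matching literally.
$rawMatch = $fileText -match $searchText

# Escaped, every character is matched verbatim.
$escapedMatch = $fileText -match ([regex]::Escape($searchText))

$rawMatch       # False
$escapedMatch   # True
```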

With the posted code,

$oldCode =  [Regex]::Escape(@"

       </LinearLayout>



    <TextView
        android:layout_weight="1"
        android:layout_width="wrap_content"
        android:layout_height="0dp"/>


</LinearLayout>
"@)


PS > $oldCode
\n\ \ \ \ \ \ \ </LinearLayout>\n\n\n\n\ \ \ \ <TextView\n\ \ \ \ \ \ \ \ android:layout_weight="1"\n\ \ \ \ \ \ \ \ android:layout_width="wrap_content"\n\ \ \ \ \ \ \ \ android:layout_height="0dp"/>\n\n\n</LinearLayout>

The above still won't match the posted $NewCode because of two extra spaces preceding the first </LinearLayout>, so your matching will be more robust if you replace each run of literal whitespace with the regular expression \s+, which matches one or more whitespace characters.

Also, as mklement0's answer explains, depending on the source, newlines may be encoded as either \n or \r\n, so both can be matched with the expression \r?\n.

So, with your escaped literal string, you can do the following:

PS > $oldcode -replace ( '((\\ )|(\\t))+' , '\s+' ) -replace ( '(\\r)?\\n' , '\r?\n' )
\r?\n\s+</LinearLayout>\r?\n\r?\n\r?\n\r?\n\s+<TextView\r?\n\s+android:layout_weight="1"\r?\n\s+android:layout_width="wrap_content"\r?\n\s+android:layout_height="0dp"/>\r?\n\r?\n\r?\n</LinearLayout>
PS >

And of course, you can combine the escaping and the whitespace/newline conversion:

$OldCodeRobustRegex =  [Regex]::Escape(@"

       </LinearLayout>



    <TextView
        android:layout_weight="1"
        android:layout_width="wrap_content"
        android:layout_height="0dp"/>


</LinearLayout>
"@)  -replace ( '((\\ )|(\\t))+' , '\s+' ) -replace ( '(\\r)?\\n' , '\r?\n' )

Output:

PS > $OldCodeRobustRegex
\r?\n\s+</LinearLayout>\r?\n\r?\n\r?\n\r?\n\s+<TextView\r?\n\s+android:layout_weight="1"\r?\n\s+android:layout_width="wrap_content"\r?\n\s+android:layout_height="0dp"/>\r?\n\r?\n\r?\n</LinearLayout>
PS >
PS > $NewCode -match $OldCodeRobustRegex
True
PS >

P.S.

If there's a chance of extraneous empty lines (successive newlines), you can modify the newline replacement portion to:

 -replace ( '((\\r)?\\n)+' , '(\r?\n)+' )
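As a rough sketch of what this variant does, the following uses a short made-up search string rather than the layout XML, so the effect on runs of newlines is easy to see:

```powershell
# Hypothetical two-line search string with one empty line in between.
$searchText = "<a>`n`n<b>"

# Escape it, then collapse each run of escaped newlines into (\r?\n)+
$pattern = [regex]::Escape($searchText) -replace '((\\r)?\\n)+', '(\r?\n)+'

# Any non-empty run of CRLF or LF newlines between the tags now matches.
$oneCrlf   = "<a>`r`n<b>" -match $pattern
$threeLf   = "<a>`n`n`n<b>" -match $pattern
$noNewline = "<a><b>" -match $pattern
```

Under these assumptions $oneCrlf and $threeLf are $true, while $noNewline is $false, because at least one newline is still required.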

13 Comments

Unfortunately it did not work. I ended up opening all files with a text editor. The editor provided an option to replace text in all opened files..
Probably a whitespace mismatch. In the code you posted, $oldcode won't even match $newcode because $oldcode has two more spaces preceding the first non-whitespace characters ( </LinearLayout> ) than the same line in $newcode. A more robust regular expression would use \s+ rather than literal spaces...
Note that newlines, irrespective of their format (CRLF vs. LF) do not require escaping in regular expressions, because they aren't metacharacters. However, in order to match either newline format, it makes sense to replace literal newlines with \r?\n - I suspect that a mismatch between the newline format in the string literals and that of the input files is the root of the problem.
Be that as it may, in the sample strings provided, it is whitespace that prevents $OldCode from matching what at first glance is identical text in $NewCode, not newlines. And anytime you want a literal match with regex, [Regex]::Escape() is a good idea, much like using the .Trim() method on string input even when leading/trailing whitespace is not expected....
For full robustness you need [regex]::Escape("m\n\o`r`np`nq`n\n") -replace '(?m)(?<=(?:^|[^\\])(?:\\\\)*)(?:\\r)?\\n' , '\r?\n', which is what I meant with unwieldy. This isn't just about whether it works in this case, it's about properly framing a solution: Future readers will come here with different search strings, and eventually someone will hit false positives, so the most helpful framing is to add a disclaimer to the answer that states its assumptions / limitations.
1

Note:

  • The solutions in the next sections assume that your search string is correct with respect to intra-line whitespace, and address only the potential newline-format mismatch discussed below.

  • The bottom section additionally uses flexible intra-line whitespace matching, by more fully taking advantage of the fact that the -replace operator is regex-based, which increases the complexity of the solution, however.


The challenge with multi-line search strings is that there may be a mismatch in newline format (Windows-format CRLF vs. Unix-format LF-only newlines) between the multi-line search string literal and the content of the files:

  • A file may use either format, depending on how it was created.

  • A multi-line string literal uses the same newline format as that of the enclosing script file.

Note:

  • If all your input files use the same newline format, you may be able to get your code to work if you (re)save your script file with the same newline format as well.

  • Otherwise - if there's a mix of newline formats and/or your code must work on both Windows and non-Windows platforms with files that use platform-native newlines - the solution below is needed.
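To find out which of these cases applies, a small helper along the following lines could report the newline format of a string. The function name Get-NewlineFormat is hypothetical and not part of PowerShell:

```powershell
# Hypothetical helper: classifies the newline format used in a string.
function Get-NewlineFormat {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string] $Text
    )
    # Count CRLF sequences, and LFs that are NOT preceded by a CR.
    $crlfCount = [regex]::Matches($Text, '\r\n').Count
    $lfCount   = [regex]::Matches($Text, '(?<!\r)\n').Count

    if ($crlfCount -and $lfCount) { 'Mixed' }
    elseif ($crlfCount)           { 'CRLF' }
    elseif ($lfCount)             { 'LF' }
    else                          { 'None' }
}

Get-NewlineFormat "one`r`ntwo`r`n"   # CRLF
Get-NewlineFormat "one`ntwo`n"       # LF
Get-NewlineFormat "one`r`ntwo`n"     # Mixed
```

It can be applied to a file with Get-NewlineFormat (Get-Content $name -Raw), and to a here-string to see which format the script file itself was saved with.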


The solution is to replace the literal newlines in the search string, $oldCode, with regex \r?\n, which matches either newline format:

$oldCode =  @"

       </LinearLayout>



    <TextView
        android:layout_weight="1"
        android:layout_width="wrap_content"
        android:layout_height="0dp"/>


</LinearLayout>
"@ -replace '\r?\n', '\r?\n'

Note:

  • The -replace operation looks like a no-op, but actually replaces the literal newlines - be they CRLF or LF ones - with verbatim \r?\n, which - when later used as regexes with -replace or Select-String - again matches newlines of either format.

  • The above assumes that the multi-line search string either contains no regex metacharacters (which is the case here) or was deliberately constructed as a regex. If you want to treat the search string verbatim even if it contains regex metacharacters, more work is needed - see the bottom section.
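The seemingly redundant -replace can be checked in isolation. This sketch uses a short made-up string built with escape sequences, so that the result does not depend on how the script file was saved:

```powershell
# Hypothetical search string with one literal CRLF newline.
$search = "<a>`r`n<b>" -replace '\r?\n', '\r?\n'

# The literal newline is now the six-character regex text \r?\n
$search

# Used as a regex, it matches input with either newline format.
$matchesLf   = "<a>`n<b>" -match $search
$matchesCrlf = "<a>`r`n<b>" -match $search
```

Here $search holds the verbatim text <a>\r?\n<b>, and both match tests return $true.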

Caveat re multi-line replacement strings:

  • Multi-line string literals using the enclosing script file's newline format implies that your replacement string too will use whatever newline format the script was saved with - which may differ from that of your input files.

  • You can control the replacement string's newline format by replacing its literal newlines with the newlines of interest; e.g.:

$newCode =  @"
        ANYTHING NEW

     </LinearLayout>



    <TextView
        android:layout_weight="1"
        android:layout_width="wrap_content"
        android:layout_height="0dp"/>


</LinearLayout>
"@ -split '\r?\n' -join "`n"
  • The above splits the multi-line strings into lines with -split '\r?\n' and re-joins (with -join) them with "`n", i.e. LF-only newlines.

  • Use "`r`n" for CRLF newlines, or [Environment]::NewLine for platform-native newlines.

  • To match the newline format of a given input file, read it with Get-Content -Raw and then use -match '\r\n' to look for the presence of at least one CRLF newline, along the lines of:

     $fileContent = Get-Content $name -Raw
     $newline = if ($fileContent -match '\r\n') { "`r`n" } else { "`n" }
     $fileContent -replace $oldCode, ($newCode -split '\r?\n' -join $newline)
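Putting these pieces together for a single file might look like the sketch below. The function name Update-MultilineText and its parameters are hypothetical, and it assumes that the pattern is already a valid regex and that the replacement text is meant literally:

```powershell
# Hypothetical helper: replaces a regex match in one file, preserving
# the file's own newline format. Returns $true if the file was changed.
function Update-MultilineText {
    param(
        [Parameter(Mandatory)][string] $Path,
        [Parameter(Mandatory)][string] $Pattern,      # a regex
        [Parameter(Mandatory)][string] $Replacement   # literal multi-line text
    )
    $fileContent = Get-Content -LiteralPath $Path -Raw
    if ([string]::IsNullOrEmpty($fileContent)) { return $false }

    # Use CRLF if the file contains at least one CRLF, otherwise LF.
    $newline = if ($fileContent -match '\r\n') { "`r`n" } else { "`n" }

    # Normalize the replacement's newlines, and double any $ characters,
    # because $ has special meaning in the replacement operand of -replace.
    $normalized = ($Replacement -split '\r?\n' -join $newline).Replace('$', '$$')

    $updated = $fileContent -replace $Pattern, $normalized
    if ($updated -ceq $fileContent) { return $false }

    # -NoNewline prevents an extra trailing newline from being appended.
    Set-Content -LiteralPath $Path -Value $updated -NoNewline
    return $true
}
```

A call inside the question's loop could then read Update-MultilineText -Path $name -Pattern $oldCode -Replacement $newCode, with the return value indicating whether anything was found.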
    

If the search string is to (always) be treated verbatim (apart from its newline format):

  • [regex]::Escape() is normally used to escape an arbitrary string that is to be treated verbatim by the .NET regex engine.

  • The challenge is that it escapes literal CR as \r and literal LF as \n, whereas the goal here is to represent all newlines as \r?\n.

  • Robustly replacing all verbatim \r\n and stand-alone \n sequences in the result from [regex]::Escape() after the fact is non-trivial, notably if false positives must be ruled out; the solution below works around this challenge by first splitting the multi-line string into lines with -split '\r?\n', then escaping the lines individually, and then joining the escaped lines with literal \r?\n:

$oldCode =  (@"

       </LinearLayout>



    <TextView
        android:layout_weight="1"
        android:layout_width="wrap_content"
        android:layout_height="0dp"/>


</LinearLayout>
"@ -split '\r?\n').ForEach({ [regex]::Escape($_) }) -join '\r?\n'
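The effect of the split, escape, and join steps is easier to see with a short made-up search string that contains both a newline and regex metacharacters:

```powershell
# Hypothetical search text with a CRLF newline and metacharacters . ( ) [
$searchText = "value=1.5 (approx)`r`nnext [line]"

# Split into lines, escape each line, re-join with the regex \r?\n
$pattern = ($searchText -split '\r?\n').ForEach({ [regex]::Escape($_) }) -join '\r?\n'

# Matches the same text with LF-only newlines ...
$matchesLfVariant = "value=1.5 (approx)`nnext [line]" -match $pattern

# ... but the dot is now literal, so it no longer matches "any character".
$matchesWrongText = "value=1x5 (approx)`nnext [line]" -match $pattern
```

Under these assumptions $matchesLfVariant is $true and $matchesWrongText is $false.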

A regex-based solution with flexible newline-format matching and flexible intra-line whitespace matching:

For flexible newline-format and intra-line whitespace matching, it is simplest to formulate your search directly as a regex:

  • As shown above, use \r?\n instead of literal newlines in order to match both CRLF and LF newlines.

  • To also match intra-line whitespace flexibly, use [ \t]+ to match any (non-empty) run of spaces and/or tab characters.

However, there are two problems:

  • Formulating your literal search string as a regex requires you to additionally manually \-escape all regex metacharacters (such as . or \). Also, using [ \t]+ for intra-line whitespace and especially \r?\n in lieu of actual newlines can hinder the readability of your search string (although you could mitigate the latter with the x (IgnoreWhiteSpace) option).

  • If a literal search string is given to you, you have no choice but to perform all of the above programmatically.
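For the hand-written case, the x option mentioned above can be switched on inline with (?x). The following is a sketch with a made-up pattern that covers only part of the layout XML; in this mode, unescaped whitespace in the pattern is ignored and # starts a comment, while whitespace inside a character class such as [ \t] is still taken literally:

```powershell
# Hand-written regex using the inline x (IgnorePatternWhitespace) option.
$pattern = @'
(?x)                    # ignore unescaped whitespace; allow comments
</LinearLayout>         # literal closing tag
(?: \r?\n )+            # one or more newlines of either format
[ \t]* <TextView        # optional indentation, then the opening tag
'@

$matchesCrlf = "</LinearLayout>`r`n`r`n    <TextView" -match $pattern
$matchesLf   = "</LinearLayout>`n`t<TextView" -match $pattern
$matchesNone = "</LinearLayout><TextView" -match $pattern
```

Both $matchesCrlf and $matchesLf are $true here, and $matchesNone is $false. Note that literal spaces and # characters in the search text would have to be escaped or placed in a character class in this mode.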

Programmatic escaping is a two-step process:

  • First, apply [regex]::Escape() which ensures that the input string is treated as a literal search in the context of the regex matching performed by the -replace operator.

  • Then, identify the escaped intra-line whitespace characters (runs of "\ " and/or "\t") and newlines (\r\n or \n) in the resulting regex and replace them with their flexible / format-agnostic regex equivalents, [ \t]+ and \r?\n.

The challenge in the latter step is to rule out false positives, so that, for instance verbatim \n or \\n or \\\n in the verbatim input string aren't mistaken for escaped literal LFs, which [regex]::Escape() renders as \n.

The latter step therefore requires complex regexes that rule out such false positives, which in essence requires ignoring space, t, r and n characters that are preceded by an even number of \ characters, as that implies that these \-character runs are escaped \ characters rather than \-characters that form an escape sequence with the subsequent character (e.g., \n).

To demonstrate these complex regexes with a simple example:

# The search string to be interpreted literally.
# Note: 
#  * PowerShell's string interpolation is used to embed
#    literal CRLF ("`r`n"), LF ("`n"), and tabs ("`t") in the string.
#  * Also, the string contains literal "\n", "\r" and "\t" 
#    substrings, which must NOT be mistaken for regex escape sequences
#    for LFs, CRs, and tabs in the replacement after applying [regex]::Escape()
#       
$oldCode = "m\n\o`r`nUNC path:`t\\naples\receipts\test`nlast    line."

# Transform it to a regex with flexible whitespace matching, 
# both intra-line and with respect to the newline format
# (matching both CRLF and LF newlines).
$oldCodeAsRegex = 
  [regex]::Escape($oldCode) `
    -replace '(?m)(?<=(?:^|[^\\])(?:\\\\)*)(?:\\[ t])+', '[ \t]+' `
    -replace '(?m)(?<=(?:^|[^\\])(?:\\\\)*)(?:\\r)?\\n' , '\r?\n'

$oldCodeAsRegex now has the following verbatim value:

m\\n\\o\r?\nUNC[ \t]+path:[ \t]+\\\\naples\\receipts\\test\r?\nlast[ \t]+line\.

As you can see, the literal whitespace including the newlines was replaced with flexible regex constructs, while false positives were avoided.

A quick test shows that the regex matches the original string:

$oldCode -match $oldCodeAsRegex  # -> $true

A test with a whitespace variation of the original string, as might be encountered in the file input:

# Similar to $oldCode, but with extra whitespace inserted,
# and with only CRLF or only LF newlines, depending on the script's format.
$oldCodeVariation = 
  @'
m\n\o
UNC path:               \\naples\receipts\test
last                    line.
'@ 

$oldCodeVariation -match $oldCodeAsRegex  # -> $true
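For reuse across several search strings, the two -replace steps could be wrapped in a small function. The name ConvertTo-FlexibleRegex is hypothetical; the regexes are the ones from the demonstration above:

```powershell
# Hypothetical wrapper: turns a literal search string into a regex that
# matches flexibly with respect to intra-line whitespace and newline format.
function ConvertTo-FlexibleRegex {
    param([Parameter(Mandatory)][string] $LiteralText)

    [regex]::Escape($LiteralText) `
        -replace '(?m)(?<=(?:^|[^\\])(?:\\\\)*)(?:\\[ t])+', '[ \t]+' `
        -replace '(?m)(?<=(?:^|[^\\])(?:\\\\)*)(?:\\r)?\\n', '\r?\n'
}

# Made-up search string: two spaces, a CRLF newline, and a tab.
$regex = ConvertTo-FlexibleRegex "<a  x=`"1`">`r`n`t<b/>"

# A variation with one space, an LF newline, and four spaces still matches.
$matchesVariation = "<a x=`"1`">`n    <b/>" -match $regex
```

A replacement could then be written as $fileContent -replace (ConvertTo-FlexibleRegex $oldCode), $newCode.Replace('$', '$$'), where doubling the $ characters keeps a literal replacement string from being interpreted as containing substitution tokens.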

The multi-step processing and the complexity of the regex make it hard to explain in detail what is going on, but see this regex101.com page for an operation that:

  • uses the escaped form of $oldCodeVariation, i.e. the return value from [regex]::Escape($oldCodeVariation)

  • with the equivalent of the -replace '(?m)(?<=(?:^|[^\\])(?:\\\\)*)(?:\\[ t])+', '[ \t]+' replacement, i.e. the replacement that replaces non-empty intra-line whitespace runs of spaces and/or tabs with [ \t]+, for later use of the result in another -replace operation.
