
I've got a program that uses a few hash tables to resolve information. I'm getting some weird issues with foreign characters. Below is an accurate representation:

$Props =
@{
    P1  = 'Norte Americano e Inglês'
}

$Expressions =
@{
    E1  = { $Props['P1'] }
}

& $Expressions['E1']

If I paste this into PowerShell 5.1 console or run selection in VSCode I get:

Norte Americano e Inglês

As expected. But if I run the script in VSCode (hit F5), I get:

Norte Americano e Inglês

By setting a breakpoint right after the hash table literal while debugging, I can tell the incorrect version is actually in the hash table. So this isn't somehow a side effect of the call operator or the use of script blocks.

I attempted to set the output encoding like:

$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding

But this doesn't seem to change the pattern. Frankly, I'm surprised the console is handling Unicode so well in the first place; what I can't understand is the inconsistency. Ultimately this data is written to an AD attribute, which again works fine if I execute the steps manually, but gets mangled if I actually run the script, even with the output encoding set as above.
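For what it's worth, the script file's own encoding can be inspected directly. This sketch (the path is a placeholder) just checks whether the file starts with the UTF-8 byte-order mark EF BB BF, which is what Windows PowerShell relies on to recognize a source file as UTF-8:

```powershell
# Check whether a script file starts with a UTF-8 BOM (EF BB BF).
# '.\MyScript.ps1' is a placeholder path.
$bytes  = [System.IO.File]::ReadAllBytes('.\MyScript.ps1')
$hasBom = $bytes.Length -ge 3 -and
          $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
$hasBom
```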

I did look through this Q&A, but I don't seem to be having a console display issue, although that may be down to the TrueType fonts; perhaps they're masking the problem.

Interestingly, it does seem to work correctly in VSCode if I switch it to PowerShell 7.1. However, because of the dependence on the AD cmdlets, which do not function well through the implicit compatibility session, using PowerShell Core isn't an option for this project.

The dev environment is Windows Server 2012 R2, up to date. I'm not sure there's any way to change the system code page there, as is mentioned for Windows 10 (1909).

  • Since the problem occurs with a string literal in your source code, the likeliest explanation is that your script file is misinterpreted by PowerShell, which happens if the script is saved as UTF-8 without a BOM. Try saving your script as UTF-8 with BOM; see this answer for more information. Commented Jun 4, 2021 at 18:24
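The fix suggested in the comment above can be sketched as re-saving the script with an explicit BOM. The path is hypothetical, and this assumes the file's bytes are in fact valid UTF-8 (note `-Encoding UTF8` on the read, so Windows PowerShell doesn't misread the BOM-less file as ANSI before rewriting it):

```powershell
# Re-save a script as UTF-8 *with* BOM ('.\MyScript.ps1' is a hypothetical path).
$path = '.\MyScript.ps1'
$text = Get-Content -LiteralPath $path -Raw -Encoding UTF8
[System.IO.File]::WriteAllText($path, $text, [System.Text.UTF8Encoding]::new($true))
```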

1 Answer


This is pretty ugly, but what happens if you try this at the end of your code:

$enc = [System.Text.Encoding]::UTF8
$enc.GetString($enc.GetBytes($(& $Expressions['E1'])))

Also, this might help you: Encode a string in UTF-8
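For the curious, the mangling itself is easy to reproduce: take the string's UTF-8 bytes and decode them as Windows-1252 (the typical ANSI code page), which is presumably what happens when a BOM-less UTF-8 script is read by Windows PowerShell:

```powershell
# The UTF-8 bytes of 'ê' (0xC3 0xAA) decoded as Windows-1252 become 'Ã' + 'ª'.
$utf8 = [System.Text.Encoding]::UTF8
$ansi = [System.Text.Encoding]::GetEncoding(1252)
$ansi.GetString($utf8.GetBytes('Norte Americano e Inglês'))  # -> Norte Americano e Inglês
```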


4 Comments

That's definitely helping. Let me get this into the main code and close this issue out in the morning. This seems to be working too [System.Text.Encoding]::UTF8.GetString( [Char[]](& $Expressions['E1']) ) but I have to see it through to the output. THANKS!
This worked well under the circumstances. Though I would still like an explanation of the observed behavior. At any rate, I ran with your sample adjusted as mentioned and packed it in a function for convenience. Thanks again!
Glad it worked, Steven. My guess is that by default PS stdout seems to be UTF8-noBOM and we're forcing UTF8-BOM here. Though I thought PS always used UTF8-BOM by default; this is strange behavior for me too :P
This answer is probably the right solution to the problem: if you save the script file as UTF-8 with BOM, Windows PowerShell no longer misinterprets it (PowerShell Core defaults to UTF-8, and therefore reads it correctly even without a BOM). Your solution attempt tries to fix the already-misinterpreted string after the fact, but this isn't a complete solution, because certain Unicode characters can break parsing of the script altogether.
