I cannot get a correct UTF-8 string written to a PowerShell sub-process. ASCII characters work, but non-ASCII characters such as 'ü' are misinterpreted. The same problem occurs when reading from the same PowerShell sub-process.

In short: I want to use PowerShell from my program with UTF-8 encoding for both input and output.

Update: As @mklement mentioned in his answer, allocating a console with AllocConsole() and then calling SetConsoleCP(CP_UTF8) and SetConsoleOutputCP(CP_UTF8) worked for me. The allocation is only needed for a GUI application without any console; in a console application you don't have to allocate the console manually.

Update 2: If you have a GUI and called AllocConsole(), you can just call ShowWindow(GetConsoleWindow(), SW_HIDE); afterwards to hide the console, as mentioned here.

What I have tried so far:

  • Setting the input and output encoding to UTF-8 within the process: $OutputEncoding = [System.Console]::OutputEncoding = [System.Console]::InputEncoding = [System.Text.Encoding]::UTF8
  • Doing the same with UTF-16 in case there is a bug, e.g. ...ext.Encoding]::Unicode
  • Doing the same with ISO-Latin 1 (cp1252)
  • Using wchar_t as buffer and input for all tested encodings
  • Testing the byte order for the given string
  • Testing Unicode with 4 bytes per character instead of 2
  • Building the string bit by bit myself
  • Setting the compiler flag /D UNICODE

Code Example for writing:

std::string test("ls ä\n");
DWORD number_of_bytes_written = 0;
BOOL ret = WriteFile(std_in_write, test.c_str(), static_cast<DWORD>(test.size()), &number_of_bytes_written, nullptr);
if (ret == 0) {
    throw PowershellHelper::Exception(PowershellHelper::Exception::Error::COULD_NOT_WRITE_TO_FILE, GetLastError());
}

Output: ls ä
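
One thing that may be worth ruling out first is the encoding of the string literal itself: if the compiler does not treat the source file as UTF-8, the 'ä' in a narrow literal may not be the two UTF-8 bytes 0xC3 0xA4 by the time WriteFile sees it. The following is a minimal sketch, not part of the code above, that spells the bytes out explicitly and checks a buffer for well-formed UTF-8. The names is_valid_utf8 and utf8_command are hypothetical, and the validator is deliberately simplified (it checks only the lead/continuation byte structure, not overlong forms or surrogates).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is structurally valid UTF-8.
// Simplified on purpose: it does not reject overlong encodings or surrogates.
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;                       // continuation bytes expected
        if (lead < 0x80) extra = 0;                  // ASCII
        else if ((lead & 0xE0) == 0xC0) extra = 1;   // 2-byte sequence
        else if ((lead & 0xF0) == 0xE0) extra = 2;   // 3-byte sequence
        else if ((lead & 0xF8) == 0xF0) extra = 3;   // 4-byte sequence
        else return false;                           // stray continuation or invalid lead
        if (i + extra >= s.size()) return false;     // sequence is cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// "ls ä\n" with the 'ä' written as explicit UTF-8 bytes, so the content does
// not depend on the source-file or execution character set.
const std::string utf8_command = "ls \xC3\xA4\n";
```

If is_valid_utf8(test) fails for the literal used in the WriteFile call, the bytes are already wrong before they reach the pipe, independent of any console code page setting.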

Example Code:

HANDLE std_in_read = nullptr;
HANDLE std_in_write = nullptr;
HANDLE std_out_read = nullptr;
HANDLE std_out_write = nullptr;
SECURITY_ATTRIBUTES security_attr;
STARTUPINFO startup_info;
PROCESS_INFORMATION process_information;
DWORD buffer_size = 1000000;

security_attr = {sizeof(SECURITY_ATTRIBUTES), nullptr, true};

if (!CreatePipe(&std_in_read, &std_in_write, &security_attr, buffer_size)) {
    throw PowershellHelper::Exception(PowershellHelper::Exception::Error::COULD_NOT_CREATE_IN_PIPE, GetLastError());
}

if (!CreatePipe(&std_out_read, &std_out_write, &security_attr, buffer_size)) {
    throw PowershellHelper::Exception(PowershellHelper::Exception::Error::COULD_NOT_CREATE_OUT_PIPE, GetLastError());
}

GetStartupInfo(&startup_info);
startup_info.dwFlags = STARTF_USESTDHANDLES | STARTF_USESHOWWINDOW;
startup_info.wShowWindow = SW_HIDE;
startup_info.hStdOutput = std_out_write;
startup_info.hStdError = std_out_write;
startup_info.hStdInput = std_in_read;

if (!CreateProcess(TEXT(default_powershell_path), nullptr, nullptr, nullptr, TRUE, 0, nullptr, TEXT(default_windows_path), &startup_info, &process_information)) {
    throw PowershellHelper::Exception(PowershellHelper::Exception::Error::COULD_NOT_CREATE_PROCESS, GetLastError());
}

std::string test("ls ä\n");
DWORD number_of_bytes_written = 0;
BOOL ret = WriteFile(std_in_write, test.c_str(), static_cast<DWORD>(test.size()), &number_of_bytes_written, nullptr);
if (ret == 0) {
    throw PowershellHelper::Exception(PowershellHelper::Exception::Error::COULD_NOT_WRITE_TO_FILE, GetLastError());
}

DWORD dword_read;
while (true) {
    DWORD total_bytes_available;
    if (PeekNamedPipe(std_out_read, nullptr, 0, nullptr, &total_bytes_available, nullptr) == 0) {
        throw PowershellHelper::Exception(PowershellHelper::Exception::Error::COULD_NOT_COPY_FROM_PIPE, GetLastError());
    }

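    if (total_bytes_available != 0) {
        DWORD minimum = min(buffer_size, total_bytes_available);
        // Heap buffer (needs <vector>): a 1 MB array on the stack risks overflow,
        // and a non-constant array size is not standard C++.
        std::vector<char> buf(minimum);
        if (ReadFile(std_out_read, buf.data(), minimum, &dword_read, nullptr) == 0) {
            throw PowershellHelper::Exception(PowershellHelper::Exception::Error::COULD_NOT_READ_FILE, GetLastError());
        }

        // ReadFile does not null-terminate, so pass the number of bytes read.
        std::string tmp(buf.data(), dword_read);
        std::cout << tmp << std::endl;
    }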

    if (total_bytes_available == 0) {
        break;
    }

    std::this_thread::sleep_for(std::chrono::milliseconds(1000));
}
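
A side note on the reading loop: ReadFile returns raw bytes without a terminating null, and when the output is UTF-8, a multi-byte character such as 'ä' can be split across two reads. The sketch below is independent of the WinAPI and only illustrates one way the complete part of the received bytes could be separated from a trailing partial character. The helpers complete_utf8_length and take_complete are hypothetical names, and the code assumes the stream is otherwise valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of leading bytes of `chunk` that form complete
// UTF-8 characters. Remaining bytes are the start of a character whose rest
// has not arrived yet. Assumes the data is otherwise valid UTF-8.
std::size_t complete_utf8_length(const std::string& chunk) {
    const std::size_t n = chunk.size();
    std::size_t i = n;
    std::size_t steps = 0;
    // Walk back over at most 3 trailing continuation bytes (10xxxxxx).
    while (i > 0 && steps < 3 &&
           (static_cast<unsigned char>(chunk[i - 1]) & 0xC0) == 0x80) {
        --i;
        ++steps;
    }
    if (i == 0) return n;  // no lead byte found; nothing sensible to hold back
    const unsigned char lead = static_cast<unsigned char>(chunk[i - 1]);
    std::size_t expected = 1;                        // total bytes of the last character
    if ((lead & 0xE0) == 0xC0) expected = 2;
    else if ((lead & 0xF0) == 0xE0) expected = 3;
    else if ((lead & 0xF8) == 0xF0) expected = 4;
    const std::size_t have = n - (i - 1);            // lead byte plus continuations present
    return have < expected ? i - 1 : n;
}

// Appends `len` freshly read bytes to `pending` and returns the part that is
// safe to print or decode; a partial trailing character stays in `pending`.
std::string take_complete(std::string& pending, const char* data, std::size_t len) {
    pending.append(data, len);  // explicit length: the buffer is not null-terminated
    const std::size_t cut = complete_utf8_length(pending);
    std::string ready = pending.substr(0, cut);
    pending.erase(0, cut);
    return ready;
}
```

In the loop above, this would mean calling take_complete(pending, buf, dword_read) after each successful ReadFile and printing only the returned string.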

Note: This is not a duplicate of redirect-input-and-output-of-powershell-exe-to-pipes-in-c, since the code there only works for ASCII characters and doesn't handle UTF-8 characters at all.

It is also not a duplicate of c-getting-utf-8-output-from-createprocess, because the solutions suggested there don't work, as mentioned above, and I want to write UTF-8 input as well as read UTF-8 output.

1 Answer

You need to set the console input and output code pages to 65001 (UTF-8) before creating your PowerShell process, via the SetConsoleCP and SetConsoleOutputCP WinAPI functions, because the PowerShell CLI uses these code pages to decode its stdin input and to encode its stdout output.

(By contrast, $OutputEncoding = [System.Console]::OutputEncoding = [System.Console]::InputEncoding = [System.Text.Encoding]::UTF8 only applies intra-PowerShell-session when making external-program calls from PowerShell.)

Note: If the calling process isn't itself a console application, you may have to allocate a console before calling SetConsoleCP and SetConsoleOutputCP, using the AllocConsole WinAPI function, but I'm frankly unclear on (a) whether that makes this console instantly visible (which may be undesired) and (b) whether the CreateProcess call then automatically uses this console.
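
For what it's worth, the question's updates report that allocating a console, setting both code pages, and then hiding the window worked from a GUI application. A minimal sketch of that sequence, to be called before CreateProcess, might look as follows. The function name prepare_utf8_console is hypothetical, and the sketch assumes the child is created without flags such as CREATE_NEW_CONSOLE or DETACHED_PROCESS, so that it shares the caller's console. The non-Windows branch is a no-op that exists only so the snippet compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: makes sure the calling process has a console whose
// input and output code pages are UTF-8 (65001). Returns false on failure.
// Intended to run before CreateProcess so the child starts on that console.
bool prepare_utf8_console() {
#ifdef _WIN32
    if (GetConsoleWindow() == nullptr) {           // e.g. a GUI app: no console yet
        if (!AllocConsole()) return false;
        ShowWindow(GetConsoleWindow(), SW_HIDE);   // hide it, as in the question's Update 2
    }
    // Both calls return nonzero on success.
    return SetConsoleCP(CP_UTF8) && SetConsoleOutputCP(CP_UTF8);
#else
    return true;                                   // no-op outside Windows
#endif
}
```

A console application would skip the AllocConsole branch automatically, since GetConsoleWindow already returns a window handle there.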

If that doesn't work, you can call via cmd.exe and call chcp before calling powershell.exe, along the lines of cmd /c "chcp 65001 >NUL & powershell -c ..."; chcp 65001 sets the console code pages to 65001, i.e. UTF-8.

(This introduces extra overhead, but a cmd.exe process is relatively light-weight compared to a powershell.exe process, and so is chcp.com).
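
From C++, the cmd.exe detour could be sketched as a small string-building step that produces the command line for CreateProcess. This is only an illustration: wrap_with_chcp is a hypothetical name, and the sketch does no escaping, so it assumes the PowerShell command passed in is already quoted appropriately for both cmd.exe and PowerShell's -c parameter.

```cpp
#include <string>

// Hypothetical helper: wraps a PowerShell command so that cmd.exe switches the
// console code pages to UTF-8 (chcp 65001) before powershell.exe starts.
// Assumption: `ps_command` is already quoted/escaped for cmd.exe and for
// PowerShell; this sketch performs no escaping of its own.
std::string wrap_with_chcp(const std::string& ps_command) {
    return "cmd /c \"chcp 65001 >NUL & powershell -nop -c " + ps_command + "\"";
}
```

The resulting string would be passed as the command-line argument of CreateProcess (copied into a writable buffer if the wide-character variant is used), once per PowerShell invocation.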

Here's a sample command you can run from PowerShell to demonstrate:

& {

  # Save the current code pages.
  $prevInCp, $prevOutCp = [Console]::InputEncoding, [Console]::OutputEncoding

  # Write the UTF-8 encoded form of string 'kö' to a temp. file.
  # Note: In PowerShell (Core) 7+, use -AsByteStream instead of -Encoding Byte
  Set-Content temp1.txt -Encoding Byte ([Text.UTF8Encoding]::new().GetBytes('kö'))

  # Switch to UTF-8, pipe the UTF-8 file's content to PowerShell's stdin,
  # verify that it was decoded correctly, and output it, again encoded as UTF-8.
  cmd /c 'chcp 65001 >NUL & type temp1.txt | powershell -nop -c "$stdinLine = @($input)[0]; $stdinLine -eq ''kö''; Write-Output $stdinLine" > temp2.txt'

  # Read the temporary file as UTF-8 and echo its content.
  Get-Content -Encoding Utf8 temp2.txt

  # Clean up.
  Remove-Item temp[12].txt
  # Restore the original code pages.
  [Console]::InputEncoding = $prevInCp; [Console]::OutputEncoding = $prevOutCp

}

This outputs the following, indicating that the powershell call both correctly read the UTF-8-encoded input and also output it as UTF-8:

True
kö

Note:

You can bypass character encoding problems by using the in-process PowerShell SDK as an alternative to creating a powershell.exe child process, though I don't know how painful that is from C++. For a C# example, see this answer.

10 Comments

I tested it (putting SetConsoleOutputCP(CP_UTF8) in the first line of main), but the wrong characters are still shown. Is there a specific place where I have to put this line?
I tried the method with AllocConsole(), but this doesn't change anything. Showing a console in the background is not what I want, but it is better than nothing. I am not sure what you meant by calling via cmd? My GUI (made with GTK+ [gtkmm]) may call multiple PowerShell sub-processes, so I can't just call one PowerShell session. Is there no known method to send UTF-8 via a pipe to PowerShell?
@SimonPio. You can bypass character encoding problems by using the in-process PowerShell SDK, though I don't know how painful that is from C++ - for a C# example, see this answer. As for AllocConsole() - you may have to tweak the start-up info properties in the subsequent CreateProcess call, but I'm only guessing. As for cmd.exe: I meant that every time you need to call powershell.exe, don't call it directly, call it via cmd.exe, which allows you to execute chcp 65001 first, which sets the console code page(s) to UTF-8.
I tried it with a console-only application and SetConsoleOutputCP(CP_UTF8). This did not work either. I also tried to create a cmd process with CreateProcess(...) in this console application and passed a UTF-8 string, std::string test("chcp 65001 & powershell -c mkdir C:\..\ägiüdjöfj\n"), with WriteFile(...). The output was wrong again. Maybe I also have to fix the issue of passing UTF-8 to cmd? Gonna be an encoding-ception...
@SimonPio., the output encoding may also be off if your PowerShell CLI call calls an external program that doesn't play by the rules. I can't tell from your code what PowerShell command you're trying to execute.