Prerequisites: hunspell and php5.

Test code from bash:

user@host ~/ $ echo 'sagadījās' | hunspell -d lv_LV,en_US
Hunspell 1.2.14
+ sagadīties

This works properly.

Test code (test.php):

$encoding = "lv_LV.utf-8";

setlocale(LC_CTYPE, $encoding); // test
putenv('LANG='.$encoding); // and another test

$raw_response = shell_exec("LANG=$encoding; echo 'sagadījās' | hunspell -d lv_LV,en_US");

echo $raw_response;

returns

Hunspell 1.2.14
& sagad 5 0: tagad, sagad?ties, sagaudo, sagand?, sagar?o
*
*

Screenshot (could not post code with invalid characters): [image: Hunspell PHP invalid characters]

It seems that shell_exec cannot handle UTF-8 correctly. Or is some additional encoding/decoding needed?

EDIT: I had to set LANG to en_US.utf-8 instead of lv_LV.utf-8 to get valid data.
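
One plausible explanation (not confirmed anywhere in this thread) for en_US.utf-8 working where lv_LV.utf-8 did not is that the Latvian locale was not installed on the server, so the child process fell back to a non-UTF-8 default. A minimal sketch of how that could be checked from PHP follows. The helper name firstAvailableLocale and the candidate list are made up for illustration; the only real API used is setlocale(), which returns false when a locale is unavailable.

```php
<?php

// Hypothetical helper (not from the thread): returns the first name in
// $candidates that setlocale() accepts, or null if none is available.
// Side effect: LC_CTYPE is left set to the locale that was accepted.
function firstAvailableLocale(array $candidates)
{
    foreach ($candidates as $name) {
        // setlocale() returns the new locale string on success, false on failure
        if (setlocale(LC_CTYPE, $name) !== false) {
            return $name;
        }
    }
    return null;
}

// Candidate names are illustrative. 'C' is always available,
// but it is not a UTF-8 locale, so treat it as a warning sign.
$locale = firstAvailableLocale(array('lv_LV.utf-8', 'en_US.utf-8', 'C'));
echo 'First usable locale: ', var_export($locale, true), "\n";
```

If this reports en_US.utf-8 rather than lv_LV.utf-8, the Latvian locale is probably missing on that machine. On most Linux systems, running locale -a in a shell lists the installed locales.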

  • Have you tried proc_open()? Seems to me like writing the data directly to the process' STDIN would be more reliable than bouncing it through the shell... Commented Apr 5, 2012 at 13:01
  • @DaveRandom same output. But I just checked - mb_detect_encoding(stream_get_contents($pipes[1])) returns ASCII. That could be the problem. Commented Apr 5, 2012 at 13:14

1 Answer

Try this code:

<?php

  // The word we are checking
  $subject = 'sagadījās';

  // We want file pointers for all 3 std streams
  $descriptors = array (
    0 => array("pipe", "r"),  // STDIN
    1 => array("pipe", "w"),  // STDOUT
    2 => array("pipe", "w")   // STDERR
  );

  // An environment variable
  $env = array(
    'LANG' => 'lv_LV.utf-8'
  );

  // Try and start the process
  if (!is_resource($process = proc_open('hunspell -d lv_LV,en_US', $descriptors, $pipes, NULL, $env))) {
    die("Could not start Hunspell!");
  }

  // Put pipes into sensibly named variables
  $stdIn = &$pipes[0];
  $stdOut = &$pipes[1];
  $stdErr = &$pipes[2];
  unset($pipes);

  // Write the data to the process and close the pipe
  fwrite($stdIn, $subject);
  fclose($stdIn);

  // Display raw output
  echo "STDOUT:\n";
  while (!feof($stdOut)) echo fgets($stdOut);
  fclose($stdOut);

  // Display raw errors
  echo "\n\nSTDERR:\n";
  while (!feof($stdErr)) echo fgets($stdErr);
  fclose($stdErr);

  // Close the process pointer
  proc_close($process);

?>

Don't forget to verify that the encoding of the file (and therefore the encoding of the data you are passing) actually is UTF-8 ;-)
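
As a rough sketch of that check, the word can be validated before it is written to the process. The helper name isUtf8 is hypothetical, and the code assumes the mbstring extension is loaded. The test word is spelled out as explicit bytes so the result does not depend on how the script file itself is saved; the second string assumes a single-byte Baltic encoding purely for contrast.

```php
<?php

// Hypothetical helper (not from the answer): true when $text is valid UTF-8.
// Assumes the mbstring extension is loaded.
function isUtf8($text)
{
    return mb_check_encoding($text, 'UTF-8');
}

// 'sagadījās' as explicit UTF-8 bytes: ī = C4 AB, ā = C4 81
$utf8Word = "sagad\xC4\xABj\xC4\x81s";

// The same word, assuming a single-byte Baltic encoding (ī = EE, ā = E2)
$legacyWord = "sagad\xEEj\xE2s";

var_dump(isUtf8($utf8Word));   // valid UTF-8
var_dump(isUtf8($legacyWord)); // not valid UTF-8

// Byte length and character length differ for multi-byte text
echo strlen($utf8Word), ' bytes, ', mb_strlen($utf8Word, 'UTF-8'), " characters\n";
```

In the script above, the same check could be applied to $subject before the fwrite() call. If it fails, the file was probably saved in a different encoding.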

1 Comment

Thanks for the feedback. mb_detect_encoding randomly (per char/word) returned either ASCII or UTF-8. After a while I tried setting the LANG variable to en_US.utf-8 and it worked. Thanks!
