I have a Perl/CGI application that receives a UTF-8 text file and processes its content.
For some reason, Perl stops interpreting the content as UTF-8 after the first 4096 bytes and treats the rest as "Unicode". My guess is that Perl reads the file in 4096-byte buffers and only the first buffer contains the byte order mark.
If I scatter some en dashes ("–") through the file (at least one in each 4 KB block), the program recognizes it as UTF-8, probably because an en dash cannot be represented in a single-byte encoding such as Latin-1.
I receive the text file from an HTML page and read it into a scalar variable like this:
while (my $l = <$fh>) {
    $text .= $l;
}
I tried to force UTF-8 by prepending an en dash to each line of the file:
while (my $l = <$fh>) {
    $text .= "–" . $l;
}
But I get this error:
Wide character in print at (eval 12) line 94.
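One direction that might address both symptoms is to decode the input explicitly and encode on output, instead of relying on Perl to guess. The following is only a minimal sketch under assumptions: the in-memory $bytes string is a hypothetical stand-in for the uploaded file, and in the real application the binmode call would be applied to the existing $fh from the upload.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the uploaded file: raw UTF-8 bytes starting with a BOM.
my $bytes = "\xEF\xBB\xBF" . "caf\xC3\xA9 \xE2\x80\x93 ok\n";
open(my $fh, '<', \$bytes) or die "open failed: $!";

# Decode every read from this handle as UTF-8, not only the first buffer.
binmode($fh, ':encoding(UTF-8)');

my $text = '';
while (my $l = <$fh>) {
    $text .= $l;
}
close($fh);

# The BOM decodes to the single character U+FEFF; drop it if present.
$text =~ s/\A\x{FEFF}//;

# Encode characters back to UTF-8 bytes on output; without an encoding
# layer on the output handle, printing characters above 0xFF triggers
# the "Wide character in print" warning.
binmode(STDOUT, ':encoding(UTF-8)');
print $text;
```

With the layers set explicitly, $text holds decoded characters rather than raw bytes, so no en dashes should be needed to influence how the data is treated.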
Does anyone have a tip? Thank you!