I'm not sure if this is a Perl issue, an Nginx issue, or an HTTP issue. I know there are a bazillion questions about character encoding, but I just can't figure this out. Anyway, here's the problem.
My web site pulls data from two different types of sources. Some of those sources are utf-8 files. Others are files that contain URL-encoded data. The problem is that I can't figure out how to output the characters from both kinds of sources without getting funky characters in the web browser.
The following Perl script demonstrates the problem. You can see this script live and in action at https://www.mikobiko.com/demo.pl
#!/usr/bin/perl -wT
use strict;
use CGI;
# variables
my ($in, $from_file, $from_url);
# HTTP header
print qq|Content-type: text/html; charset=utf-8\n\n|;
# from utf-8 file
open($in, '<', './utf-8.txt') or die $!;
($from_file) = <$in>;
print "<h1>from utf-8 file</h1>\n";
print "<p>character: ", $from_file, "</p>\n";
print '<p>length: ', length($from_file), "</p>\n";
# from url encoded
print "<h1>from url encoded</h1>\n";
$from_url = '%F1';
$from_url = CGI::unescape($from_url);
print "<p>character: ", $from_url, "</p>\n";
print '<p>length: ', length($from_url), "</p>\n";
Here's what this script does. It outputs a standard Content-type header that declares the character set as utf-8.
Then it reads in a UTF-8-encoded file that contains the character ñ (an "n" with a tilde over it) and outputs that character. You can see the source file itself at https://www.mikobiko.com/utf-8.txt . Here's the Linux "file" command output for that file:
utf-8.txt: UTF-8 Unicode text, with no line terminators
Then the script decodes the URL-encoded string for ñ (%F1) and outputs the result.
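To make the comparison concrete, here is a minimal sketch that dumps the bytes each source would hand to print. It makes two assumptions: the literal "\xC3\xB1" stands in for the raw contents of utf-8.txt (the UTF-8 encoding of ñ), and a plain regex stands in for CGI::unescape on a simple %XX escape. The hex_dump helper is hypothetical and exists only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: render each character of a string as two hex digits.
sub hex_dump {
    my ($str) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $str;
}

# Assumption: the raw bytes of "ñ" as stored in utf-8.txt,
# read without any decoding layer on the filehandle.
my $from_file = "\xC3\xB1";

# Stand-in for CGI::unescape: turn each %XX escape into a single byte.
(my $from_url = '%F1') =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

printf "file: %s (length %d)\n", hex_dump($from_file), length $from_file;
printf "url:  %s (length %d)\n", hex_dump($from_url),  length $from_url;
```

Under those assumptions the two strings differ at the byte level: the file yields the two bytes C3 B1, while the URL-decoded value is the single byte F1, which is ñ in ISO-8859-1 but is not a valid UTF-8 sequence on its own.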
Here's a screenshot of what the browser shows. This screenshot is from Chrome, but Firefox does the same thing. The character that comes from the utf-8 file is displayed with the little question mark symbol.
If I remove the "charset=utf-8" part of the Content-type, then the problem is reversed, and the URL-decoded character is displayed funky instead.
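Since each charset setting fixes one character and breaks the other, the two values presumably reach the browser in different encodings. One thing that could be tried is to decode both into Perl character strings first and then encode everything once on output. The sketch below uses the core Encode module and assumes that %F1 is meant as ISO-8859-1 and that the file bytes are C3 B1; both are assumptions, and this may not be the whole story on this server.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: raw bytes as they arrive from each source.
my $file_bytes = "\xC3\xB1";   # "ñ" as stored in a UTF-8 file
my $url_bytes  = "\xF1";       # result of URL-decoding '%F1'

# Decode each source with the encoding it is assumed to use.
my $from_file = decode('UTF-8',      $file_bytes);
my $from_url  = decode('ISO-8859-1', $url_bytes);

# Both are now the same one-character string (U+00F1).
print "same character\n" if $from_file eq $from_url;

# Encode once, at output time, to match "charset=utf-8" in the header.
my $out_file = encode('UTF-8', $from_file);
my $out_url  = encode('UTF-8', $from_url);
print "same bytes on output\n" if $out_file eq $out_url;
```

The idea is to keep text as decoded characters inside the script and convert to bytes only at the edges, so the declared charset describes everything that is printed.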
Here's some system info:
nginx: nginx/1.10.3 (Ubuntu)
Perl: perl 5, version 22, subversion 1 (v5.22.1)
Linux on the server:
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial
Please let me know if there's any other info I can provide to help solve this problem. Thanks!
