Hehe. :-) This has to do with the Unicode support that has grown incrementally over the last dozen or so Perl releases, and with the regex feature \C used by the URI module (more precisely, by URI::Escape). Read the 2010 thread on perl-unicode, "Don't use the \C escape in regexes - Why not?", to understand the background.
Why the URI module? Because HTTP::Request::Common uses it for form and URL encoding.
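To make the background a bit more concrete: \C matches a single byte of a string's internal representation, and Perl can store the very same string in two ways. The following is only a minimal sketch (it deliberately avoids \C itself, and the variable names are mine) showing that a Latin-1 octet string and its upgraded copy compare equal while differing in their internal UTF8 flag:

```perl
use strict;
use warnings;
use Encode qw(encode);

# One Latin-1 octet (0xE8), stored without the internal UTF8 flag.
my $octets = encode( 'iso-8859-1', "\x{E8}" );

# The same string, but stored internally as UTF-8.
my $upgraded = $octets;
utf8::upgrade($upgraded);

printf "flag before upgrade: %s\n", utf8::is_utf8($octets)   ? 'on' : 'off';
printf "flag after upgrade:  %s\n", utf8::is_utf8($upgraded) ? 'on' : 'off';

# At the character level nothing has changed.
printf "strings compare equal: %s\n", ( $octets eq $upgraded ) ? 'yes' : 'no';
printf "lengths in characters: %d and %d\n",
    length($octets), length($upgraded);
```

Code that looks at internal bytes can therefore produce different escapes for two strings that Perl itself considers equal.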
Meanwhile, here's a script I wrote to remind myself of how tricky this issue is, especially as the URI module is such a frequently used one:
use 5.010;
use utf8;
# Perl and URI.pm might behave differently when you encode your script in
# Latin1 and drop the utf8 pragma.
use Encode;
use URI;
use Test::More;
use constant C3A8 => 'text=%C3%A8';
use constant E8 => 'text=%E8';
diag "Perl $^V";
diag "URI.pm $URI::VERSION";
my $chars = 'è';
my $octets = encode 'iso-8859-1', $chars;
my $uri = URI->new('http:');
$uri->query_form( text => $chars );
is $uri->query, C3A8, C3A8;
my @exp;
given ( "$^V $URI::VERSION" ) {
when ( 'v5.12.3 1.56' ) { @exp = ( E8, C3A8 ) }
when ( 'v5.10.1 1.54' ) { @exp = ( C3A8, C3A8 ) }
when ( 'v5.10.1 1.58' ) { @exp = ( C3A8, C3A8 ) }
default { die 'not tested :-)' }
}
$uri->query_form( text => $octets );
is $uri->query, $exp[0], $exp[0];
utf8::upgrade $octets;
$uri->query_form( text => $octets );
is $uri->query, $exp[1], $exp[1];
done_testing;
So what I get (on Windows and Cygwin) is:
C:\Windows\system32 :: perl \Opt\Cygwin\tmp\uri.pl
# Perl v5.12.3
# URI.pm 1.56
ok 1 - text=%C3%A8
ok 2 - text=%E8
ok 3 - text=%C3%A8
1..3
And:
MiLu@Dago: ~/comp > perl /tmp/uri.pl
# Perl v5.10.1
# URI.pm 1.54
ok 1 - text=%C3%A8
ok 2 - text=%C3%A8
ok 3 - text=%C3%A8
1..3
UPDATE
You can handcraft the request body:
use utf8;
use Encode;
use LWP::UserAgent;
my $chars = 'ölè';
my $octets = encode( 'iso-8859-1', $chars );
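# Percent-escape every octet above 127 as two hex digits. ASCII passes through
# unchanged, so reserved characters such as & or = would need escaping too.
my $body = 'text=' .
    join '',
    map { my $o = ord $_; $o < 128 ? $_ : sprintf '%%%02X', $o }
    split //, $octets;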
my $uri = 'http://localhost:8080/';
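# Declare the body as form data so that the server parses it as such.
my $req = HTTP::Request->new( POST => $uri, [ 'Content-Type' => 'application/x-www-form-urlencoded' ], $body );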
print $req->as_string;
my $ua = LWP::UserAgent->new;
my $rsp = $ua->request( $req );
print $rsp->as_string;
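If the values may also contain reserved ASCII characters, the handcrafted escaping can be made a little stricter. This is just a sketch under my own assumptions: escape_latin1 is a hypothetical helper (not part of URI or LWP), and it escapes everything outside the RFC 3986 unreserved set, so a space becomes %20 rather than +. The file is assumed to be saved as UTF-8, as with the scripts above.

```perl
use strict;
use warnings;
use utf8;
use Encode ();

# Hypothetical helper: encode a character string as Latin-1 and
# percent-escape every octet outside the RFC 3986 unreserved set.
sub escape_latin1 {
    my ($chars) = @_;

    # FB_CROAK dies on characters that Latin-1 cannot represent;
    # LEAVE_SRC keeps encode() from modifying its argument.
    my $octets = Encode::encode( 'iso-8859-1', $chars,
        Encode::FB_CROAK() | Encode::LEAVE_SRC() );

    # A plain character class works on characters, not on internal bytes,
    # so the result does not depend on the string's UTF8 flag.
    $octets =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $octets;
}

print 'text=', escape_latin1('ölè'),     "\n";    # text=%F6l%E8
print 'text=', escape_latin1('a&b = è'), "\n";    # text=a%26b%20%3D%20%E8
```

The returned string can be used as the value part of $body in the request above.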
text=%E8: i.sstatic.net/rM3xS.png
[...] nc on port 8080, and I get text=%C3%A8. Specs: MacOS X 10.6, perl v5.10.0, libwww-perl/5.837.