From the MongoDB manual:
By default, all database strings are UTF8. To save images, binaries, and other non-UTF8 data, you can pass the string as a reference to the database.
I'm fetching pages and want to store the content for later processing.
- I cannot rely on the meta charset, because many pages have UTF-8 content but wrongly declare ISO-8859-1 or similar
- so I can't use Encode (I don't know the originating charset)
- therefore, I want to store the content simply as a flow of bytes (binary data) for later processing
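To make "flow of bytes" concrete, here is a minimal sketch of what I mean, using only Perl's built-in utf8:: functions. The helper name as_bytes is hypothetical (my own), and re-encoding as UTF-8 when wide characters are present is an assumption; I have not confirmed that this is what the MongoDB driver expects.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $data that holds only bytes
# (every character <= 0xFF, internal UTF8 flag off).
sub as_bytes {
    my ($data) = @_;
    my $copy = $data;

    # With a true second argument, utf8::downgrade returns false instead of
    # dying when the string contains a character above 0xFF.
    unless ( utf8::downgrade( $copy, 1 ) ) {
        # Assumption: wide characters mean the string was decoded somewhere,
        # so serialize it back to UTF-8 octets.
        utf8::encode($copy);
    }
    return $copy;
}

print length( as_bytes("\xFF\xFE\xFF") ), "\n";    # 3: already bytes, left untouched
print length( as_bytes("\x{263A}") ),     "\n";    # 3: one character becomes three UTF-8 bytes
```

With a helper like this, the values passed by reference would at least be guaranteed to contain no character above 0xFF.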
Fragment of my code:
    sub save {
        my ( $self, $ok, $url, $fetchtime, $request ) = @_;

        my $rawhead = $request->headers_as_string;
        my $rawbody = $request->content;

        # Store the raw head and body, passed as references
        $self->db->content->insert(
            { "url" => $url, "rhead" => \$rawhead, "rbody" => \$rawbody } )
          if $ok;

        # Record the fetch result for this URL
        $self->db->links->update(
            { "url" => $url },
            {
                '$set' => {
                    'status'       => $request->code,
                    'valid'        => $ok,
                    'last_checked' => time(),
                    'fetchtime'    => $fetchtime,
                }
            }
        );
    }
But I get this error:
Wide character in subroutine entry at /opt/local/lib/perl5/site_perl/5.14.2/darwin-multi-2level/MongoDB/Collection.pm line 296.
This is the only place where I am storing data.
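Perl reports "Wide character" when code that expects bytes meets a character above 0xFF, so at least one of the values being stored must contain such a character. A minimal diagnostic sketch for finding out which one, assuming a hypothetical helper has_wide_char (my own name, not part of the MongoDB driver or any module):

```perl
use strict;
use warnings;

# Hypothetical diagnostic: true if the string holds a character above 0xFF,
# i.e. it cannot be a plain byte string.
sub has_wide_char {
    my ($str) = @_;
    return $str =~ /[^\x00-\xFF]/ ? 1 : 0;
}

# A pure byte string (the manual's sample) and a string with U+2603.
for my $sample ( "\xFF\xFE\xFF", "snowman \x{2603}" ) {
    print has_wide_char($sample) ? "wide\n" : "bytes only\n";
}
# prints "bytes only", then "wide"
```

Running the same check on $url, $rawhead and $rawbody just before the insert should show which value triggers the error.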
The question: is the only way to store binary data in MongoDB to encode it, e.g. with base64?
$rawhead and $rawbody to the sample given in the manual (i.e., "\xFF\xFE\xFF")?