I have an HTTP response header with an ISO-8859-1 character in it (é, 0xE9). Here is a raw packet capture of the HTTP response; the byte in question is on the second line from the bottom:
00000000 48 54 54 50 2f 31 2e 30 20 32 30 30 20 4f 4b 0d HTTP/1.0 200 OK.
00000010 0a 43 6f 6e 74 65 6e 74 2d 54 79 70 65 3a 20 61 .Content -Type: a
00000020 75 64 69 6f 2f 61 61 63 70 0d 0a 69 63 79 2d 62 udio/aac p..icy-b
00000030 72 3a 34 30 0d 0a 69 63 79 2d 67 65 6e 72 65 3a r:40..ic y-genre:
00000040 4a 61 7a 7a 20 4c 6f 75 6e 67 65 20 43 61 66 65 Jazz Lou nge Cafe
00000050 0d 0a 69 63 79 2d 6e 61 6d 65 3a 43 61 66 e9 20 ..icy-na me:Caf.
00000060 64 65 20 50 61 72 69 73 20 2d 20 52 41 44 49 4f de Paris - RADIO
The header should be:
icy-name:Café de Paris
I am making a very simple http.get() request:
http.get('http://example.com/streamUrl', function (res) {
  console.log(res.headers);
});
On my console, I see:
'icy-name': 'Caf� de Paris',
Then I tried converting the string to a buffer:
console.log(new Buffer(res.headers['icy-name']));
// <Buffer 43 61 66 ef bf bd 20 64 65 20 50 61 72 69 73 ... >
The bytes ef bf bd are the UTF-8 encoding of U+FFFD (the replacement character), so it seems that the original byte 0xe9 is already lost before the string ever reaches my Buffer. Then I thought that maybe iconv-lite would be helpful:
var iconv = require('iconv-lite');
iconv.extendNodeEncodings();
console.log( (new Buffer(res.headers['icy-name'], 'latin1')).toString('utf8') );
// "Caf? de Paris" with a literal question mark, `0x3F`.
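For reference, the same bytes can be reproduced without any network code. This is only a minimal sketch, assuming a Node version where `Buffer.from` and the `'latin1'` encoding are available; the variable names are mine. It does not prove what the HTTP client does internally, only that decoding the raw bytes as UTF-8 would be consistent with what I am seeing:

```javascript
// The bytes "Caf", 0xe9, " de" exactly as they appear in the packet capture.
const wire = Buffer.from([0x43, 0x61, 0x66, 0xe9, 0x20, 0x64, 0x65]);

// Decoded as UTF-8, the lone 0xe9 is an incomplete multi-byte sequence,
// so it is replaced with U+FFFD.
const asUtf8 = wire.toString('utf8');

// Re-encoding that string gives ef bf bd in place of the original e9.
const reencoded = Buffer.from(asUtf8, 'utf8');
console.log(reencoded);
// <Buffer 43 61 66 ef bf bd 20 64 65>

// Decoded as ISO-8859-1 (latin1), the same bytes give the expected text.
const asLatin1 = wire.toString('latin1');
console.log(asLatin1);
// Café de
```

Once the string contains U+FFFD, nothing in it records which byte was originally there, which is why re-encoding it in my code above cannot bring back 0xe9.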
I suspect the damage is already done before my code can ever get the response header values. My questions:
- Is my assumption correct that the string is misinterpreted by Node.js' HTTP client in the first place?
- Is there any way to configure Node.js to decode HTTP response headers properly? RFC 5987 says that the default character set for HTTP header fields is ISO-8859-1.
- If there is no way to get Node.js to behave, is there any way to undo the conversion damage, recover the original ISO-8859-1 string, and then convert it to UTF-8?