
I have an HTTP response header with an ISO-8859-1 character in it (é, 0xe9). Here is a raw packet capture showing the HTTP response, second line from the bottom:

00000000  48 54 54 50 2f 31 2e 30  20 32 30 30 20 4f 4b 0d HTTP/1.0  200 OK.
00000010  0a 43 6f 6e 74 65 6e 74  2d 54 79 70 65 3a 20 61 .Content -Type: a
00000020  75 64 69 6f 2f 61 61 63  70 0d 0a 69 63 79 2d 62 udio/aac p..icy-b
00000030  72 3a 34 30 0d 0a 69 63  79 2d 67 65 6e 72 65 3a r:40..ic y-genre:
00000040  4a 61 7a 7a 20 4c 6f 75  6e 67 65 20 43 61 66 65 Jazz Lou nge Cafe
00000050  0d 0a 69 63 79 2d 6e 61  6d 65 3a 43 61 66 e9 20 ..icy-na me:Caf. 
00000060  64 65 20 50 61 72 69 73  20 2d 20 52 41 44 49 4f de Paris  - RADIO

The header should be:

icy-name:Café de Paris

I am making a very simple http.get() request:

var http = require('http');

http.get('http://example.com/streamUrl', function (res) {
    console.log(res.headers);
});

On my console, I see:

'icy-name': 'Caf� de Paris',

Then I tried converting the string to a buffer:

console.log(new Buffer(res.headers['icy-name']));
// <Buffer 43 61 66 ef bf bd 20 64 65 20 50 61 72 69 73 ... >
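
Those ef bf bd bytes appear to be the UTF-8 encoding of U+FFFD, the replacement character. A quick, purely illustrative check of that idea:

// 0xe9 by itself is not a valid UTF-8 sequence, so decoding it yields U+FFFD,
// which re-encodes as the three bytes ef bf bd seen above.
console.log(new Buffer([0xe9]).toString('utf8')); // '�'
console.log(new Buffer('\ufffd', 'utf8'));        // <Buffer ef bf bd>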

It seems that the original character 0xe9 is already lost before conversion to my Buffer, or during the conversion process. Then, I thought that maybe iconv-lite would be helpful:

var iconv = require('iconv-lite');
iconv.extendNodeEncodings();
console.log( (new Buffer(res.headers['icy-name'], 'latin1')).toString('utf8') );
// "Caf? de Paris" with a literal question mark, `0x3F`.

I suspect the damage is already done before my code can ever get the response header values. My questions:

  1. Is my assumption correct that the string is mis-interpreted by Node.js' HTTP client in the first place?
  2. Is there any way to configure Node.js to properly handle HTTP responses? RFC 5987 says that the default character set for HTTP responses is ISO-8859-1.
  3. If there is no way to get Node.js to behave, is there any way to undo the conversion damage, recover the original ISO-8859-1 string, and then convert to UTF-8?
  • And you're sure all the files are saved as UTF-8, etc.? Commented Nov 9, 2014 at 23:51
  • @adeneo It doesn't have anything to do with files. I'm talking about the HTTP response headers themselves. The actual data in this response is binary and isn't of concern. Other HTTP clients have been able to handle the response headers successfully. Besides, the data in question is ISO-8859-1, not UTF-8. I need to convert to UTF-8, but I have a feeling that the ISO-8859-1 string isn't handled correctly to begin with. Commented Nov 9, 2014 at 23:52
  • @adeneo I am setting nothing. My Node.js code is a client, not a server. I am accessing headers on another server, elsewhere, that isn't my code, nor built with Node.js. Even if it were, the standards (as far as I can tell) state that HTTP response headers should be ISO-8859-1 by default. You can see from the hex dump that the response data is fine when it is sent over the wire. I also know that other HTTP clients can handle the response headers without error. The source of the data isn't the problem. The problem has to do with Node.js or my application somehow. Commented Nov 10, 2014 at 0:00

2 Answers


Unfortunately, there is no official spec that I am aware of that has been widely implemented for transferring non-ASCII data within HTTP headers. The RFC you have linked to is only in the PROPOSED STANDARD state and is from 2010. It looks like Node 0.10 explicitly passes header values through what is essentially new String(val), so the values are parsed as UTF-8. It looks like in Node 0.11 the string isn't quite so mangled, so

var Iconv = require('iconv').Iconv;

var iconv = new Iconv('ISO-8859-1', 'UTF-8');
console.log(iconv.convert(new Buffer(res.headers['icy-name'], 'binary')).toString());

does actually work as you expect in 0.11. I can't say for sure if that is intentional, or just a side-effect of other work though.
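
If you would rather stick with iconv-lite (which the question already uses), a roughly equivalent sketch, assuming the header string still holds one character per original byte as it does in 0.11:

var iconvLite = require('iconv-lite');

// Recover the raw header bytes (one byte per character), then decode as ISO-8859-1.
var raw = new Buffer(res.headers['icy-name'], 'binary');
console.log(iconvLite.decode(raw, 'ISO-8859-1')); // should print "Café de Paris"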


1 Comment

Thanks for the note. The only relevant part of the RFC I linked to was in the abstract; it didn't point to a specific spec saying ISO-8859-1 characters are allowed in headers. I didn't mean to imply that RFC 5987 was the standard itself. In any case, I ended up posting this problem on the GitHub issue tracker for Node.js: github.com/joyent/node/issues/8699#issuecomment-62342294 It was confirmed there that the problem was a limitation of the V8 version used in Node.js v0.10, and that it is fixed in v0.11 and beyond.

FWIW, I just wrote an Http.Agent (available here) that will decode header data from latin1 to UTF-8 and overwrite the original headers.
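
The agent itself isn't reproduced here, but the general idea (re-decode every header value from its raw bytes and overwrite it on the response) might look roughly like this hypothetical sketch, which is not the linked module's actual code:

var iconvLite = require('iconv-lite');

// Hypothetical helper, not the linked agent: rebuild each header value from
// its one-byte-per-character form and decode it as ISO-8859-1.
function fixLatin1Headers(res) {
    Object.keys(res.headers).forEach(function (name) {
        var raw = new Buffer(res.headers[name], 'binary');
        res.headers[name] = iconvLite.decode(raw, 'ISO-8859-1');
    });
}

// Usage: http.get(url, function (res) { fixLatin1Headers(res); console.log(res.headers); });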
