What's the text encoding used for header values on HTTP requests?

Question

I have a Ruby on Rails application that acts as a server for Java and .NET clients. I send some data in a custom header, but when it reaches the Rails app, Rails reads the value as UTF-8 and then reports that it is not a valid UTF-8 string.

For instance, if I send JÜRGENELITE-HP I get:

#<ActiveRecord::StatementInvalid: PGError: ERROR:  invalid byte sequence for encoding "UTF8": 0xdc52
: SELECT * FROM "replicas" WHERE ("replicas"."identification" = 'J?RGENELITE-HP') AND ("replicas".user_id = 121)  LIMIT 1>

The Java HTTP Client library clearly prints the data correctly in the console:

DEBUG [main] (DefaultClientConnection.java:268) - >> POST /ze/api/files.json HTTP/1.1
DEBUG [main] (DefaultClientConnection.java:271) - >> X-Replica: JÜRGENELITE-HP
DEBUG [main] (DefaultClientConnection.java:271) - >> Authorization: Basic bWxpbmhhcmVzOjEyMzQ1Njc4

DEBUG [main] (DefaultClientConnection.java:271) - >> Content-Length: 0
DEBUG [main] (DefaultClientConnection.java:271) - >> Host: localhost:3000
DEBUG [main] (DefaultClientConnection.java:271) - >> Connection: Keep-Alive
DEBUG [main] (DefaultClientConnection.java:271) - >> User-Agent: Apache-HttpClient/4.1.2 (java 1.5)

But when it reaches Rails it breaks. What encoding does HTTP use to encode header values?

You may be able to leverage the solution here to work around what you're trying to do.
– Rob Hruska, Commented Apr 27, 2012 at 19:43

Rob Hruska · Accepted Answer · 2012-04-27 19:55:22Z

US-ASCII

If you look at Section 2.2 of RFC 2616:

2.2 Basic Rules

The following rules are used throughout this specification to
describe basic parsing constructs. The US-ASCII coded character set
is defined by ANSI X3.4-1986 [21].

   OCTET          = <any 8-bit sequence of data>
   CHAR           = <any US-ASCII character (octets 0 - 127)>
   UPALPHA        = <any US-ASCII uppercase letter "A".."Z">
   LOALPHA        = <any US-ASCII lowercase letter "a".."z">
   ALPHA          = UPALPHA | LOALPHA
   DIGIT          = <any US-ASCII digit "0".."9">
   CTL            = <any US-ASCII control character
                    (octets 0 - 31) and DEL (127)>
   CR             = <US-ASCII CR, carriage return (13)>
   LF             = <US-ASCII LF, linefeed (10)>
   SP             = <US-ASCII SP, space (32)>
   HT             = <US-ASCII HT, horizontal-tab (9)>
   <">            = <US-ASCII double-quote mark (34)>

The remainder of the section has more specific information about headers and other elements of the protocol.
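
To see how these rules apply to the value from the question, here is a minimal Python sketch. The helper names (`is_char`, `is_ctl`, `non_char_octets`) are made up for illustration and are not part of any library; the two encodings tried are only candidates for what a client might put on the wire.

```python
def is_char(octet: int) -> bool:
    """CHAR: any US-ASCII character (octets 0 - 127)."""
    return 0 <= octet <= 127


def is_ctl(octet: int) -> bool:
    """CTL: any US-ASCII control character (octets 0 - 31) and DEL (127)."""
    return 0 <= octet <= 31 or octet == 127


def non_char_octets(value: bytes) -> list:
    """Return the octets of `value` that fall outside the CHAR range."""
    return [octet for octet in value if not is_char(octet)]


# "\u00dc" is U-umlaut, written as an escape so the file encoding does not matter.
name = "J\u00dcRGENELITE-HP"
latin1 = name.encode("iso-8859-1")
utf8 = name.encode("utf-8")

print([hex(o) for o in non_char_octets(latin1)])  # ['0xdc']
print([hex(o) for o in non_char_octets(utf8)])    # ['0xc3', '0x9c']
```

Whichever encoding the client picks, the Ü ends up as one or more octets above 127, so the value is not plain US-ASCII.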

You have to jump around the spec quite a bit to find all of the right BNF definitions. Section 4.2 contains the definition for headers, though:

   message-header = field-name ":" [ field-value ]
   field-name     = token
   field-value    = *( field-content | LWS )
   field-content  = <the OCTETs making up the field-value
                    and consisting of either *TEXT or combinations
                    of token, separators, and quoted-string>
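
As a rough illustration of the grammar above, the following Python sketch splits one raw header line into its field-name and field-value octets. `split_header` is a hypothetical helper, it ignores folded (multi-line) headers, and the ISO-8859-1 encoding of the sample line is an assumption about what the client sent.

```python
def split_header(line: bytes):
    """Split b'Name: value' into (field-name, field-value) as raw bytes."""
    name, sep, value = line.partition(b":")
    if not sep or not name:
        raise ValueError("not a message-header line")
    # Leading and trailing linear white space is not part of the field-content.
    return name, value.strip(b" \t")


# Assumed wire form of the header from the question, encoded as ISO-8859-1.
line = "X-Replica: J\u00dcRGENELITE-HP".encode("iso-8859-1")
name, value = split_header(line)

print(name)   # b'X-Replica'
print(value)  # b'J\xdcRGENELITE-HP'
```

The point is that the field-value arrives as octets; nothing in the header line itself says which character encoding produced them.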

TEXT is defined back in Section 2.2:

   TEXT           = <any OCTET except CTLs,
                    but including LWS>
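
The octets 0xdc 0x52 in the PostgreSQL error are consistent with the client having sent Ü as the single ISO-8859-1 byte 0xdc, followed by "R". The Python sketch below checks that assumed byte string against the TEXT rule, shows that it does not decode as UTF-8, and then demonstrates one possible workaround: percent-encoding the value so that only US-ASCII goes into the header. `is_text` is a hypothetical helper that models LWS as just SP and HT (no line folding); `quote` and `unquote` come from Python's standard `urllib.parse`.

```python
from urllib.parse import quote, unquote


def is_text(value: bytes) -> bool:
    """TEXT: any OCTET except CTLs, but including LWS (simplified to SP and HT)."""
    return all(b in (9, 32) or not (b <= 31 or b == 127) for b in value)


# Assumed header octets, reconstructed from the 0xdc52 in the error message.
raw = b"J\xdcRGENELITE-HP"

print(is_text(raw))  # True: 0xdc is an OCTET and not a CTL

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xdc starts a two-byte UTF-8 sequence, but 0x52 is not a continuation byte.
    print("not valid UTF-8:", exc.reason)

print(raw.decode("iso-8859-1"))  # JÜRGENELITE-HP

# Workaround sketch: send only US-ASCII and decode on the server.
wire = quote("J\u00dcRGENELITE-HP")  # UTF-8 bytes, percent-encoded
print(wire)           # J%C3%9CRGENELITE-HP
print(unquote(wire))  # JÜRGENELITE-HP
```

A receiver cannot tell from the octets alone which charset the sender meant, so keeping the header value within US-ASCII and agreeing on an escaping scheme with the client is the portable choice. Percent-encoding is only one such scheme; the server has to apply the matching decode step before using the value.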
