I am trying to encode the 'subject' field of an email, written in Hebrew, so that the subject can be read correctly in all email clients. At the moment, I am using the Windows-1255 encoding, which works on some clients but not all, so I want to use UTF-8 with Base64 encoding instead.

My reading on the subject (no pun intended) shows that the text has to be in the form

=?<charset>?<encoding>?<encoded text>?=

eg

=?windows-1255?Q?=E0=E1?=

I have taken encoded subject lines from letters which were sent to me in Hebrew with UTF-8 and 'B' (Base64) encoding and decoded them successfully on this website, www.webatic.com/run/convert/base64.php. I have also used this website to encode simple letters and have noticed that the encoding it returns is not the same as the result which I get from a Delphi algorithm.

So - I am looking for an algorithm which successfully encodes letters such as aleph (ord=224), bet (ord=225), etc. According to the website, the string composed of the two letters aleph and bet returns the code 15DXkQ==, but the basic Delphi algorithm returns Ue4 and the TIdEncoderQuotedPrintable component returns =E0=E1 (which is the quoted-printable form of the single-byte Windows-1255/ISO-8859-8 values, not UTF-8).

Edit (after several comments):

I asked a friend to send me an email from her Mac computer, which unsurprisingly uses UTF-8 encoding (as opposed to Windows-1255). The subject was one letter, aleph, ord 224. The encoded subject appeared in the email's header as follows

=?UTF-8?B?15A=?=

This can be separated into three parts: the 'prefix' (=?UTF-8?B?), which means that UTF-8 with base64 encoding is being used; the 'payload' (15A=), which the web site I quoted correctly decodes as the letter aleph; and the suffix (?=).

I need an algorithm to translate an arbitrary string of letters, most of which will be in Hebrew (and thus with ord >= 224) into base64/utf-8; a correct solution is one that decodes correctly on the web site quoted.

  • Convert to UTF-8 with UTF8Encode (I think). And then pass through a base 64 encoder. There's one in the EncdDecd unit. Commented Jan 12, 2013 at 12:17
  • If you are using Indy components, you can use TIdMessage to create the mail, it will take care of these sort of details... Commented Jan 12, 2013 at 12:24
  • @DavidHeffernan: encode64 (utf8encode (aleph)) -> rv0. Interesting idea, but not right. Commented Jan 12, 2013 at 15:48
  • @whosrdaddy: I am using TIdMessage, but I have to encode the subject line according to my needs. Commented Jan 12, 2013 at 15:57
  • Well, I guess you need to work out what you actually want to do. In the question you say you want to convert from active ANSI code page to UTF-8 and then encode in base 64. But in the comments you say that you don't want to do that. Very hard for us to answer if you aren't clear on what you want. Commented Jan 12, 2013 at 19:32

2 Answers

I'm sorry to have wasted all your time. I spent several hours again on the subject today and discovered that the base64 code which I was using has a huge bug.

The steps necessary to send a base64 encoded UTF-8 subject line are:

  1. Convert 'normal' text (ie local ANSI code page) to UTF-8 via the AnsiToUTF8 function
  2. Encode this into base64
  3. Create a string with the prefix '=?UTF-8?B?', the result from stage 2 (including any '=' padding that the base64 encoder produces) and the suffix '?='
  4. Send!

Here is the complete code for creating and sending the email (obviously simplified)

 with IdSMTP1 do
  begin
   host:= ....;
   username:= ....;
   password:= ....;
  end;

 with email do
  begin
   From.Address:= ....;
   Recipients.EMailAddresses:= ....;
   cclist.add.address:= ....;
   email.subject:= '=?UTF-8?B?' + encode64 (AnsiToUTF8 (edit1.text)) + '?=';  // suffix is '?='; encode64 must add its own '=' padding
   email.Body.text:= ....;
  end;

 try
  IdSMTP1.Connect (1000);
  IdSMTP1.Send (email);
 finally
  if IdSMTP1.Connected
   then IdSMTP1.Disconnect;
 end;

In the code on this page (which is the same as the code on this page), the 'codes64' string begins with the digits, then the capital letters, then the lower case letters and then the punctuation. But this page shows that in the standard Base64 alphabet the capital letters come first, followed by the lower case letters, followed by the digits, followed by the punctuation ('+' and '/').

Once I had made this correction, the strings began to be encoded 'correctly' - I could read them properly in my email client, which I am taking to be the definition of 'correct'.
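For anyone hitting the same bug, here is a minimal sketch of a Base64 encoder that uses the standard alphabet order, adds the '=' padding itself, and wraps the result in the '=?UTF-8?B?' ... '?=' envelope. The names TByteBuf, Base64Encode and MakeEncodedWord are made up for this illustration, and the input is assumed to be bytes that are already UTF-8 (for example, the result of AnsiToUTF8). It is not the code from the pages mentioned above.

```pascal
program Utf8BSubjectDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TByteBuf = array of Byte;  // hypothetical helper type: raw UTF-8 bytes

const
  // Standard Base64 alphabet: A-Z, then a-z, then 0-9, then '+' and '/'.
  Codes64: string =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Encodes raw bytes as Base64, padding the last group with '='.
function Base64Encode(const Data: TByteBuf): string;
var
  I, N: Integer;
  B0, B1, B2: Byte;
begin
  Result := '';
  N := Length(Data);
  I := 0;
  while I < N do
  begin
    // Take up to three input bytes; missing bytes count as zero.
    B0 := Data[I];
    if I + 1 < N then B1 := Data[I + 1] else B1 := 0;
    if I + 2 < N then B2 := Data[I + 2] else B2 := 0;

    // Split 24 bits into four 6-bit indexes (+1 because strings are 1-based).
    Result := Result + Codes64[(B0 shr 2) + 1];
    Result := Result + Codes64[(((B0 and $03) shl 4) or (B1 shr 4)) + 1];
    if I + 1 < N then
      Result := Result + Codes64[(((B1 and $0F) shl 2) or (B2 shr 6)) + 1]
    else
      Result := Result + '=';
    if I + 2 < N then
      Result := Result + Codes64[(B2 and $3F) + 1]
    else
      Result := Result + '=';

    Inc(I, 3);
  end;
end;

// Wraps Base64-encoded UTF-8 bytes as a header 'encoded word'.
function MakeEncodedWord(const Utf8Bytes: TByteBuf): string;
begin
  Result := '=?UTF-8?B?' + Base64Encode(Utf8Bytes) + '?=';
end;

var
  Aleph: TByteBuf;
begin
  // The letter aleph (U+05D0) is the two bytes D7 90 in UTF-8.
  SetLength(Aleph, 2);
  Aleph[0] := $D7;
  Aleph[1] := $90;
  WriteLn(MakeEncodedWord(Aleph));  // prints =?UTF-8?B?15A=?=
end.
```

The output for aleph matches the header that my friend's Mac produced. Note that this sketch makes no attempt to split long subjects: RFC 2047 limits a single encoded word to 75 characters, so a long subject would have to be broken into several encoded words.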

It would be interesting to read whether anybody else has had problems with the base64 encoding code which I found.

4 Comments

You do not need to encode the Subject property manually at all. TIdMessage encodes it automatically for you. Simply assign the Edit1.Text value as-is to the Subject and let TIdMessage encode it as needed. If you want to customize how TIdMessage encodes it, use the TIdMessage.OnInitializeISO event to provide the desired charset and encoding values. Let Indy do the hard work for you. email.Subject := edit1.text; procedure TForm1.emailInitializeISO(var VHeaderEncoding: Char; var VCharSet: string); begin VHeaderEncoding := 'B'; VCharSet := 'UTF-8'; end;
@RemyLebeau: Maybe this is so in Indy 10. At the time, I was using Indy 9 and there were no clues as to how to encode. It was an interesting - albeit frustrating experience. Now I have upgraded to Indy 10 and will use your code. Thank you.
TIdMessage has an OnInitializeISO event in Indy 9 as well, and it does do automatic header encoding as well. However, Indy 9 does not support Unicode, so it is possible that Ansi data in Indy 9 does not get encoded the same way that Ansi data in Indy 10 does.
@RemyLebeau: please post your code about OnInitializeISO in your first comment as an answer so that I can accept it and close the question.

You do not need to encode the Subject property manually at all. TIdMessage encodes it automatically for you. Simply assign the Edit1.Text value as-is to the Subject and let TIdMessage encode it as needed.

If you want to customize how TIdMessage encodes headers, use the TIdMessage.OnInitializeISO event to provide the desired charset and encoding values. In Delphi 2009+, it defaults to UTF-8 and Base64. In earlier versions, TIdMessage reads the RTL's current OS language and chooses some default values for known languages. However, Hebrew is not one of them, and so ISO-8859-1 and QuotedPrintable would end up being used. You can override those values, eg:

email.Subject := Edit1.Text;

.

procedure TForm1.emailInitializeISO(var VHeaderEncoding: Char; var VCharSet: string);
begin
  VHeaderEncoding := 'B';
  VCharSet := 'UTF-8';
end;
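
If the handler is not assigned in the Object Inspector, it can also be attached in code. The following is only a sketch, assuming Indy 10 and the event signature shown above (other Indy versions may declare the event with different parameters). THeaderPolicy is a hypothetical helper class introduced for the example, and the subject text is a placeholder.

```pascal
program InitIsoDemo;
{$APPTYPE CONSOLE}

uses
  IdMessage;

type
  // Hypothetical helper: an event handler must be a method of an object,
  // not a plain procedure.
  THeaderPolicy = class
  public
    procedure InitializeISO(var VHeaderEncoding: Char; var VCharSet: string);
  end;

procedure THeaderPolicy.InitializeISO(var VHeaderEncoding: Char;
  var VCharSet: string);
begin
  VHeaderEncoding := 'B';   // Base64 header encoding
  VCharSet := 'UTF-8';
end;

var
  Policy: THeaderPolicy;
  Msg: TIdMessage;
begin
  Policy := THeaderPolicy.Create;
  try
    Msg := TIdMessage.Create(nil);
    try
      // Attach the handler at runtime instead of in the designer.
      Msg.OnInitializeISO := Policy.InitializeISO;
      Msg.Subject := 'Subject text goes here';  // placeholder, assigned as-is
    finally
      Msg.Free;
    end;
  finally
    Policy.Free;
  end;
end.
```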
