In my project I have to read mail from an MS Exchange mailbox using JavaMail and save its content to the hard drive. However, even the simplest email I receive is saved as HTML (head, body, and so on), even when I write only two formatted words, with no images and no attachments. I just want the text of the email.

Part of the code:

Object content = part.getContent();
if (content instanceof InputStream || content instanceof String) {
    if (Part.ATTACHMENT.equalsIgnoreCase(part.getDisposition())
            || StringUtils.isNotBlank(part.getFileName())) {
        String messageBody = part.getContent().toString();
        // ... (write this string to files)
    }
}

For example, I may write:

Hello world.

And I get a text file containing all the HTML code, including font faces and tags like <html>.

I saw this question, where the asker retrieves only the text content, but I cannot comment there, so I have to post a new question. I see no difference between my code and his. He wrote:

if (disposition != null && (disposition.equals(BodyPart.ATTACHMENT))) {

    DataHandler handler = bodyPart.getDataHandler();

    s1 = (String) bodyPart.getContent();

So does it have to do with the DataHandler? It is not used anywhere, though. Can someone help?

1 Answer

First of all, you'll want to read this JavaMail FAQ entry that tells you how to find the main message body. As written, it prefers an html body over a plain text body in cases where the message contains both. It should be clear how to reverse that preference.
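
As a rough illustration of reversing that preference, here is a minimal sketch. So that it runs without the JavaMail jar, it models each alternative with a hypothetical `AltPart` class (not part of JavaMail); `AltPart`, `PlainTextFirst`, and `pickBody` are names made up for this example. In real code the same checks would presumably be made with `part.isMimeType("text/plain")` and `(String) part.getContent()` on each body part of the `multipart/alternative`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical stand-in for a mail body part: only a MIME type and its text.
final class AltPart {
    final String mimeType;
    final String text;

    AltPart(String mimeType, String text) {
        this.mimeType = mimeType;
        this.text = text;
    }
}

class PlainTextFirst {
    // Returns the first text/plain alternative; falls back to the first
    // text/html alternative only when no plain-text part exists.
    static Optional<String> pickBody(List<AltPart> alternatives) {
        String htmlFallback = null;
        for (AltPart part : alternatives) {
            String type = part.mimeType.toLowerCase(Locale.ROOT);
            if (type.startsWith("text/plain")) {
                return Optional.of(part.text); // plain text found, stop looking
            }
            if (type.startsWith("text/html") && htmlFallback == null) {
                htmlFallback = part.text; // remember it, but keep looking
            }
        }
        return Optional.ofNullable(htmlFallback);
    }

    public static void main(String[] args) {
        List<AltPart> message = Arrays.asList(
                new AltPart("text/plain; charset=UTF-8", "Hello world."),
                new AltPart("text/html; charset=UTF-8",
                        "<html><body><b>Hello</b> world.</body></html>"));
        System.out.println(pickBody(message).orElse("(no text body)"));
    }
}
```

If the message carries only an html part, `pickBody` returns the html string unchanged, so the caller still has to deal with the tags.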

However, not all messages contain both html and plain text versions of the message body. If you get only html, you'll have to write your own code to process the string and remove the html tags, or use some other product to process the html and remove the tags.
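
For the "write your own code" route, a naive sketch using only `java.util.regex` might look like the following. It is an illustration under simplifying assumptions, not a real HTML parser: it drops `<head>`, `<script>`, and `<style>` blocks, turns a few block-level closing tags into line breaks, removes the remaining tags, and decodes only a handful of entities. `HtmlToText` and `strip` are hypothetical names. For real-world mail, a dedicated HTML parsing library is likely to be more robust.

```java
import java.util.regex.Pattern;

class HtmlToText {
    // (?is): case-insensitive, and '.' also matches line breaks.
    private static final Pattern NON_VISIBLE =
            Pattern.compile("(?is)<(head|script|style)\\b[^>]*>.*?</\\1\\s*>");
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    // <br> and the end of common block elements become line breaks.
    private static final Pattern LINE_BREAK =
            Pattern.compile("(?i)<br\\s*/?>|</(p|div|tr|li|h[1-6])\\s*>");
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static String strip(String html) {
        String s = NON_VISIBLE.matcher(html).replaceAll("");
        s = COMMENT.matcher(s).replaceAll("");
        s = LINE_BREAK.matcher(s).replaceAll("\n");
        s = TAG.matcher(s).replaceAll("");
        // Decode a few common entities; &amp; goes last to avoid double decoding.
        s = s.replace("&nbsp;", " ")
             .replace("&lt;", "<")
             .replace("&gt;", ">")
             .replace("&quot;", "\"")
             .replace("&amp;", "&");
        // Collapse runs of spaces and tabs, then tidy up around line breaks.
        s = s.replaceAll("[ \\t\\r\\f]+", " ");
        s = s.replaceAll(" ?\\n ?", "\n");
        s = s.replaceAll("\\n{2,}", "\n");
        return s.trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p{font-family:Arial}</style></head>"
                + "<body><p><b>Hello</b> world.</p></body></html>";
        System.out.println(strip(html));
    }
}
```

Tags are removed before entities are decoded so that text such as `&lt;D&gt;` survives as literal `<D>` instead of being mistaken for markup.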

4 Comments

Thanks for the comment, but I cannot see why the order matters in the link you posted. Does changing the order of the if-else branches change the preference and the output? Can you explain a little more?
Per RFC 2046, which defines multipart/alternative, the alternatives appear in order of increasing faithfulness to the original content. That means you'll find text/plain before text/html. If you prefer text/plain, you can change that code to return as soon as it finds text/plain content; there's no need to continue looking for other body parts.
OK, thanks. I decided to retrieve the whole message as HTML because it contains more information. I prefer to maintain the structure of the emails and not jumble all the text together.
In the end I used Jsoup and it works fine. The trick is that you have to remove the <head> part manually first, and Jsoup does the rest.
