I have an XML file that is streamed to an XML parser.

The content of the XML file contains HTML tags, which I would like to ignore:

    <overview>
      <p>Situated on a peninsula halfway up the west coast of India, Mumbai (formerly Bombay) is India's economic powerhouse and home to more millionaires than any other city on the Indian sub-continent.</p>
      <p>The Portuguese established this old Hindu city as a colony in 1509.</p>
      <p>Like many Indian cities, the streets of Mumbai are congested with cattle, carts and motor vehicles and the air is thick with smog.</p>
    </overview>

The method to parse the overview is:

    private String readOverview(XmlPullParser parser) throws IOException, XmlPullParserException {
        parser.require(XmlPullParser.START_TAG, ns, TAG_OVERVIEW);
        String overview = readText(parser);
        parser.require(XmlPullParser.END_TAG, ns, TAG_OVERVIEW);
        return overview;
    }

The error is:

    expected: END_TAG {null}overview (position:START_TAG <p>@6:10 in java.io.InputStreamReader@537c80f4)

Can't you parse those tags instead? Commented Feb 18, 2015 at 7:28

3 Answers

1

That error occurs because the parser hits a tag it does not expect: it is positioned on the <p> start tag at a point where the code requires the </overview> end tag.

I needed my parser to read unmatched HTML tags without throwing an error, and this is what worked for me:

    parser.setFeature("http://xmlpull.org/v1/doc/features.html#relaxed", true);

This worked for me on emulators as far back as Android 4.1.1 (Jelly Bean).

If you want to ignore the HTML tags, the CDATA option is a better solution.

0

If you can add CDATA sections to your XML file, the parser will treat the HTML tags inside them as plain text rather than as elements, so you should be able to ignore them.

Reference: XML Cdata - Explains it well
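
As a rough illustration of the CDATA approach, here is a minimal sketch. It uses the JDK's StAX XMLStreamReader as a stand-in for Android's XmlPullParser so that it runs on a plain JVM; the class name, the readOverview helper and the sample XML are made up for this example. XmlPullParser's nextText() should hand back CDATA content as text in a similar way, but that is not shown here.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CdataOverviewDemo {

    // Hypothetical sample: the HTML is wrapped in a CDATA section, so the XML
    // parser sees <p> and </p> as plain characters rather than as elements.
    static final String SAMPLE =
            "<overview><![CDATA[<p>First paragraph.</p><p>Second paragraph.</p>]]></overview>";

    /** Returns the character content of the first <overview> element, or null if none. */
    static String readOverview(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "overview".equals(reader.getLocalName())) {
                    // getElementText() rejects nested elements, but CDATA
                    // content counts as text, so it succeeds here.
                    return reader.getElementText();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readOverview(SAMPLE));
    }
}
```

The returned string still contains the HTML markup, so it can be displayed as HTML or stripped afterwards, depending on what the app needs.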

0

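The trick is to understand how XmlPullParser works: it reports a stream of events such as START_TAG, TEXT and END_TAG as it moves through the document.

Once you understand that, you can implement a function that finds the <p> tags and handles them as required, in this case collecting their text into a List<String>.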

Example:

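    // Imports needed at the top of the file:
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.xmlpull.v1.XmlPullParser;
    import org.xmlpull.v1.XmlPullParserException;

    // Collects the text of each <p> child of the current element.
    // readText and skip are helpers defined elsewhere (not shown here).
    private List<String> readHtml(XmlPullParser parser) throws IOException, XmlPullParserException {
        List<String> result = new ArrayList<String>();
        // The enclosing start tag is checked in the calling function.

        // Loop until the enclosing element's end tag is reached.
        while (parser.next() != XmlPullParser.END_TAG) {
            // Ignore anything that is not a start tag (e.g. whitespace text).
            if (parser.getEventType() != XmlPullParser.START_TAG) {
                continue;
            }
            if (parser.getName().equals(TAG_P)) {
                // Only add text that was read from this <p>; the original
                // re-added the previous line after every skipped tag.
                String line = readText(parser);
                if (line != null) {
                    result.add(line);
                }
            } else {
                skip(parser);
            }
        }
        return result;
    }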
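
If the goal is a single string with the markup dropped, rather than a list of paragraphs, the same kind of event loop can count nesting depth and keep only the text events. The sketch below is an adaptation, not the code above: it uses the JDK's StAX XMLStreamReader instead of XmlPullParser so that it runs off-device (START_ELEMENT, END_ELEMENT and CHARACTERS play the roles of START_TAG, END_TAG and TEXT), and the class and method names are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class OverviewTextDemo {

    /** Returns all text inside the first <overview>, with nested tags dropped. */
    static String readOverviewText(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "overview".equals(reader.getLocalName())) {
                    return collectText(reader);
                }
            }
            return null; // no <overview> element found
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    /** Expects the reader on a start tag; consumes events through the matching end tag. */
    static String collectText(XMLStreamReader reader) throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        int depth = 1; // already inside the element being read
        while (depth > 0) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    depth++; // a nested tag such as <p> or <b>: ignore the tag itself
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    depth--;
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    text.append(reader.getText());
                    break;
                default:
                    break; // comments, processing instructions, etc.
            }
        }
        // Collapse indentation and line breaks between tags into single spaces.
        return text.toString().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String xml = "<overview>\n  <p>One.</p>\n  <p>Two <b>bold</b>.</p>\n</overview>";
        System.out.println(readOverviewText(xml));
    }
}
```

Counting depth means the loop does not need to know which HTML tags may appear, at the cost of losing paragraph boundaries.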
