XmlException with XmlStringReader vb .net

Question

I'm trying to parse some malformed XML, like the example below, with an XmlReader reading from a StringReader.

<Page CODE=""L"" page Caption=""Example""><Cell CellType="0"...></Cell></Page>

With this code, I try to read the value of the CellType attribute on the Cell element:

        ' Requires Imports System.IO and Imports System.Xml
        Using reader As XmlReader = XmlReader.Create(New StringReader(l.Label), New XmlReaderSettings With {
                                                     .ValidationType = ValidationType.None,
                                                     .XmlResolver = Nothing})
            While reader.ReadToFollowing("Cell")
                ' MoveToAttribute returns False if this Cell has no CellType attribute
                If reader.MoveToAttribute("CellType") Then
                    Select Case Int32.Parse(reader.Value)
                          ...
                    End Select
                End If
            End While
        End Using

but I get the following XmlException:

'Caption' is an unexpected token. The expected token is '='

Is there any way to avoid this exception, or should I parse the XML first to fix the malformed attribute?

Thanks

Jon Skeet · Accepted Answer · 2012-07-26 12:10:15Z

3

Should I parse the XML first to fix the malformed attribute?

It's not XML. It's something which looks a bit like XML, but isn't really. Don't try to read non-XML with XML APIs. It will - and should - fail.

Ideally, fix whatever is producing the pseudo-XML in the first place.

answered Jul 26, 2012 at 12:10

Jon Skeet



2 Comments

Humberto Barrientos Gonzalez Over a year ago

Yep, it's not. I read it from a database table, and I don't have access to the code that produces it, so I thought the alternative was to parse it with regular expressions.

Jon Skeet Over a year ago

@HumbertoBarrientosGonzalez: I wouldn't skip straight from XML to Regex. You may well want to write a custom parser, which then converts it to XML on the fly. You'll need to try to find documentation for the format though.
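A minimal sketch of that "custom parser that converts to XML on the fly" idea follows. It rests on assumptions, since the real format isn't documented: tags look like <Name a="v" ...>, <Name ... /> or </Name>, attribute values are always quoted, and a bare word with no "=value" (such as page) is junk that can be dropped. RepairToXml, ReadName, SkipSpaces and ReadQuoted are hypothetical names made up for illustration. Output goes through XmlWriter, so whatever it produces is escaped and well-formed, and anything outside the assumed format raises FormatException instead of being guessed at, which is the main advantage over a regex rewrite. Entity references inside values are not decoded (another assumption).

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module PseudoXmlRepair
    ' HYPOTHETICAL repair pass. Assumed input format (not documented anywhere):
    ' tags are <Name a="v" ...>, <Name ... /> or </Name>; values are quoted;
    ' a bare word with no "=value" (such as "page") is dropped.
    ' Anything else raises FormatException rather than being guessed at.
    Function RepairToXml(src As String) As String
        Dim output As New StringWriter()
        Dim settings As New XmlWriterSettings With {.OmitXmlDeclaration = True}
        Using writer As XmlWriter = XmlWriter.Create(output, settings)
            Dim i As Integer = 0
            While i < src.Length
                If src(i) <> "<"c Then
                    ' Text content: copy it up to the next "<"; XmlWriter escapes it.
                    Dim nextTag As Integer = src.IndexOf("<"c, i)
                    If nextTag < 0 Then nextTag = src.Length
                    writer.WriteString(src.Substring(i, nextTag - i))
                    i = nextTag
                ElseIf i + 1 < src.Length AndAlso src(i + 1) = "/"c Then
                    ' End tag: close whichever element is currently open.
                    Dim tagEnd As Integer = src.IndexOf(">"c, i)
                    If tagEnd < 0 Then Throw New FormatException("Unterminated end tag")
                    writer.WriteEndElement()
                    i = tagEnd + 1
                Else
                    i += 1 ' skip "<"
                    writer.WriteStartElement(ReadName(src, i))
                    Do
                        SkipSpaces(src, i)
                        If i >= src.Length Then Throw New FormatException("Unterminated start tag")
                        If src(i) = ">"c Then
                            i += 1
                            Exit Do
                        End If
                        If src(i) = "/"c Then
                            ' Assumes "/" is followed by ">": closes an empty element.
                            writer.WriteEndElement()
                            i += 2
                            Exit Do
                        End If
                        Dim attrName As String = ReadName(src, i)
                        SkipSpaces(src, i)
                        If i < src.Length AndAlso src(i) = "="c Then
                            i += 1
                            SkipSpaces(src, i)
                            writer.WriteAttributeString(attrName, ReadQuoted(src, i))
                        End If
                        ' Otherwise attrName was a bare word and is silently dropped.
                    Loop
                End If
            End While
        End Using
        Return output.ToString()
    End Function

    ' Reads a name up to whitespace, "=", ">" or "/".
    Private Function ReadName(s As String, ByRef i As Integer) As String
        Dim start As Integer = i
        While i < s.Length AndAlso Not Char.IsWhiteSpace(s(i)) AndAlso "=>/".IndexOf(s(i)) < 0
            i += 1
        End While
        If i = start Then Throw New FormatException("Expected a name at position " & start.ToString())
        Return s.Substring(start, i - start)
    End Function

    Private Sub SkipSpaces(s As String, ByRef i As Integer)
        While i < s.Length AndAlso Char.IsWhiteSpace(s(i))
            i += 1
        End While
    End Sub

    ' Reads a value wrapped in matching single or double quotes.
    Private Function ReadQuoted(s As String, ByRef i As Integer) As String
        If i >= s.Length OrElse (s(i) <> """"c AndAlso s(i) <> "'"c) Then
            Throw New FormatException("Expected a quoted value at position " & i.ToString())
        End If
        Dim quote As Char = s(i)
        Dim closing As Integer = s.IndexOf(quote, i + 1)
        If closing < 0 Then Throw New FormatException("Unterminated attribute value")
        Dim attrValue As String = s.Substring(i + 1, closing - i - 1)
        i = closing + 1
        Return attrValue
    End Function
End Module
```

The repaired string can then be handed to the original XmlReader loop unchanged. If the real data turns out to contain other defects, extend the scanner for each one explicitly rather than loosening it, so unknown input still fails loudly.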

Cody Gray · Accepted Answer · 2012-07-26 12:14:56Z

0

The universal rule of parsers is that they assume the input is valid according to whatever spec the parser was written for. An XML parser, then, assumes you're passing it valid XML.

In this case you're not, because XML doesn't allow spaces in attribute names, and every attribute must have a value. page Caption is not a valid attribute name, so the parser is treating page as the attribute name, taking the space as a delimiter, and then, since it expects = next, doesn't know what to do with Caption. That is exactly what the error message says.
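A quick way to confirm where the reader gives up is to read the whole string and catch the XmlException; its LineNumber and LinePosition properties report where parsing stopped. This is a minimal sketch, and FindXmlError is a hypothetical helper name, not part of the framework.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module XmlErrorLocator
    ' Reads the whole string. Returns Nothing if it is well-formed XML,
    ' otherwise the XmlException the reader threw.
    Function FindXmlError(xml As String) As XmlException
        Try
            Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
                While reader.Read()
                    ' Nothing to do: we only care whether reading succeeds.
                End While
            End Using
            Return Nothing
        Catch ex As XmlException
            ' ex.LineNumber / ex.LinePosition say where the reader stopped.
            Return ex
        End Try
    End Function
End Module
```

For the question's tag it returns an exception on line 1 complaining about the unexpected Caption token; for the same tag without the stray page word it returns Nothing.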

You can't just "fix" the exception though. The parser is thoroughly confused, and it's giving up. Even if you could somehow force it to continue, there would be no way to guarantee the validity of the results. It's just like if someone went through a book and removed all of the punctuation. You'd probably put it down in frustration because you couldn't understand it. But if someone forced you to read it anyway, you'd probably end up getting the wrong meaning more often than not. The only way to fix the problem is to give the parser input that it understands.

So, yes, you'll need to ensure that the XML is valid before running it through a parser. Where are you obtaining this XML from? Can you fix the generation process so that it uses valid identifiers and conforms properly to an XML schema?
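If the rows can't be fixed at the source, one option is to check each row as it is read and set aside the ones that aren't well-formed instead of letting the exception escape. The sketch below assumes each row holds one label string as in the question; TryReadCellTypes is a hypothetical name. Because XmlReader streams, a label that is malformed near the end could already have yielded some Cell values before throwing, so the values are buffered and only returned if the whole label reads cleanly.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module CellTypeReader
    ' Reads every Cell/@CellType from one label. Returns False (and Nothing)
    ' when the label is not well-formed XML, so the caller can log or repair
    ' that row instead of crashing.
    Function TryReadCellTypes(label As String, ByRef cellTypes As List(Of Integer)) As Boolean
        Dim found As New List(Of Integer)()
        Dim settings As New XmlReaderSettings With {
            .ValidationType = ValidationType.None,
            .XmlResolver = Nothing}
        Try
            Using reader As XmlReader = XmlReader.Create(New StringReader(label), settings)
                While reader.ReadToFollowing("Cell")
                    ' GetAttribute returns Nothing when the attribute is absent.
                    Dim raw As String = reader.GetAttribute("CellType")
                    Dim parsed As Integer
                    If raw IsNot Nothing AndAlso Integer.TryParse(raw, parsed) Then
                        found.Add(parsed)
                    End If
                End While
            End Using
        Catch ex As XmlException
            ' Discard anything read before the error.
            cellTypes = Nothing
            Return False
        End Try
        cellTypes = found
        Return True
    End Function
End Module
```

Rows for which this returns False can be logged for whoever owns the data, or passed through a dedicated repair step first.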

answered Jul 26, 2012 at 12:14

Cody Gray♦


1 Comment

Humberto Barrientos Gonzalez Over a year ago

I'm reading it from a database table, so I can't. I'm writing code to convert it to XML, but I don't know if that's the best approach.
