libxml HTML parsing error in C

Question

This is the HTML file that I tried to parse with libxml:

<html>
<head>
         <title>Hello World Page</title>
         <link rel="stylesheet" type="text/css" href="http://csszengarden.com/214/214.css?v=8may2013">
</head>
<body>
    <h3>Hello World</h3>
    <br>
    <p>Questo e un paragrafo.</p>
    <a src="/">LINK</a>
</body>
</html>

And this is the example program that I took from the libxml parsing tutorial:

#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

static void print_element_names(xmlNode * a_node);

int main()
{
  xmlDoc         *doc = NULL;
  xmlNode        *root_element = NULL;
  const char     *Filename = "file.xml";
  doc = xmlReadFile(Filename, NULL, 0);

  if (doc == NULL) {
    printf("error: could not parse file %s\n", Filename);
  } else {
    root_element = xmlDocGetRootElement(doc);
    print_element_names(root_element);
    xmlFreeDoc(doc);
  }
  xmlCleanupParser();
  return (0);
}

static void print_element_names(xmlNode * a_node)
{
  xmlNode *cur_node = NULL;
  for (cur_node = a_node; cur_node; cur_node = cur_node->next) {
      if (cur_node->type == XML_ELEMENT_NODE)
          printf("node type: Element, name: %s\n", cur_node->name);
      print_element_names(cur_node->children);
  }
}

Running it gives me this series of errors:

file.xml:5: parser error : Opening and ending tag mismatch: link line 4 and head
    </head>
           ^
file.xml:11: parser error : Opening and ending tag mismatch: br line 8 and body
    </body>
           ^
file.xml:12: parser error : Opening and ending tag mismatch: body line 6 and html
</html>
       ^
file.xml:12: parser error : Premature end of data in tag head line 2
</html>
       ^
file.xml:12: parser error : Premature end of data in tag html line 1
</html>
       ^
error: could not parse file file.xml

I'm new to libxml, and I would like to build a tree from the HTML file and extract data from it. What do I have to change in the program so that it parses the HTML code?

ikegami · Accepted Answer · 2014-02-12 14:12:37Z

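xmlReadFile parses XML files. You have an HTML file, not an XML file: HTML allows elements such as <link> and <br> to appear without a closing tag, which is why the XML parser reports mismatched tags. To parse an HTML file, use htmlReadFile (declared in libxml/HTMLparser.h) instead [1].

[1] This holds even though the documentation says htmlReadFile parses XML; that wording is a bug in the documentation.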

edited Feb 12, 2014 at 14:12

answered Feb 11, 2014 at 19:32
