This is the HTML file that I tried to parse with libxml:

<html>
<head>
         <title>Hello World Page</title>
         <link rel="stylesheet" type="text/css" href="http://csszengarden.com/214/214.css?v=8may2013">
</head>
<body>
    <h3>Hello World</h3>
    <br>
    <p>Questo e un paragrafo.</p>
    <a src="/">LINK</a>
</body>
</html>

And this is the example program that I took from the libxml parsing tutorial:

#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

static void print_element_names(xmlNode * a_node);

int main()
{
  xmlDoc         *doc = NULL;
  xmlNode        *root_element = NULL;
  const char     *Filename = "file.xml";
  doc = xmlReadFile(Filename, NULL, 0);

  if (doc == NULL) {
    printf("error: could not parse file %s\n", Filename);
  } else {
    root_element = xmlDocGetRootElement(doc);
    print_element_names(root_element);
    xmlFreeDoc(doc);
  }
  xmlCleanupParser();
  return (0);
}

static void print_element_names(xmlNode * a_node)
{
  xmlNode *cur_node = NULL;
  for (cur_node = a_node; cur_node; cur_node = cur_node->next) {
      if (cur_node->type == XML_ELEMENT_NODE)
          printf("node type: Element, name: %s\n", cur_node->name);
      print_element_names(cur_node->children);
  }
}

It returns this series of errors:

file.xml:5: parser error : Opening and ending tag mismatch: link line 4 and head
    </head>
           ^
file.xml:11: parser error : Opening and ending tag mismatch: br line 8 and body
    </body>
           ^
file.xml:12: parser error : Opening and ending tag mismatch: body line 6 and html
</html>
       ^
file.xml:12: parser error : Premature end of data in tag head line 2
</html>
       ^
file.xml:12: parser error : Premature end of data in tag html line 1
</html>
       ^
error: could not parse file file.xml

I'm new to libxml, and I would like to build a tree from the HTML file and extract data from it. What do I have to modify in the program so that it parses the HTML code?

1 Answer

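xmlReadFile parses XML files, and XML requires every element to be closed. Your file is HTML, not XML: tags such as <link> and <br> have no closing tag, which is exactly what the "Opening and ending tag mismatch" errors are complaining about. To parse an HTML file, use htmlReadFile (declared in <libxml/HTMLparser.h>) instead[1].

[1] Notwithstanding a bug in the documentation, which says that it parses XML.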