
I am trying to read and display the title (the content of the h1 tag) from many HTML files. These files are all in the same folder.

This is what the HTML files look like:

<!DOCTYPE html PUBLIC '-//W3C//DTD HTML 4.01//EN'>
<html> 
<head>   
    <title>A title</title> 
    <style type='text/css'>
    ... Styles here ...
    </style>
</head>
<body>
  <h1>&Ecirc;tre aidant</h1>
  <p>En g&eacute;n&eacute;ral, les aidants doivent &eacute;quilibrer...</p>
  ... more tags ...
</body>
</html>

I have tried to display the content of the h1 tag with this PHP script:

<?php
foreach (glob("test/*.html") as $file) {
    // loadHTMLFile() opens the file itself, so no separate fopen()/fclose() is needed
    $doc = new DOMDocument();
    $doc->loadHTMLFile($file);

    $titles = $doc->getElementsByTagName('h1');
    if ($titles->length > 0) {
        // Serialize the first <h1> element, including its tags
        echo $doc->saveHTML($titles->item(0));
    }
}
?>

But the output contains wrong characters. For the example file, the output is:

Être aidant

How can I achieve this output?

Être aidant

2 Answers


You should state a charset in the <head> of your HTML document.

<meta charset="utf-8">

3 Comments

Do you mean in the output document?
Yep, for all your .html files, you should have this declaration. Try it for one file and see if it works or not.
OK, I added <head><meta charset="utf-8"></head> before the output and the output is now what I need. Thanks for the help!
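
The fix described in these comments can be sketched as follows. This is only a sketch under stated assumptions, not a definitive implementation: it assumes PHP's dom extension is enabled, and firstH1Text is a hypothetical helper name that does not appear in the question. DOMDocument hands back its strings as UTF-8, so the page that prints them has to declare UTF-8 as well; here that is done with both a Content-Type header and a meta tag.

```php
<?php
// Hypothetical helper: returns the text of the first <h1> in an HTML string,
// or null when the document has no <h1>.
function firstH1Text(string $html): ?string
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $h1 = $doc->getElementsByTagName('h1')->item(0);
    // textContent is a UTF-8 string with entities such as &Ecirc; already decoded.
    return $h1 === null ? null : $h1->textContent;
}

// Declare the output encoding before anything is sent to the browser.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
echo "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\"></head><body>\n";

// glob() can return false on error, so fall back to an empty list.
foreach (glob('test/*.html') ?: [] as $file) {
    $title = firstH1Text(file_get_contents($file));
    if ($title !== null) {
        // Re-escape the decoded text before putting it back into HTML.
        echo '<h1>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . "</h1>\n";
    }
}
echo "</body></html>\n";
```

Using textContent instead of saveHTML() is a design choice of this sketch: it yields plain text, which is then escaped again for output, so any markup nested inside the h1 is dropped.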

You need to use UTF-8 encoding: change echo $content to echo utf8_encode($content);

2 Comments

thanks for the answer but the output becomes Être aidant
If you remove utf8_encode, then the output will be Être aidant. What is your expected output?
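
A likely reason utf8_encode made the output worse is double encoding: the function treats its input as ISO-8859-1 and converts every byte to UTF-8, so a string that is already UTF-8 gets encoded a second time. The sketch below illustrates this at the byte level. It assumes the mbstring extension is available and uses mb_convert_encoding in place of utf8_encode, which is deprecated as of PHP 8.2; the variable names are illustrative only.

```php
<?php
// "Être aidant" as UTF-8; the leading Ê (U+00CA) is the two bytes C3 8A.
$utf8 = "\u{CA}tre aidant";

// Same conversion utf8_encode() performs: read the bytes as ISO-8859-1, write UTF-8.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

// Bytes of the first character before and after the second encoding pass.
echo bin2hex(substr($utf8, 0, 2)), "\n";   // c38a
echo bin2hex(substr($double, 0, 4)), "\n"; // c383c28a
```

Each of the two original bytes becomes its own two-byte sequence, which is why the text looks more garbled after the call rather than less.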
