
Here is an example of what I want to do:

<div class='room'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>

From the example above, I would like to scrape the tag names and their text into two arrays. The result should be one array with the tags, arr = [h1, p, h2], and another with the text, arr2 = [This is a h1, This is a Paragraph, This is h2].


4 Answers


Assuming the tag names are known in advance, you could use DOMDocument's getElementsByTagName() like this:

$html = "<div class='room'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>";
$doc = new DOMDocument();
$doc->loadHTML($html);

// Return the tag names that occur in the document and the text of every
// matching element, grouped in the order the tag names are given.
function iterate_elements(array $tags, DOMDocument $doc): array {
    $elements = array();
    $content = array();
    foreach ($tags as $tag) {
        $nodes = $doc->getElementsByTagName($tag);
        foreach ($nodes as $node) {
            $content[] = $node->textContent;
        }
        if ($nodes->length > 0) {
            $elements[] = $tag;
        }
    }
    return array($elements, $content);
}

list($elements, $content) = iterate_elements(array('h1', 'p', 'h2'), $doc);
print_r($elements);
print_r($content);

Demo: https://eval.in/825860
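If the tag names are not known in advance, a DOMXPath query can collect every child of the room div in document order instead of grouping by a fixed tag list. This is a sketch, not part of the answer above; it assumes a single div whose class attribute is exactly "room" and uses the query //div[@class='room']/* to select that div's element children.

```php
<?php
// Sketch: tag names and texts of div.room's children, in document order.
// Assumes one div whose class attribute is exactly "room".
$html = "<div class='room'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>";

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$arr = array();  // tag names
$arr2 = array(); // text content
// "/*" selects element children only, so whitespace text nodes are skipped
foreach ($xpath->query("//div[@class='room']/*") as $node) {
    $arr[] = $node->nodeName;
    $arr2[] = $node->textContent;
}

print_r($arr);
print_r($arr2);
```

Because the query walks the DOM rather than a list of tag names, any new child element (for example an h3) is picked up without changing the code.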

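Using the Simple HTML DOM Parser library, which provides str_get_html():

// Adjust the path to wherever simple_html_dom.php is installed
require_once 'simple_html_dom.php';

$str = <<<EOF
<div class='room'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>
EOF;

$html = str_get_html($str);

$arr = array();  // tag names
$arr2 = array(); // text of each element
// '.room *' matches every descendant of the element with class "room"
foreach($html->find('.room *') as $el){
  $arr[] = $el->tag;
  $arr2[] = $el->text();
}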


Try this:

$str = "<div class='room'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>";

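// Split on "\n" rather than PHP_EOL: the string's line breaks come from the
// source file, not from the platform's line-ending convention
$arr = explode("\n", $str);

$res = array();
foreach ($arr as $row) {
    // Skip the opening and closing <div> lines
    if (strpos($row, "div") === false) {
        // Key: the tag name between "<" and ">"; value: the line's text
        $res[substr($row, 1, strpos($row, ">") - 1)] = trim(strip_tags($row));
    }
}

var_dump($res);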

https://3v4l.org/8TkIT

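It loops through the string one line at a time and builds an array keyed by tag name. This only works when each element sits on its own line, and a repeated tag overwrites the earlier entry with the same key.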

Edit: if there is more than one room, you can make it multidimensional like this: https://3v4l.org/DdXVd

$str = "<div class='room'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>
<div class='room2'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>";

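$arr = explode("\n", $str);

$res = array();
$room = '';
foreach ($arr as $row) {
    if (strpos($row, "<div") !== false) {
        // Opening <div>: use the class name between the single quotes as the room key
        $pos1 = strpos($row, "'") + 1;
        $room = substr($row, $pos1, strpos($row, "'", $pos1) - $pos1);
    } elseif (strpos($row, "</div>") === false) {
        // Child element: key by the tag name between "<" and ">"
        $pos1 = strpos($row, "<") + 1;
        $res[$room][substr($row, $pos1, strpos($row, ">") - $pos1)] = trim(strip_tags($row));
    }
}

var_dump($res);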
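A DOM-based alternative for several rooms avoids depending on line breaks. This sketch assumes each room is a div with a distinct class attribute; it stores two lists per room, matching the two arrays the question asks for, so repeated tags are kept rather than overwritten.

```php
<?php
// Sketch: group child tag names and texts by each div's class attribute.
// Assumes every room div has a unique class value.
$str = "<div class='room'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>
<div class='room2'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>";

$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXPath($doc);

$res = array();
foreach ($xpath->query('//div[@class]') as $div) {
    $room = $div->getAttribute('class');
    $res[$room] = array('tags' => array(), 'texts' => array());
    // The relative query "*" returns only this div's element children
    foreach ($xpath->query('*', $div) as $child) {
        $res[$room]['tags'][] = $child->nodeName;
        $res[$room]['texts'][] = $child->textContent;
    }
}

print_r($res);
```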


Try the code below:

$html = "<div class='room'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h2>This is h2</h2>
</div>";

$dom = new SimpleXMLElement($html);
$children = (array) $dom;

// array_filter keeps the original keys, so array_values reindexes from 0
$values = array_values(array_filter(array_values($children), function ($i) { return !is_array($i); }));
$keys = array_values(array_filter(array_keys($children), function ($i) { return $i !== '@attributes'; }));

print_r($values); // This is a h1, This is a Paragraph, This is h2
print_r($keys); // h1, p, h2

I used array_filter to remove the div's '@attributes' entry (its class attribute) from the result.
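Casting to an array collapses repeated tags into one key holding a nested array, which the is_array filter then drops while the key survives, so the two arrays fall out of step. A sketch that instead walks the root's element children with SimpleXMLElement::children() and getName() keeps every element in document order. It assumes the markup is well-formed XML with a single root element; the second h1 in this sample is added only to show the difference.

```php
<?php
// Sketch: iterate element children directly instead of casting to an array.
// Assumes well-formed XML with a single root element.
$html = "<div class='room'>
<h1>This is a h1</h1>
<p>This is a Paragraph</p>
<h1>Another h1</h1>
</div>";

$dom = new SimpleXMLElement($html);

$keys = array();   // tag names, in document order
$values = array(); // text of each element
foreach ($dom->children() as $child) {
    $keys[] = $child->getName();
    $values[] = (string) $child;
}

print_r($keys);
print_r($values);
```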
