
I am looking for the JavaScript method analogous to PHP's DOMDocument->loadHTMLFile(), so that I can parse an external HTML file's contents and extract images. Right now I'm doing it via ajax, which is too slow.

Here is the PHP I use to scrape images; it works. I simply want to do the same thing browser-side so that it's faster.

if(isset($_POST['link']) && $_POST['link'] !== ""){
    //extract relevant article info from link
    $sourceArray = array();
    $sizeArray = array();
    $link = $_POST['link'];
    //generate new DOMdoc
    $article = new DOMDocument;
    $article ->loadHTMLFile($link);
    //get the largest image
    $images = $article->getElementsByTagName("img");
    foreach($images as $image){
        $source = $image->getAttribute("src");
        if(strpos($source, "http://") !== false){
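            $sizeProfile = getimagesize($source);
            if($sizeProfile === false){
                //getimagesize() returns false when the image cannot be read
                continue;
            }
            $imgArea = $sizeProfile[0] * $sizeProfile[1];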
            if($imgArea > 100){
                array_push($sizeArray, $imgArea);
                array_push($sourceArray, $source);
            }
        }
    }
    array_multisort($sizeArray, SORT_DESC, $sourceArray);
    $sourceHTML = "";
    $i = 0;
    foreach($sourceArray as $source){
        $id = 'image'.$i;
        $sourceHTML .= '<img id="'.$id.'" class="notSelectedPicture" src="'.htmlspecialchars($source, ENT_QUOTES).'" onclick="toggleSelectedPicture(\''.$id.'\');" alt="alt">';
        $i++;
    }
    echo $sourceHTML;
    exit();
}
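The filtering and sorting half of the PHP above can be sketched in JavaScript as follows. This is only an illustration, not a full replacement: the function name `rankImagesByArea` and the `{ src, width, height }` object shape are hypothetical, and it assumes the image dimensions are already known (the PHP gets them from getimagesize()).

```javascript
// Hypothetical helper mirroring the PHP loop: keep absolute "http://" sources,
// drop images whose area is not above minArea, and sort largest first.
// `images` is assumed to be an array of { src, width, height } objects.
function rankImagesByArea(images, minArea = 100) {
  return images
    // same test as strpos($source, "http://") !== false
    .filter((img) => typeof img.src === "string" && img.src.includes("http://"))
    // compute the area once per image
    .map((img) => ({ src: img.src, area: img.width * img.height }))
    // same threshold as if($imgArea > 100)
    .filter((img) => img.area > minArea)
    // descending by area, like array_multisort(..., SORT_DESC, ...)
    .sort((a, b) => b.area - a.area)
    .map((img) => img.src);
}

// Example input with made-up URLs and sizes.
const ranked = rankImagesByArea([
  { src: "http://example.com/small.png", width: 10, height: 5 },
  { src: "/relative.png", width: 500, height: 500 },
  { src: "http://example.com/medium.png", width: 20, height: 10 },
  { src: "http://example.com/large.png", width: 200, height: 100 },
]);
console.log(ranked);
```

Doing this step in the browser would only help if the slow part is the sorting and markup generation rather than fetching the remote page and measuring each image.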
  • ajax is pretty much your only option, if your JavaScript code is running as part of some web code in a browser. Commented Nov 5, 2013 at 23:37
  • You may be implementing your AJAX method incorrectly. If you are finding your current AJAX method too slow, then you should try to write your own, browser specific AJAX method. Look into the XMLHttpRequest (XHR) API. Commented Nov 5, 2013 at 23:40
  • I don't see where ajax comes into play for parsing HTML, but check out developer.mozilla.org/en-US/docs/Web/API/… Commented Nov 5, 2013 at 23:40
  • @Pointy it would seem that you're right. The document.implementation.createHTMLDocument("New Document"); method creates a new DOM document, but I still can't load an entire external HTML document into it as with the PHP loadHTMLFile() method. I am using my own super light ajax method, which works quickly for other implementations; it's just that the DOM parsing (getting all images, sorting by size, checking for a full path reference) takes time, and having to echo the result back through ajax.responseText just adds to the time. Since JS has document.images, I wanted to use that instead. Commented Nov 6, 2013 at 13:57
  • Have you considered the use of PHP's glob() function? Commented Nov 6, 2013 at 23:42

1 Answer


The ajax solution works for this purpose. As a client-side language, JS running in a browser generally cannot fetch an arbitrary external HTML file the way PHP can, because cross-origin requests are restricted by the same-origin policy unless the remote server explicitly allows them. To cut down on loading time, focus on the efficiency of the DOM parsing code that the ajax request posts to.
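For the parsing step on its own, browsers do provide DOMParser, which can turn an HTML string into a document once the text has been obtained some other way (for example through the existing ajax call). A minimal sketch under that assumption; the function name `extractImageSources` and the injectable `parser` argument are hypothetical and exist only to keep the example self-contained:

```javascript
// Hypothetical helper: parse an HTML string and return the absolute
// "http://" image sources, like the getElementsByTagName("img") loop in the PHP.
// `parser` defaults to the browser's DOMParser; any object with a compatible
// parseFromString(html, mimeType) method can be passed instead.
function extractImageSources(html, parser = new DOMParser()) {
  const doc = parser.parseFromString(html, "text/html");
  return Array.from(doc.querySelectorAll("img"))
    .map((img) => img.getAttribute("src"))
    // skip <img> tags without src and keep the same "http://" check as the PHP
    .filter((src) => typeof src === "string" && src.includes("http://"));
}
```

In a browser this would be called as `extractImageSources(xhr.responseText)`. As far as I know, images inside a document created this way are not loaded, so their dimensions would still have to be measured separately before sorting by size.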
