PHP file_get_contents

Question

I'm looking to create a PHP script where a user provides a link to a webpage, and the script fetches that page and parses it based on its contents.

For example, if a user provides a YouTube link:

http://www.youtube.com/watch?v=xxxxxxxxxxx

Then it would grab basic information about that video (thumbnail, embed code?).

Or they might provide a Vimeo link:

 http://www.vimeo.com/xxxxxx

Or even if they were to provide any link, without a video attached, such as:

 http://www.google.com/

And it could grab just the page title or some meta content.

I'm thinking I'd have to use file_get_contents, but I'm not exactly sure how to use it in this context.

I'm not looking for someone to write the entire code, but perhaps provide me with some tools so that I can accomplish this.

Try to ask a more straightforward question, like "how do I get the thumbnails of a movie in YouTube using PHP". It might make people more responsive.
– Itay Moav -Malimovka, Commented Sep 5, 2009 at 20:17

Community · Accepted Answer · 2023-11-17 19:24:28Z

3

You can use either the curl or the http library. You send an HTTP request and then use the library to read the information you need from the HTTP response.

edited Nov 17, 2023 at 19:24

CommunityBot

11 silver badge

answered Sep 5, 2009 at 20:16

txwikinger

3,0441 gold badge27 silver badges33 bronze badges


1 Comment

yoda Over a year ago

In addition, you can use regex to parse the information you want from those websites.
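
As a rough illustration of the regex idea in this comment, here is a minimal sketch that pulls the page title out of already-fetched HTML. The helper name extractTitle is hypothetical, and a regex like this is brittle on real-world markup, so treat it as a starting point rather than a robust parser.

```php
<?php
// Hypothetical helper: return the text of the first <title> element, or null.
function extractTitle(string $html): ?string
{
    // 'i' = case-insensitive, 's' = let '.' match newlines; '.*?' is non-greedy
    // so the match stops at the first closing </title>.
    if (preg_match('#<title\b[^>]*>(.*?)</title>#is', $html, $m)) {
        // Decode entities such as &amp; and collapse runs of whitespace.
        $title = html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        return trim(preg_replace('/\s+/', ' ', $title));
    }
    return null;
}
```

It would be called on the string returned by whichever fetch method you choose, for example extractTitle($pageContent).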

Luis Serrano · Accepted Answer · 2011-09-01 06:53:03Z

I know this question is quite old, but I'll answer just in case someone hits it looking for the same thing.

Use oEmbed (http://oembed.com/) for YouTube, Vimeo, WordPress, SlideShare, Hulu, Flickr and many other services. If the service is not in the list, or you want more precise control, you can use this:

http://simplehtmldom.sourceforge.net/

It's a sort of jQuery for PHP, meaning you can use HTML selectors to get portions of the code (e.g. all the images, the contents of a div, or only the text (no HTML) of a node).

You could do something like this (could be done more elegantly but this is just an example):

require_once("simple_html_dom.php");
function getContent ($item, $contentLength) 
{
    // Initialize everything up front; a bare "$raw;" statement does nothing useful
    $raw = "";
    $content = "";
    $html = null;
    $images = "";

    if (isset ($item->content) && $item->content != "")
    {
        $raw = $item->content;
        $html = str_get_html ($raw);            
        $content = str_replace("\n", "<BR /><BR />\n\n", trim($html->plaintext));

        try
        {
            foreach($html->find('img') as $image) {
                if ($image->width != "1") 
                {
                    // Don't include images smaller than 100px in height
                    $include = false;
                    $height = $image->height; // read the height attribute, not the width
                    if ($height != "" && (int) $height >= 100)
                    {
                        $include = true;
                    }
                    /*else
                    {
                        list($width, $height, $type, $attr) = getimagesize($image->src);
                            if ($height != "" && $height >= 100)
                                $include = true;
                    }*/                 

                    if ($include == true)
                    {
                        $images = $images . '<div class="theImage"><a href="'.$image->src.'" title="'.$image->alt.'"><img src="'.$image->src.'" alt="'.$image->alt.'" class="postImage" border="0" /></a></div>';
                    }
                }
            }
        }
        catch (Exception $e) {
            // Do nothing
        }

        $images = '<div id="images">'.$images.'</div>';
    }
    else
    {
        $raw = $item->summary;
        $content = str_get_html ($raw)->plaintext;
    }

    return (substr($content, 0 , $contentLength) . (strlen ($content) > $contentLength ? "..." : "") . $images);
}
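
For the oEmbed route mentioned at the start of this answer, a minimal sketch might look like the following. The helper names oembedEndpointFor and parseOembed are hypothetical, only YouTube and Vimeo are covered, and the endpoint URLs are the ones those providers have published, so check them against the current provider documentation before relying on them. The fields read from the response (title, thumbnail_url, html) come from the oEmbed specification, but not every provider returns all of them.

```php
<?php
// Hypothetical helper: map a page URL to its provider's oEmbed endpoint,
// or return null when the host is not one we know about.
function oembedEndpointFor(string $pageUrl): ?string
{
    $host = strtolower((string) parse_url($pageUrl, PHP_URL_HOST));
    $host = preg_replace('/^www\./', '', $host);

    // Assumption: these endpoint URLs are still current for each provider.
    $endpoints = [
        'youtube.com' => 'https://www.youtube.com/oembed?format=json&url=',
        'youtu.be'    => 'https://www.youtube.com/oembed?format=json&url=',
        'vimeo.com'   => 'https://vimeo.com/api/oembed.json?url=',
    ];

    return isset($endpoints[$host]) ? $endpoints[$host] . rawurlencode($pageUrl) : null;
}

// Hypothetical helper: pick commonly used fields out of an oEmbed JSON body.
function parseOembed(string $json): ?array
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        return null; // not valid JSON, or not a JSON object
    }
    return [
        'title'     => $data['title'] ?? null,
        'thumbnail' => $data['thumbnail_url'] ?? null,
        'embedHtml' => $data['html'] ?? null,
    ];
}
```

In use, you would fetch the URL returned by oembedEndpointFor($url) (with file_get_contents or curl) and pass the response body to parseOembed. A null endpoint means the link is not a known provider, which is where the HTML-parsing approach above takes over.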

Adam Franco · Accepted Answer · 2009-09-05 21:49:09Z

1

file_get_contents() would work in this case, assuming that you have allow_url_fopen enabled in your php.ini. What you would do is something like:

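$embedStrings = array();
$pageContent = @file_get_contents($url);
if ($pageContent !== false) {
    // Non-greedy, case-insensitive match; the "s" flag lets "." span newlines
    preg_match_all('#<embed.*?</embed>#is', $pageContent, $matches);
    $embedStrings = $matches[0];
}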

That said, file_get_contents() won't give you much in the way of error handling other than returning the content on success or false on failure. If you would like richer control over the request and access to the HTTP response codes, use the curl functions, and in particular curl_getinfo, to look at the response codes, MIME types, encoding, etc. Once you get the content via either curl or file_get_contents(), your code for parsing it to look for the HTML of interest will be the same.
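
A minimal sketch of the curl variant described above, assuming the curl extension is installed. The helper name fetchPage and the keys of the array it returns are hypothetical; the curl_* calls and constants are the standard ones from the PHP curl extension.

```php
<?php
// Hypothetical helper: fetch a URL with curl and report status, type and body.
function fetchPage(string $url, int $timeoutSeconds = 10): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);

    $body = curl_exec($ch); // false on transport-level failure

    return [
        'ok'          => $body !== false,
        'status'      => curl_getinfo($ch, CURLINFO_HTTP_CODE),    // e.g. 200, 404
        'contentType' => curl_getinfo($ch, CURLINFO_CONTENT_TYPE), // e.g. text/html
        'error'       => $body === false ? curl_error($ch) : null,
        'body'        => $body === false ? null : $body,
    ];
}
```

A caller could then check $result['ok'] and $result['status'] before handing $result['body'] to the same parsing code used with file_get_contents().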

answered Sep 5, 2009 at 21:49

Adam Franco

87.5k5 gold badges39 silver badges39 bronze badges

1 Comment

Greg Over a year ago

After a call to file_get_contents using the HTTP wrapper (that is, when opening a URL), the variable $http_response_header will be populated with the response headers.
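
To make this comment concrete, here is a small sketch that reads the status code out of header lines in the shape PHP places in $http_response_header. The helper name statusFromHeaders is hypothetical. It takes the header array as a parameter so that it can be tried without making a request.

```php
<?php
// Hypothetical helper: return the last HTTP status code found in a list of
// raw header lines, or null if there is no status line at all.
function statusFromHeaders(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 200 OK". When redirects were
        // followed there can be several, so keep overwriting with the latest.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}
```

After $body = @file_get_contents($url); you could call statusFromHeaders($http_response_header ?? []) to see which status the server returned.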

André Hoffmann · Accepted Answer · 2009-09-05 20:35:11Z

0

Maybe Thumbshots or Snap already have some of the functionality you want?

I know that's not exactly what you are looking for, but at least for the embedded stuff it might be handy. Also, txwikinger already answered your other question, but maybe this helps you anyway.

answered Sep 5, 2009 at 20:35

André Hoffmann

3,5311 gold badge27 silver badges39 bronze badges
