2

I am trying to get a list of files from a directory through PHP. I tried glob, but it doesn't work over HTTP. I also tried a recursive approach, and this is the latest script I managed to find. It doesn't work either: it doesn't display the files.

<?php
$url = 'removed for security purposes';
$html = file_get_contents($url);
$count = preg_match_all('/<td><a href="([^"]+)">[^<]*<\/a><\/td>/i', $html, $files);
for ($i = 0; $i < $count; ++$i) {
  echo "File: " . $files[1][$i] . "<br />\n";
}
var_dump($files);
?>

The output of var_dump($files); is:

array(2) {
  [0]=>
  array(0) {
  }
  [1]=>
  array(0) {
  }
}

So what am I doing wrong?

  • Don't use regular expressions to parse HTML. Use a proper HTML parsing module. You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from your expectations, your code will be broken. See htmlparsing.com/php or this SO thread for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged. Commented Apr 9, 2014 at 14:20
  • @Jurik Well, I don't know where to look. Everywhere I look only covers local paths (scandir, glob, ...). Commented Apr 9, 2014 at 14:20
  • 1
    @Jurik Then explain to him briefly Commented Apr 9, 2014 at 14:20
  • 1
    Besides this lack of basic knowledge, I would suggest you open $url in your browser and take a quick look at the source code. Hint: td != li Commented Apr 9, 2014 at 14:20
  • 1
    @PoomrokcThe3years I thought this was a Q&A portal and not an "I do not know what I am doing, please teach me" portal. Additionally, I really appreciate his approach with regular expressions. It's always a good thing to do this from time to time and improve one's regexp skills :) Commented Apr 9, 2014 at 14:27
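
The first comment's advice to use an HTML parser instead of a regular expression could look roughly like the sketch below. It uses PHP's bundled DOM extension; the helper name `extractLinks` and the sample markup are made up for illustration, and the real page may be structured differently.

```php
<?php
// Hypothetical helper: returns the href of every <a> element found in $html.
function extractLinks(string $html): array
{
    $links = [];
    if ($html === '') {
        return $links; // DOMDocument::loadHTML() rejects an empty string
    }

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

// Sample markup (an assumption, not the asker's actual page).
$sample = '<ul><li><a href="catalog/">catalog/</a></li>'
        . '<li><a href="index.php">index.php</a></li></ul>';

foreach (extractLinks($sample) as $link) {
    echo "File: " . $link . "\n";
}
```

Because the parser looks for `a` elements wherever they occur, the same code keeps working whether the links sit inside `td` or `li` elements.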

3 Answers

4

Your page uses lists, not tables, so match <li> instead of <td>:

   <?php
   $url = 'http://www.seoadsem.com/opencart';
   $html = file_get_contents($url);
   $count = preg_match_all('/<li><a href="([^"]+)">[^<]*<\/a><\/li>/i', $html, $files);
   for ($i = 0; $i < $count; ++$i) {
     echo "File: " . $files[1][$i] . "<br />\n";
   }
   var_dump($files);
   ?>

1

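For security reasons, some servers disable URL access for file_get_contents (the allow_url_fopen setting in php.ini), in which case it only works for local files. If that is the case on your server, use cURL instead. Checking this first may save you a lot of debugging time.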

See PHP cURL vs file_get_contents.
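
A minimal sketch of the cURL approach suggested above, assuming the cURL extension is installed. The helper name `fetchUrl`, the timeout value, and the choice to treat HTTP status 400 and above as failure are illustrative assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper: fetches $url with cURL and returns the response body,
// or null if the request fails or the server answers with an error status.
function fetchUrl(string $url, int $timeoutSeconds = 10): ?string
{
    $handle = curl_init($url);
    if ($handle === false) {
        return null;
    }

    curl_setopt_array($handle, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);

    $body = curl_exec($handle);
    $status = curl_getinfo($handle, CURLINFO_HTTP_CODE);

    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}
```

In the scripts on this page, `$html = file_get_contents($url);` would then become `$html = fetchUrl($url);`, followed by a check for `null` before matching links.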

1 Comment

I wonder what that "security reason" could be.
1
<?php
    $url = 'removed for security purposes';
    $html = file_get_contents($url);
    $count = preg_match_all('/<a href="([^"]+)(png|jpg|mp4|\/)">[^<]*<\/a>/i', $html, $files);
    for ($i = 0; $i < $count; ++$i) {
       echo "File: " . $files[1][$i] . $files[2][$i] . "<br />\n";
    }
    var_dump($files);
 ?>

png, jpg, and mp4 can be replaced with whatever extensions you need. The trailing \/ alternative also matches links to subdirectories.
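
If the list of extensions changes often, it may be easier to filter the links after extracting them than to edit the regular expression each time. The sketch below is one possible way to do that; the helper name `filterByExtension` and the sample links are hypothetical. Like the pattern above, it also keeps links that end in a slash (directories).

```php
<?php
// Hypothetical helper: keeps links whose file extension is in $extensions
// (case-insensitive), plus links ending in "/" (directories).
function filterByExtension(array $links, array $extensions): array
{
    $allowed = array_map('strtolower', $extensions);
    $kept = [];

    foreach ($links as $link) {
        // Ignore any query string or fragment when inspecting the path.
        $path = (string) parse_url($link, PHP_URL_PATH);
        $isDirectory = substr($path, -1) === '/';
        $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

        if ($isDirectory || in_array($extension, $allowed, true)) {
            $kept[] = $link;
        }
    }
    return $kept;
}

// Sample links (made up for illustration).
$links = ['photo.PNG', 'clips/', 'notes.txt', 'video.mp4?x=1'];
foreach (filterByExtension($links, ['png', 'jpg', 'mp4']) as $link) {
    echo "File: " . $link . "\n";
}
```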
