    $nomadspage = "http://www.nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/";    
    $html = file_get_contents($nomadspage);
    $count = preg_match_all('/<a href="([^"]+)">[^<]*<\/a>/i', $html, $files);

    unset($files[1]); //deletes repeat array from preg_match
    $files = $files[0]; //deletes container array from preg_match

    foreach ($files as $key => $value) {
        if (substr($value, 0, 3) !== "gfs") {
            unset($files[$key]);
        }
    }

    var_dump($files);

I have an array of file names from an HTTP directory listing. I want to filter it so that every file name that doesn't start with the three letters gfs is removed from the array. However, substr() does not seem to return the first three characters of the file names, so the if statement does not behave as expected. Does anybody know why this is happening and how to fix it?

  • Can you give us a subset of the $files array? Commented Feb 2, 2017 at 0:46
  • 1
    Must be a preg_match_all() issue. Like, you're not getting the results you think. Perhaps you should use DOMDocument, when you're parsing HTML anyways. Better yet, I'm sure that NOAA information is available as a JSON response, somewhere. Commented Feb 2, 2017 at 0:47

1 Answer


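$files[0] contains the strings that match the entire regular expression, so substr($value, 0, 3) is always "<a ". Set $files to $files[1] instead of $files[0]; $files[1] contains all the matches of the ([^"]+) capture group, which are the href values. That also means the unset($files[1]) line has to go, since it deletes exactly the array you need.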
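A minimal sketch of that fix, assuming the listing looks roughly like the hypothetical sample HTML below (the directory names are made up so the example runs without a network request):

```php
<?php
// Hypothetical sample standing in for the HTML returned by file_get_contents().
$html = '<a href="gfs.2017020100/">gfs.2017020100/</a>' . "\n"
      . '<a href="gdas.2017020100/">gdas.2017020100/</a>' . "\n"
      . '<a href="gfs.2017020106/">gfs.2017020106/</a>';

preg_match_all('/<a href="([^"]+)">[^<]*<\/a>/i', $html, $matches);

// $matches[0] holds the full matches ("<a href=...>...</a>");
// $matches[1] holds what the first capture group matched (the href values).
$files = array_values(array_filter($matches[1], function ($name) {
    return substr($name, 0, 3) === "gfs";
}));

var_dump($files);
```

array_filter() keeps the original keys, so array_values() is used here only to reindex the result from 0; drop it if the original keys matter.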

Actually, it's best not to use regular expressions to parse HTML. Use a DOM parser library, such as the DOMDocument class.
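As a rough illustration of the DOMDocument approach, here is a sketch that collects the href attributes of all links and keeps those starting with gfs. The sample HTML is again hypothetical; in the real script it would come from file_get_contents().

```php
<?php
// Hypothetical sample standing in for the directory listing page.
$html = '<html><body>'
      . '<a href="gfs.2017020100/">gfs.2017020100/</a>'
      . '<a href="gdas.2017020100/">gdas.2017020100/</a>'
      . '<a href="gfs.2017020106/">gfs.2017020106/</a>'
      . '</body></html>';

// Collect parser warnings internally instead of printing them,
// since real-world listings are often not valid HTML.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$files = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    // strpos() returning exactly 0 means the string starts with "gfs".
    if (strpos($href, 'gfs') === 0) {
        $files[] = $href;
    }
}

var_dump($files);
```

This avoids depending on the exact spelling of the anchor tags (attribute order, quoting, extra attributes), which is where the regular expression is most likely to break.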
