I'm looking for a PHP solution that matches URLs satisfying both of the following conditions:

  1. URL contains "http://" (not necessarily begins with http://) AND
  2. URL ends with a file extension from an array.

Example file extension array:

$filetypes = array(
    'jpg',
    'gif',
    'png',
    'js',
    'tif',
    'pdf',
    'doc',
    'xls',
    'xlsx',
    // etc.
);

Here is the working code I want to update to meet the above requirements. Right now it returns only URLs that contain "http://", but I want it to enforce the second requirement as well:

$i = 0;
$matches = false;
foreach($all_urls as $index => $value) {
    if (preg_match('/http:/', $value)) {
        $i++;
        echo "[{$i}] {$value}<br>";
        $matches = true;
    }
}
  • Can you paste a sample URL? Commented May 2, 2015 at 13:21
  • 'http://mattressandmore.com/wp-content/themes/barberry/js/fresco.js' - I just realised the file extension may end with an apostrophe as well. Commented May 2, 2015 at 13:24

2 Answers

You can add an in_array() call to your if statement and use pathinfo() to check whether the URL's extension is in the $filetypes array.

$i = 0;
$matches = false;
foreach($all_urls as $index => $value) {
    if (preg_match('/http:/', $value) && in_array(pathinfo($value, PATHINFO_EXTENSION ), $filetypes)) {
        $i++;
        echo "[{$i}] {$value}<br>";
        $matches = true;
    }
}

EDIT:

As you said in the comments, a few URLs contain single quotes. You can get rid of them with trim(), as @Ghost showed in the comments:

trim($value, "'")

Then use it in the in_array() call as follows:

in_array(pathinfo(trim($value, "'"), PATHINFO_EXTENSION ), $filetypes)
                //^^^^^^^^^^^^^^^^^
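
Putting the pieces of this answer together, here is a minimal sketch of the whole check as one function. The name isWantedUrl() and the sample URLs other than the one from the comments are assumptions made for illustration. It also lowercases the extension, since in_array() compares case-sensitively, and it uses strpos() rather than preg_match() for the plain "contains http://" test. The scalar type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the original answer): returns true when $url
// contains "http://" and its extension is listed in $filetypes.
function isWantedUrl(string $url, array $filetypes): bool
{
    // Strip stray single or double quotes wrapped around the URL.
    $clean = trim($url, "'\"");

    // Requirement 1: "http://" may appear anywhere, not only at the start.
    if (strpos($clean, 'http://') === false) {
        return false;
    }

    // Requirement 2: compare the lowercased extension against the list.
    $ext = strtolower(pathinfo($clean, PATHINFO_EXTENSION));
    return in_array($ext, $filetypes, true);
}

$filetypes = ['jpg', 'gif', 'png', 'js', 'tif', 'pdf', 'doc', 'xls', 'xlsx'];

// Sample data; only the first URL comes from the question's comments.
$all_urls = [
    "'http://mattressandmore.com/wp-content/themes/barberry/js/fresco.js'",
    'http://example.com/index.html',
    'ftp://example.com/photo.jpg',
    'http://example.com/Photo.JPG',
];

// Keep only the URLs that pass both requirements.
$wanted = array_values(array_filter($all_urls, function ($url) use ($filetypes) {
    return isWantedUrl($url, $filetypes);
}));
```

With this sample data, $wanted should hold the quoted fresco.js URL and the Photo.JPG URL; the .html and ftp:// entries are filtered out.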

14 Comments

Thanks for such a quick response, I'll give it a go and report back.
@user3436467 before we know it, Rizier has already completed your project piece by piece :D lol
@user3436467 you can just trim('string', "'") those strings under the loop
Yes, it's part of the string, but only for some URLs, not all of them. Some bad web developers are adding in the apostrophes :/
@Rizier123, you are seriously awesome! hats off to you for being quick and thorough! thanks Ghost also for the assistance

An easier solution would be to use a simple regex:

$i = 0;
$matches = false;
foreach($all_urls as $index => $value) {
    if (preg_match("/^http:\/\/.+\.(jpg|gif|png|js|tif|pdf|doc|xls|xlsx)$/", $value)) {
        $i++;
        echo "[{$i}] {$value}<br>";
        $matches = true;
    }
}

This ensures the match starts with http:// (because of the ^) and ends with .jpg or one of the other listed extensions (because of the alternation and the $).

If you want to support https you could just use:

/^https?:\/\/.+\.(jpg|gif|png|js|tif|pdf|doc|xls|xlsx)$/
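
The question only asks that the URL contain "http://", and its extensions live in an array, so the pattern can also be built from that array instead of being typed out by hand. The sketch below is one possible way to do that, not part of the original answer. The helper name buildUrlPattern() is hypothetical. The pattern drops the ^ anchor, tolerates one trailing apostrophe (as in the sample URL from the comments), and matches extensions case-insensitively. Using ~ as the delimiter avoids escaping the slashes.

```php
<?php
$filetypes = ['jpg', 'gif', 'png', 'js', 'tif', 'pdf', 'doc', 'xls', 'xlsx'];

// Hypothetical helper: turns the extension list into a regex pattern.
function buildUrlPattern(array $filetypes): string
{
    // Escape each extension so special characters cannot alter the pattern.
    $alternatives = implode('|', array_map(function ($ext) {
        return preg_quote($ext, '~');
    }, $filetypes));

    // "http://" anywhere, then any characters, a dot, one listed extension,
    // an optional trailing apostrophe, then end of string. "i" ignores case.
    return '~http://.+\.(?:' . $alternatives . ')\'?$~i';
}

$pattern = buildUrlPattern($filetypes);

$value = "'http://mattressandmore.com/wp-content/themes/barberry/js/fresco.js'";
$isMatch = preg_match($pattern, $value) === 1;
```

Inside the original loop, the condition would then become preg_match($pattern, $value), with $pattern built once before the foreach.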

1 Comment

Oh, I just read your description again. Why shouldn't it have to start with http? Is ending with the extension not a requirement either, then?
