I'm trying to assess a string based on the suffix of the files that it contains.

I need to differentiate between strings that contain only image files (.png, .gif, .jpg, .jpeg, or .bmp) and strings that contain a mixture of image and non-image files.

What am I doing wrong?

if (preg_match('~\.(png\)|gif\)|jpe?g\)|bmp\))~', $data->files)) {
  echo 'image only';
} else {
  echo 'image + other types';
}

Example string containing a mixture:

filename 1 (https://example.com/test.pdf),
filename 2 (https://example.com/cool_image.jpg),
filename 3 (https://example.com/other-file.docx)

Example string containing only images:

filename 1 (https://example.com/another.png),
filename 2 (https://example.com/cool_image.jpg)
  • probably you need '/\\.(png|gif|jpe?g|bmp)$/' Commented Feb 23, 2018 at 2:18
  • you do not need regex for that; sometimes files with a jpg or bmp extension might not be valid images either Commented Feb 23, 2018 at 2:36
  • @sumit while this task can be accomplished without regex, it is the most sensible and direct tool to assess the strings. Commented Feb 24, 2018 at 3:53

3 Answers

4

The regular expression is wrong. You have an escaped \) after each extension, which matches a literal closing parenthesis. This will work:

~\.(png|gif|jpe?g|bmp)~i

Complete example:

<?php
if (preg_match('~\.(png|gif|jpe?g|bmp)~i', "https://example.com/test.png")) {
  echo 'image only';
}
else {
  echo 'image + other types';
}
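
As a quick check, the sketch below runs the corrected pattern against the two example strings from the question (the variable names are illustrative, not the asker's). It should report 1 for both, which suggests that this check on its own only answers "is there at least one image?" and cannot yet separate image-only batches from mixed ones.

```php
<?php
// Example strings copied from the question; $mixed and $imagesOnly are illustrative names.
$mixed = "filename 1 (https://example.com/test.pdf),\n"
       . "filename 2 (https://example.com/cool_image.jpg),\n"
       . "filename 3 (https://example.com/other-file.docx)";
$imagesOnly = "filename 1 (https://example.com/another.png),\n"
            . "filename 2 (https://example.com/cool_image.jpg)";

$pattern = '~\.(png|gif|jpe?g|bmp)~i';

// preg_match() stops at the first hit, so it returns 1 as soon as any image extension appears.
$mixedHasImage = preg_match($pattern, $mixed);
$imagesOnlyHasImage = preg_match($pattern, $imagesOnly);

var_dump($mixedHasImage, $imagesOnlyHasImage);
```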

With the corrected regex, you can now check whether the batch of files contains only images, a mix of images and other files, or only non-image files. We already have the first part (checking whether there are images). With this regex, we can check whether there are non-images:

/^(?!.*[.](png|gif|jpe?g|bmp))(?:.*$|,)/im

It uses a negative lookahead to assert that none of the image extensions appear anywhere in the line. At the end there's a non-capturing group that checks for the end of the line or a comma (to comply with your format).
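
To make the per-line behaviour of the lookahead visible, here is a minimal sketch that collects every line the pattern accepts. The sample batch is modelled on the question's format and the variable names are assumptions for illustration only.

```php
<?php
// Hypothetical sample batch in the question's format: one file per line.
$batch = "filename 1 (https://example.com/test.pdf),\n"
       . "filename 2 (https://example.com/cool_image.jpg),\n"
       . "filename 3 (https://example.com/other-file.docx)";

// With the m modifier, ^ and $ anchor at every line, so the negative
// lookahead is evaluated once per line rather than once per string.
$nonImageLine = '/^(?!.*[.](png|gif|jpe?g|bmp))(?:.*$|,)/im';

preg_match_all($nonImageLine, $batch, $matches);

// $matches[0] holds the full text of each line that has no image extension.
print_r($matches[0]);
```

With this input, the line containing cool_image.jpg should be rejected by the lookahead and the other two lines should be returned.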

So finally, check both regular expressions and see what each batch really contains:

$files=[
    'Non-Images Only'=>'filename 1 (https://example.com/test.exe)',
    'Mixed-Type'=>'filename 1 (https://example.com/test.pdf),
filename 2 (https://example.com/cool_image.jpg),
filename 3 (https://example.com/other-file.docx),
filename 4 (https://example.com/nice_image.png)',
    'Images-Only'=>'filename 1 (https://example.com/another.png),
filename 2 (https://example.com/cool_image.jpg))'];
foreach ($files as $type => $batch) {
    echo "Batch: ".$batch.PHP_EOL;
    echo "Expecting: ".$type.PHP_EOL;
    $images = preg_match('/\.(png|gif|jpe?g|bmp)/im', $batch);
    $nonImages = preg_match('/^(?!.*[.](png|gif|jpe?g|bmp))(?:.*$|,)/im', $batch);
    $result = "";
    if ($images && $nonImages) {
        $result = "Mixed-Type";
    }
    else {
        if ($images) {
            $result = "Images-Only";
        }
        else {
            $result = "Non-Images Only";
        }
    }
    echo "Result: ".$result.PHP_EOL;
    echo PHP_EOL;
}

Note: used @mickmackusa's list of tests

11 Comments

@mickmackusa I hadn't seen this. Can you further clarify the question? The regex successfully checks whether the string contains an image extension (whether the file is actually an image is another question...)
@mickmackusa I see. Will give it a go.
Thanks for updating. Now, your answer is twice as slow as my pattern while having 4x as many upvotes as mine. This will surely confuse future researchers. Would you mind upvoting my answer?
You need to back up your claim :). My answer, your answer. They look to be just about the same. And how will the upvotes confuse researchers when the accepted answer is yours? I certainly don't mind upvoting your answer, though I don't personally like people asking for upvotes.
I've run out of fingers and toes.
1

After reading and re-reading your question more than 20 times, I think I know what you are trying to do.

For every string (batch of files), I run two preg_match() checks: one that seeks files with a suffix of png, gif, jpg, jpeg, or bmp, and another that seeks files that DO NOT have a suffix in the aforementioned list.

*note: (*SKIP)(*FAIL) is a technique used to match and immediately disqualify characters in a pattern.

Code:

$tests=[
    'Non-Images Only'=>'filename 1 (https://example.com/test.exe)',
    'Mixed-Type'=>'filename 1 (https://example.com/test.pdf),
filename 2 (https://example.com/cool_image.jpg),
filename 3 (https://example.com/other-file.docx),
filename 4 (https://example.com/nice_image.png)',
    'No Files'=>'filename 1 (),
filename 2 ()',
    'Images-Only'=>'filename 1 (https://example.com/another.png),
filename 2 (https://example.com/cool_image.jpg))'];

$image_pattern='~\.(?:png|gif|jpe?g|bmp)\),?$~im';
$non_image_pattern='~\.(?:(?:png|gif|jpe?g|bmp)(*SKIP)(*FAIL)|[^.)]+)\),?$~im';

foreach($tests as $type=>$string){
    echo "\t\tAssessing:\n---\n";
    echo "$string\n---\n";
    echo "Expecting: $type\n";
    echo "Assessed as: ";
    $has_image=preg_match($image_pattern,$string);
    $has_non_image=preg_match($non_image_pattern,$string);
    if($has_image){
        if($has_non_image){
            echo "Mix of image and non-image files";
        }else{
            echo "Purely image files";
        }
    }else{
        if($has_non_image){
            echo "Purely non-image files";
        }else{
            echo "No files recognized";
        }
    }
    echo "\n----------------------------------------------------\n";
}

Output:

        Assessing:
---
filename 1 (https://example.com/test.exe)
---
Expecting: Non-Images Only
Assessed as: Purely non-image files
----------------------------------------------------
        Assessing:
---
filename 1 (https://example.com/test.pdf),
filename 2 (https://example.com/cool_image.jpg),
filename 3 (https://example.com/other-file.docx),
filename 4 (https://example.com/nice_image.png)
---
Expecting: Mixed-Type
Assessed as: Mix of image and non-image files
----------------------------------------------------
        Assessing:
---
filename 1 (),
filename 2 ()
---
Expecting: No Files
Assessed as: No files recognized
----------------------------------------------------
        Assessing:
---
filename 1 (https://example.com/another.png),
filename 2 (https://example.com/cool_image.jpg))
---
Expecting: Images-Only
Assessed as: Purely image files
----------------------------------------------------
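
To see what (*SKIP)(*FAIL) does in isolation, the following sketch applies the non-image pattern with preg_match_all() and prints what it captures. The sample string is an assumption modelled on the question's format. The image suffix should be consumed and discarded, leaving only the non-image suffixes.

```php
<?php
// Hypothetical sample string in the question's format.
$string = "filename 1 (https://example.com/test.pdf),\n"
        . "filename 2 (https://example.com/cool_image.jpg),\n"
        . "filename 3 (https://example.com/other-file.docx)";

// Image suffixes are matched first and then thrown away by (*SKIP)(*FAIL);
// any other suffix followed by ")" and an optional comma at the end of a line is kept.
$non_image_pattern = '~\.(?:(?:png|gif|jpe?g|bmp)(*SKIP)(*FAIL)|[^.)]+)\),?$~im';

preg_match_all($non_image_pattern, $string, $matches);

// $matches[0] lists the matched tails of the non-image lines.
print_r($matches[0]);
```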

2 Comments

Thank you so much. I'm sorry that I couldn't explain it well. English is not my main language. I'm getting the data via Airtable's API. Somehow I have to determine whether the field contains non-image links.
No worries. I'm happy to help.
1

You're escaping your brackets, so they're getting treated literally.

The regex you're looking for is simply: ~\.(png|gif|jpe?g|bmp)$~

if (preg_match('~\.(png|gif|jpe?g|bmp)$~', $data->files)) {
  echo 'image only';
}
else {
  echo 'image + other types';
}

Note that the $ at the end, which denotes the end of the string, is critical; without it, the pattern could match anywhere in the string. As such, a file such as cool_image.jpg.exe would be considered an 'image'. The \. sits outside the group so that the literal dot is required before every extension, not just png.

Running the regex \.(png|gif|jpe?g|bmp)$ against the strings:

https://example.com/test.pdf
https://example.com/other-file.docx
https://example.com/cool_image.jpg.exe
https://example.com/cool_image.jpg

shows that only the final link will match.

Note that you'll also probably want to add the i modifier to the end of your regex to allow for uppercase file extensions as well. This can be done with ~\.(png|gif|jpe?g|bmp)$~i.
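
As a rough sketch of the anchoring and the i modifier together, the snippet below tests each URL from the list above individually, plus one uppercase URL added here purely for illustration. The helper name $isImage is hypothetical, and the pattern keeps the dot outside the group so that it applies to every alternative.

```php
<?php
// URLs from the list above; the last, uppercase one is an added illustration.
$urls = [
    'https://example.com/test.pdf',
    'https://example.com/other-file.docx',
    'https://example.com/cool_image.jpg.exe',
    'https://example.com/cool_image.jpg',
    'https://example.com/COOL_IMAGE.JPG',
];

// $ anchors the extension at the end of the string; i ignores letter case.
// $isImage is a hypothetical helper, not part of any library.
$isImage = fn(string $url): bool => preg_match('~\.(png|gif|jpe?g|bmp)$~i', $url) === 1;

$results = array_map($isImage, $urls);

foreach ($urls as $i => $url) {
    echo $url, ' => ', $results[$i] ? 'image' : 'not an image', PHP_EOL;
}
```

Note that this tests one URL per string; it does not by itself handle the multi-line "filename (url)," format from the question.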

9 Comments

"Your regex should be wrapped in forward slashes (/) rather than tildes (~)" why?
I've removed that. I'm used to other languages where the delimiter matters, but according to PCRE it's fine to have a tilde :)
Are you talking about JS? I never noticed one was limited to using /. In PCRE you can use almost anything you'd like: "A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character".
@ObsidianAge may I see a case-insensitive pattern modifier please. (just for the sake of it)
@mickmackusa - In order to have a case-insensitive modifier, you'd just need to add an i modifier to the end of the regex. So ~\.(png|gif|jpe?g|bmp)$~i would work :)