I use a regular expression to get all SRC paths in HTML code.
'/src="http?:\/\/[^"]*\.(jpg|jpeg|png)"/i'
How can I add https to the expression?
I tried:
'/src="http|https?:\/\/[^"]*\.(jpg|jpeg|png)"/i'
Just add an s after the p in http and make it optional by putting the quantifier ? right after the s:
'/src="https?:\/\/[^"]*\.(jpg|jpeg|png)"/i'
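The http? in your regex makes the p optional, so it would also match htt://. Don't do that. The http|https? in your attempt does not help either: without a group, the | splits the entire pattern into two alternatives, src="http on one side and https?:\/\/[^"]*\.(jpg|jpeg|png)" on the other.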
If you want to make the whole http:// or https:// prefix optional, put it inside a capturing or non-capturing group and make that group optional by adding the quantifier ? right after it:
'/src="(?:https?:\/\/)?[^"]*\.(jpg|jpeg|png)"/i'
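To see the difference between the two patterns, here is a minimal sketch that runs both over a small sample. The $html string and the variable names are made up for illustration; only the two regular expressions come from the answer above.

```php
<?php
// Sample markup, invented for this illustration.
$html = '<img src="http://example.com/a.jpg">'
      . '<img src="https://example.com/b.PNG">'
      . '<img src="images/c.jpeg">'
      . '<img src="htt://example.com/d.jpg">';

// Scheme required: only http:// or https:// URLs match.
$required = '/src="https?:\/\/[^"]*\.(jpg|jpeg|png)"/i';
// Scheme optional: relative paths match as well.
$optional = '/src="(?:https?:\/\/)?[^"]*\.(jpg|jpeg|png)"/i';

preg_match_all($required, $html, $withScheme);
preg_match_all($optional, $html, $anyPath);

print_r($withScheme[0]); // the two http/https images
print_r($anyPath[0]);    // all four src attributes
```

Note that once the prefix is optional, [^"]* accepts any path, so the pattern also matches the malformed htt:// entry. It only checks the file extension at that point.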
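// $text is assumed to hold the HTML code.
$matches = array();
// Capture the value of every src attribute, whatever it points to.
preg_match_all('#src="([^"]*)"#i', $text, $matches);
// $matches[1] holds the captured values (group 1).
print_r($matches[1]);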
Remarks:
People usually write regular expressions with the forward slash as the delimiter (/) and then quote all the slashes that appear in the regexp. In PHP the forward slash is not mandatory; other characters can be used as delimiters as well. When you parse URLs it is better to use something else as the delimiter (~, for example, works well for URLs): the regexp becomes easier to read because the forward slashes no longer need to be quoted.
You say you use a regular expression to get all SRC paths in HTML code, but not all URLs start with http:// or https://. There are also ftp://, mailto:, data: and others. Moreover, even if you want only the http or https URLs, in HTML they can be expressed relative to the current document (e.g. src="image1.jpg") or to the current host (src="/images/2.jpg"). All of these are extracted by the code above. Run through the list, check the first characters of each result (to find out whether it is absolute or relative), compute the complete URLs from the relative ones, and keep only what you need (probably only http:// and https://).
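The post-processing described in the remarks could look roughly like the sketch below. It is a simplified illustration, not a full URL resolver: the base URL, the sample markup and the resolveSrc() helper are all hypothetical, "../" and "./" segments are not normalised, and it assumes PHP 7.4 or later (arrow functions). It also uses ~ as the delimiter, so the slashes need no quoting.

```php
<?php
// Hypothetical URL of the page the HTML came from (an assumption).
$base = 'https://example.com/blog/post.html';

// Sample markup, invented for this illustration.
$text = '<img src="http://example.com/a.jpg">'
      . '<img src="/images/2.jpg">'
      . '<img src="image1.jpg">'
      . '<script src="//cdn.example.com/lib.js"></script>'
      . '<img src="data:image/png;base64,AAAA">';

// Hypothetical helper: turns a src value into an absolute http(s) URL,
// or returns null when the value uses another scheme (ftp:, data:, ...).
function resolveSrc(string $src, string $base): ?string
{
    // Already an absolute http:// or https:// URL.
    if (preg_match('~^https?://~i', $src)) {
        return $src;
    }
    // Any other explicit scheme is dropped.
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $src)) {
        return null;
    }
    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if (strncmp($src, '//', 2) === 0) {   // protocol-relative
        return $parts['scheme'] . ':' . $src;
    }
    if (strncmp($src, '/', 1) === 0) {    // relative to the current host
        return $origin . $src;
    }
    // Relative to the current document: swap out the last path segment.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $src;
}

preg_match_all('~src="([^"]*)"~i', $text, $matches);

// Resolve every value, then drop the nulls (unwanted schemes).
$urls = array_values(array_filter(
    array_map(fn($src) => resolveSrc($src, $base), $matches[1])
));
print_r($urls);
```

For production use, a dedicated URL library or an HTML parser such as DOMDocument is likely a safer choice than hand-rolled resolution, since the sketch ignores a <base> tag, single-quoted attributes and dot segments.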