So far I have used the preg_match_all function with various expressions, but I am not good at regex.
I have a string containing a downloaded HTML page. Naturally, it holds a lot of content, including links to assets. I need to extract all valid domains and IPv4 addresses from this string.
If it is possible within the regular expression, I would also like to drop the rest of the URL (the path and the query string). If that is not possible, I can strip them in later processing.
This expression works reasonably well for domains, although it could be better: it also catches garbage such as "/html/style/global.css.php", and it does not match IP addresses at all.
preg_match_all('#[-a-zA-Z0-9@:%_\+.~\#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~\#?&//=]*)?#si', $response->body(), $match);
Why {2,256}? x.com is a valid domain. Also, domain names can only contain letters, digits, - and . (not @, ~, &, etc.).
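One possible direction, as a rough sketch rather than a definitive answer: restrict the character class to what a hostname may actually contain, and only look for hosts that follow a scheme or a protocol-relative "//". That alone keeps relative paths like "/html/style/global.css.php" out of the results, and capturing just the host drops the path and query in the same step. The function name extractHosts and the choice to require a "//" prefix are assumptions made for this illustration, not something from the original post. URLs with userinfo (user@host) and IPv6 literals are not handled.

```php
<?php
// Hypothetical helper (name and approach are assumptions for illustration):
// returns unique, lower-cased hosts that follow "http://", "https://" or "//".
function extractHosts(string $html): array
{
    // One DNS label: letters and digits, hyphens only in the middle, 1 to 63 chars.
    $label = '[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?';
    // One IPv4 octet in the range 0 to 255.
    $octet = '(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])';

    // IPv4 address that is not the start of a longer host name.
    $ipv4 = '(?:' . $octet . '\.){3}' . $octet . '(?!\.?[a-z0-9-])';
    // Domain: one or more labels followed by an alphabetic TLD.
    $domain = '(?:' . $label . '\.)+[a-z]{2,63}(?![a-z0-9-])';

    // Group 1 holds only the host, so path and query are never captured.
    $pattern = '~(?:https?:)?//(' . $ipv4 . '|' . $domain . ')~i';

    if (!preg_match_all($pattern, $html, $matches)) {
        return [];
    }

    // Normalise case, then de-duplicate while keeping first-seen order.
    return array_values(array_unique(array_map('strtolower', $matches[1])));
}
```

With the code from the question, this would be called as extractHosts($response->body()). Note that the TLD is only checked for shape (letters only), not against the real list of registered TLDs, so a stricter validity check would still have to happen in later processing.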