I’m building a Make.com scenario like this:
HTTP (fetch website HTML) → Text parser (extract elements) → Filter "only good links" → Array aggregator → further processing
Goal: I want only “good” links to pass, for example:
/karriere, /stellenangebote, /jobs
/about, /team
/leistungen, /services
I want to exclude everything else, like:
anchors #
mailto:, javascript:
file extensions like .png, .css, .js, .pdf
asset folders /wp-content/, /images/, /static/
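The allow and exclude lists above can be sketched as a single predicate. This is only an illustration in Python, not a Make module: the names ALLOW, DENY and is_good_link are hypothetical, and the choice to require a path-segment boundary after each keyword (so that /jobsearch does not count as /jobs) is an assumption, not something the post specifies.

```python
import re

# Allow-list: path segments that mark a "good" link (from the list above).
# Assumption: the keyword must end at "/", "?", "#" or the end of the string.
ALLOW = re.compile(
    r"/(?:karriere|stellenangebote|jobs|about|team|leistungen|services)(?:[/?#]|$)",
    re.I,
)

# Deny-list: anchors, schemes, asset folders and file extensions to exclude.
DENY = re.compile(
    r"^#|^(?:mailto|javascript):|/(?:wp-content|images|static)/"
    r"|\.(?:png|css|js|pdf)(?:[?#]|$)",
    re.I,
)

def is_good_link(href: str) -> bool:
    """Return True when href hits the allow-list and nothing on the deny-list."""
    href = href.strip()
    return bool(ALLOW.search(href)) and not DENY.search(href)
```

Checking the deny-list in addition to the allow-list matters for cases such as /static/about/, which contains an allowed keyword but sits inside an asset folder.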
The problem: When I use the Text parser module with element type = a, the output shows:
Element = full <a …>… string
Inner content = link text or inner HTML
Attributes = always empty
That means Attributes.href is never available. So my filter on href never works – nothing passes through, even though the HTML clearly contains valid links.
Example HTML observed in the parser output:
So far I tried:
Adding a filter directly after the Text parser, using Attributes.href → always empty
Building regex conditions, but since href is not extracted, they don’t apply
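Since Attributes comes back empty, one option is to pull the href out of the Element string itself. The following Python sketch shows the idea only; HREF_RE and extract_href are hypothetical names, and the pattern assumes no attribute value contains a literal ">" before the href. A similar pattern could presumably be tried in any regex-capable step of the scenario.

```python
import re
from typing import Optional

# Captures a double-quoted, single-quoted or unquoted href value
# from the opening <a ...> tag of one element string.
HREF_RE = re.compile(
    r"""<a\b[^>]*?\shref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.I | re.S,
)

def extract_href(element: str) -> Optional[str]:
    """Return the href value of the first <a> tag in element, or None."""
    m = HREF_RE.search(element)
    if not m:
        return None
    # Exactly one of the three capture groups participates in a match.
    return next(g for g in m.groups() if g is not None).strip()
```

The whitespace required before href keeps attributes such as data-href from being picked up by mistake.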
What I need help with:
Is it best practice in Make to filter directly on the Element (the full tag) using regex, instead of trying to use Attributes.href?
Or should I first extract the href into a new variable and then filter on that variable?
How do I handle both relative URLs (like /karriere) and absolute URLs (like https://…/karriere) before the aggregator?
Any Make-specific advice on where to place the filter so I don’t end up collecting thousands of irrelevant links?
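For the relative versus absolute URL question, one common approach is to resolve every href against the page URL first and then filter on the path alone, so /karriere and https://…/karriere are treated the same way. A minimal Python sketch of that idea follows; the function names and the example.com base URL are assumptions for illustration.

```python
from urllib.parse import urljoin, urlsplit

def to_absolute(href: str, base_url: str) -> str:
    """Resolve a relative or absolute href against the page it was found on."""
    return urljoin(base_url, href)

def path_only(href: str, base_url: str) -> str:
    """Return just the path of the resolved URL, or "/" when it is empty."""
    return urlsplit(to_absolute(href, base_url)).path or "/"

# Hypothetical page URL; in the scenario this would be the URL that was fetched.
BASE = "https://example.com/de/"
```

Filtering on the output of path_only means the allow and exclude rules never have to care about scheme or host.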
Expected: Links such as career, jobs, about, team, services, etc. should pass through.
Actual: Currently nothing passes because Attributes.href is always empty.
(?:(?!\1).)*?
If it finds it, replace the match with nothing ''. If you have a list of exclusion text, post it and I'll put it in there / show you how.
(?si)<a(?=\s(?:[^>"']|"[^"]*"|'[^']*')*?(?<=\s)href\s*=\s*(?:(['"])((?:(?!\1).)*?)\1(?:"[\S\s]*?"|'[\S\s]*?'|[^>]?)*?>))\s+(?:".*?"|'.*?'|[^>]?)+>
regex101.com/r/nN9WwB/1
(['"])(?:(?!\1).)*?(?:/karriere|/stellenangebote|/jobs|/about|/team|/leistungen|/services)(?:(?!\1).)*?\1
or by failing
(['"])(?:(?!\1).)*?(?:anchors\s*\#|mailto:|javascript:|/wp-content/|/images/|/static/|\.(?:jpg|png|jpeg|gif|pdf|js|css)\b)(?:(?!\1).)*?\1
Just put these in the placeholder above. Then use that regex on each of your filtered a tag's
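To see how the two quoted-value patterns from the answer behave, they can be run as-is against a few a tags. This Python sketch copies both patterns verbatim and applies them directly to the tag string, which is an assumption: the answer describes substituting them into the larger expression. The flags mirror the (?si) prefix used there, and keep_tag is a hypothetical helper.

```python
import re

# "Match" pattern from the answer: a quoted value containing an allowed path.
ALLOW_RE = re.compile(
    r"""(['"])(?:(?!\1).)*?(?:/karriere|/stellenangebote|/jobs|/about|/team|/leistungen|/services)(?:(?!\1).)*?\1""",
    re.S | re.I,
)

# "Fail" pattern from the answer: a quoted value containing an excluded token.
DENY_RE = re.compile(
    r"""(['"])(?:(?!\1).)*?(?:anchors\s*\#|mailto:|javascript:|/wp-content/|/images/|/static/|\.(?:jpg|png|jpeg|gif|pdf|js|css)\b)(?:(?!\1).)*?\1""",
    re.S | re.I,
)

def keep_tag(tag: str) -> bool:
    """Keep a tag when the allow pattern matches and the deny pattern does not."""
    return bool(ALLOW_RE.search(tag)) and not DENY_RE.search(tag)
```

Note that both patterns look at any quoted value in the tag, not only href, so an allowed path inside a title or class attribute would also count; extracting the href first avoids that.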