

I’m building a Make.com scenario like this:

HTTP (fetch website HTML) → Text parser (extract elements) → Filter "only good links" → Array aggregator → further processing

Goal: I want only “good” links to pass, for example:

/karriere, /stellenangebote, /jobs

/about, /team

/leistungen, /services

I want to exclude everything else, like:

anchors #

mailto:, javascript:

file extensions like .png, .css, .js, .pdf

asset folders /wp-content/, /images/, /static/
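To make the intended rule concrete, here is a minimal sketch of the allow/deny logic in Python (not Make syntax). The names `ALLOWED_PREFIXES`, `EXCLUDED`, and `is_good_link` are hypothetical, and the sketch assumes the href has already been reduced to a site-relative path:

```python
import re

# Path prefixes that should pass (taken from the "good" examples above).
ALLOWED_PREFIXES = (
    "/karriere", "/stellenangebote", "/jobs",
    "/about", "/team",
    "/leistungen", "/services",
)

# Anything matching this pattern is rejected (taken from the exclusion list above).
EXCLUDED = re.compile(
    r"^#"                                # pure anchors
    r"|^(?:mailto|javascript):"          # non-HTTP schemes
    r"|\.(?:png|css|js|pdf)(?:[?#]|$)"   # file extensions
    r"|/(?:wp-content|images|static)/",  # asset folders
    re.IGNORECASE,
)

def is_good_link(href: str) -> bool:
    """Return True if href is not excluded and starts with an allowed prefix."""
    href = href.strip()
    if not href or EXCLUDED.search(href):
        return False
    return href.lower().startswith(ALLOWED_PREFIXES)
```

Prefix matching is deliberately loose here, so a path such as /teamwork would also pass; a stricter variant would need to check for a path boundary after the prefix.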

The problem: When I use the Text parser module with element type = a, the output shows:

Element = full <a …>… string

Inner content = link text or inner HTML

Attributes = always empty

That means Attributes.href is never available. So my filter on href never works – nothing passes through, even though the HTML clearly contains valid links.

Example HTML observed in the parser output:

So far I tried:

Adding a filter directly after the Text parser, using Attributes.href → always empty

Building regex conditions, but since href is not extracted, they don’t apply

What I need help with:

Is it best practice in Make to filter directly on the Element (the full tag) using regex, instead of trying to use Attributes.href?

Or should I first extract the href into a new variable and then filter on that variable?

How do I handle both relative URLs (like /karriere) and absolute URLs (like https://…/karriere) before the aggregator?
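To illustrate what I mean by extracting the href first and treating relative and absolute URLs the same way, here is a rough Python sketch of the logic (not Make syntax). The function names are hypothetical, the base URL https://example.com/ is a placeholder, and the regex assumes the href value is quoted:

```python
import re
from typing import Optional
from urllib.parse import urljoin, urlsplit

# Matches href="..." or href='...' inside a tag string.
# Assumption: the value is quoted; unquoted href values are not handled.
HREF_RE = re.compile(r"""\shref\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)

def extract_href(element: str) -> Optional[str]:
    """Pull the href value out of a full <a ...> tag string, or None if absent."""
    match = HREF_RE.search(element)
    return match.group(2).strip() if match else None

def normalized_path(href: str, base_url: str) -> Optional[str]:
    """Resolve href against base_url and return only its path.

    Returns None for links that do not resolve to http or https.
    """
    parts = urlsplit(urljoin(base_url, href))
    if parts.scheme not in ("http", "https"):
        return None
    return parts.path or "/"

# Usage: both forms reduce to the same path, so one filter can cover both.
base = "https://example.com/"  # placeholder base URL
print(normalized_path("/karriere", base))
print(normalized_path("https://example.com/karriere?ref=nav", base))
```

With the path isolated like this, the filter in front of the aggregator would only need to compare one normalized value instead of the whole tag.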

Any Make-specific advice on where to place the filter so I don’t end up collecting thousands of irrelevant links?

Expected: Links such as career, jobs, about, team, services, etc. should pass through.

Actual: Currently nothing passes because Attributes.href is always empty.

  • Please provide enough code so others can better understand or reproduce the problem. Commented Sep 7 at 16:42
  • Do you just need to delete <a> links that do not pass your criteria, i.e. keep a link only if it passes your tests? Commented Sep 7 at 18:37
  • There is a small section in the middle, (?:(?!\1).)*?, where the text you don't want is matched. If it is found, replace the match with an empty string ''. If you have a list of exclusion text, post it and I'll put it in there or show you how. (?si)<a(?=\s(?:[^>"']|"[^"]*"|'[^']*')*?(?<=\s)href\s*=\s*(?:(['"])((?:(?!\1).)*?)\1(?:"[\S\s]*?"|'[\S\s]*?'|[^>]?)*?>))\s+(?:".*?"|'.*?'|[^>]?)+> regex101.com/r/nN9WwB/1 Commented Sep 7 at 18:48
  • It can be done either with a passing pattern, (['"])(?:(?!\1).)*?(?:/karriere|/stellenangebote|/jobs|/about|/team|/leistungen|/services)(?:(?!\1).)*?\1 or with a failing pattern, (['"])(?:(?!\1).)*?(?:anchors\s*\#|mailto:|javascript:|/wp-content/|/images/|/static/|\.(?:jpg|png|jpeg|gif|pdf|js|css)\b)(?:(?!\1).)*?\1 Just put one of these in the placeholder above, then use that regex on each of your filtered <a> tags. Commented Sep 7 at 19:20
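As a rough illustration of the two patterns from the last comment, the Python sketch below applies them directly to a full tag string. Using them standalone (rather than inside the larger regex's placeholder), the case-insensitive and dot-all flags, and the helper name `keep_tag` are all assumptions made for this example:

```python
import re

# Both patterns are copied from the comment above; each matches one quoted value.
PASS_RE = re.compile(
    r"""(['"])(?:(?!\1).)*?(?:/karriere|/stellenangebote|/jobs|/about|/team|/leistungen|/services)(?:(?!\1).)*?\1""",
    re.IGNORECASE | re.DOTALL,
)
FAIL_RE = re.compile(
    r"""(['"])(?:(?!\1).)*?(?:anchors\s*\#|mailto:|javascript:|/wp-content/|/images/|/static/|\.(?:jpg|png|jpeg|gif|pdf|js|css)\b)(?:(?!\1).)*?\1""",
    re.IGNORECASE | re.DOTALL,
)

def keep_tag(tag: str) -> bool:
    """Keep a tag only if some quoted value passes and no quoted value fails."""
    return bool(PASS_RE.search(tag)) and not FAIL_RE.search(tag)
```

Two caveats with this sketch: the patterns match any quoted attribute value, not only href, and the `anchors\s*\#` alternative matches the literal word "anchors" followed by "#", so a bare href="#" is not caught by the failing pattern (it is still dropped here because it does not match the passing one).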
