Make.com Text parser: Attributes.href is empty — how to filter <a> links by href (relative + absolute) before aggregating?


Asked 2 months ago

Modified 2 months ago

Viewed 68 times


I’m building a Make.com scenario like this:

HTTP (fetch website HTML) → Text parser (extract elements) → Filter "only good links" → Array aggregator → further processing

Goal: I want only “good” links to pass, for example:

/karriere, /stellenangebote, /jobs

/about, /team

/leistungen, /services

I want to exclude everything else, like:

anchors #

mailto:, , javascript:

file extensions like .png, .css, .js, .pdf

asset folders /wp-content/, /images/, /static/
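
The include and exclude rules above can be sketched as a single predicate on an href string. This is only an illustration in Python, not something Make runs directly; the function name `is_good_link` and the decision to match the "good" words as whole path segments are assumptions, and the regex patterns themselves could be reused in a Make filter that offers a "matches pattern" operator.

```python
import re

# Path segments the question lists as "good" (assumption: match whole segments only).
GOOD = re.compile(
    r"/(karriere|stellenangebote|jobs|about|team|leistungen|services)(/|$)", re.I
)
# Exclusions taken from the question.
BAD_SCHEME = re.compile(r"^(mailto:|javascript:)", re.I)
BAD_EXT = re.compile(r"\.(png|css|js|pdf)$", re.I)
BAD_DIR = re.compile(r"/(wp-content|images|static)/", re.I)


def is_good_link(href: str) -> bool:
    """Return True only for hrefs that pass the include list and no exclude rule."""
    href = href.strip()
    if not href or href.startswith("#"):
        return False  # empty href or pure anchor
    if BAD_SCHEME.match(href):
        return False
    # Ignore query string and fragment when checking extension and folders.
    path = re.split(r"[?#]", href, maxsplit=1)[0]
    if BAD_EXT.search(path) or BAD_DIR.search(path):
        return False
    return bool(GOOD.search(path))
```

Because the include pattern is not anchored to the start of the string, the same check covers both `/karriere` and `https://example.com/karriere`.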

The problem: When I use the Text parser module with element type = a, the output shows:

Element = full <a …>… string

Inner content = link text or inner HTML

Attributes = always empty

That means Attributes.href is never available. So my filter on href never works – nothing passes through, even though the HTML clearly contains valid links.

Example HTML observed in the parser output:

So far I tried:

Adding a filter directly after the Text parser, using Attributes.href → always empty

Building regex conditions, but since href is not extracted, they don’t apply

What I need help with:

Is it best practice in Make to filter directly on the Element (the full tag) using regex, instead of trying to use Attributes.href?

Or should I first extract the href into a new variable and then filter on that variable?

How do I handle both relative URLs (like /karriere) and absolute URLs (like https://…/karriere) before the aggregator?

Any Make-specific advice on where to place the filter so I don’t end up collecting thousands of irrelevant links?
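
One possible shape for the "extract the href first, then filter" option, sketched in Python for illustration. The names `extract_href` and `normalized_path` and the example base URL are hypothetical; in Make the same regex idea would live in a pattern-matching step placed between the Text parser and the filter. The regex is deliberately simple and assumes quoted href values and no `>` inside earlier attribute values.

```python
import re
from typing import Optional
from urllib.parse import urljoin, urlsplit

# Capture the quoted href value of an <a ...> tag (assumes the value is quoted).
HREF_RE = re.compile(r"""<a\b[^>]*?\shref\s*=\s*(["'])(.*?)\1""", re.I | re.S)


def extract_href(element: str) -> Optional[str]:
    """Pull the href out of the full tag string (the parser's "Element" output)."""
    m = HREF_RE.search(element)
    return m.group(2).strip() if m else None


def normalized_path(href: str, base_url: str) -> Optional[str]:
    """Resolve relative and absolute hrefs to a path; None for non-HTTP schemes."""
    parts = urlsplit(urljoin(base_url, href))
    if parts.scheme not in ("http", "https"):
        return None  # mailto:, javascript:, and similar
    return parts.path or "/"


# Relative and absolute forms end up as the same path.
base = "https://example.com/de/"  # hypothetical page URL
for tag in ('<a class="btn" href="/karriere">Karriere</a>',
            "<a href='https://example.com/karriere'>Karriere</a>"):
    print(normalized_path(extract_href(tag), base))
```

Filtering on the normalized path means a single rule set can be applied to both URL forms, and the filter can sit before the aggregator so only matching links are collected.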

Expected: links such as career, jobs, about, team, services, etc. should pass through.

Actual: currently nothing passes because Attributes.href is always empty.

asked Sep 7 at 14:40

Alex Lombardo

111 bronze badge

1

Please provide enough code so others can better understand or reproduce the problem.
– Community Bot, commented Sep 7 at 16:42
Do you just need to delete the <a> links that do not pass your criteria, i.e. keep a link only if it passes your tests?
– sln, commented Sep 7 at 18:37
There is a small section in the middle of the regex, (?:(?!\1).)*?, where the text you don't want is matched. If the regex finds it, replace the match with an empty string ''. If you have a list of exclusion text, post it and I'll put it in there and show you how.
(?si)<a(?=\s(?:[^>"']|"[^"]*"|'[^']*')*?(?<=\s)href\s*=\s*(?:(['"])((?:(?!\1).)*?)\1(?:"[\S\s]*?"|'[\S\s]*?'|[^>]?)*?>))\s+(?:".*?"|'.*?'|[^>]?)+>
regex101.com/r/nN9WwB/1
– sln, commented Sep 7 at 18:48
It can be done either by passing:
(['"])(?:(?!\1).)*?(?:/karriere|/stellenangebote|/jobs|/about|/team|/leistungen|/services)(?:(?!\1).)*?\1
or by failing:
(['"])(?:(?!\1).)*?(?:anchors\s*\#|mailto:|javascript:|/wp-content/|/images/|/static/|\.(?:jpg|png|jpeg|gif|pdf|js|css)\b)(?:(?!\1).)*?\1
Just put these in the placeholder above. Then use that regex on each of your filtered <a> tags.
– sln, commented Sep 7 at 19:20
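
As a rough illustration of the pass/fail idea from the last comment, the two patterns can be tried standalone on the full tag string. This is a Python sketch, not Make configuration; the helper name `keep` is hypothetical, and the patterns are copied from the comment unchanged. Note that, used standalone like this, they match any quoted attribute value rather than only href, and that the `anchors\s*\#` alternative matches the literal text "anchors #" rather than anchor links.

```python
import re

# "Pass" pattern from the comment: a quoted value containing one of the wanted paths.
PASS_RE = re.compile(
    r"""(['"])(?:(?!\1).)*?(?:/karriere|/stellenangebote|/jobs|/about|/team|/leistungen|/services)(?:(?!\1).)*?\1""",
    re.I | re.S,
)
# "Fail" pattern from the comment: a quoted value containing an unwanted token.
FAIL_RE = re.compile(
    r"""(['"])(?:(?!\1).)*?(?:anchors\s*\#|mailto:|javascript:|/wp-content/|/images/|/static/|\.(?:jpg|png|jpeg|gif|pdf|js|css)\b)(?:(?!\1).)*?\1""",
    re.I | re.S,
)


def keep(element: str) -> bool:
    """Keep a tag only if it passes the include pattern and not the exclude pattern."""
    return bool(PASS_RE.search(element)) and not FAIL_RE.search(element)


for tag in ('<a href="/karriere">Karriere</a>',
            '<a href="/wp-content/uploads/team.png">Team</a>',
            '<a href="/kontakt">Kontakt</a>'):
    print(keep(tag), tag)
```

Combining both checks guards against cases such as an image path that happens to contain `/team`.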
