
I'm currently facing an issue where

<a href="<a href="http://www.freeformatter.com/xml-formatter.html#ad-output" target="_blank">http://www.freeformatter.com/xml-formatter.html#ad-output</a>">Links</a>

is being returned from a service I am using. As you can see, this is NOT valid HTML. Does anyone know of any tools or regular expressions that could help me remove the inner tag and change it to this:

<a href="http://www.freeformatter.com/xml-formatter.html#ad-output">Links</a>

EDIT: The service does not always return the freeformatter.com website. It could return ANY website.

  • Have you tried anything so far? Commented Jun 18, 2014 at 20:46
  • Report it to the service provider. Commented Jun 18, 2014 at 20:46
  • I've been trying to use Java's String.split and change it manually, but my solutions seem overly complicated and clunky. I will report it to the service provider, but I don't really have time to wait for them to make the change. Commented Jun 18, 2014 at 20:50
  • Is jsoup good for parsing invalid HTML? Commented Jun 18, 2014 at 20:51

4 Answers


If the URL or the content within the tags changes, you'll want a more general pattern, perhaps this one (written as a Java string literal):

(<a\\shref=\"\\w.+\")\\s.+>\"(.+</a>)

This captures the portions of the string you want into two groups, which can then be reassembled into one string. Here's a working example:

http://ideone.com/TbOvVa
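Since the ideone code isn't reproduced here, below is a minimal self-contained sketch of the same idea: match the pattern above with Matcher.find() and join group(1) and group(2). The class and method names (NestedAnchorRegex, fix) are made up for illustration. Note the assumption baked into the pattern: the \s after the URL's closing quote means the inner tag must carry at least one more attribute (like target="_blank"), otherwise nothing matches and the input is returned unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedAnchorRegex {
    // The pattern from the answer, as a Java string literal.
    // Group 1: the inner <a href="URL" (up to the quote that closes the URL).
    // Group 2: everything after the inner </a>", e.g. >Links</a>
    private static final Pattern NESTED =
            Pattern.compile("(<a\\shref=\"\\w.+\")\\s.+>\"(.+</a>)");

    // Joins the two groups; returns the input unchanged when the pattern does not match.
    // find() starts the match at the inner <a, so the outer <a href=" prefix is dropped.
    static String fix(String html) {
        Matcher m = NESTED.matcher(html);
        return m.find() ? m.group(1) + m.group(2) : html;
    }

    public static void main(String[] args) {
        String broken = "<a href=\"<a href=\"http://www.freeformatter.com/xml-formatter.html#ad-output\""
                + " target=\"_blank\">http://www.freeformatter.com/xml-formatter.html#ad-output</a>\">Links</a>";
        // prints <a href="http://www.freeformatter.com/xml-formatter.html#ad-output">Links</a>
        System.out.println(fix(broken));
    }
}
```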


1 Comment

Thanks, this is exactly what I was looking for. I'm not very strong with regex, so this helps a lot.

In Java:

String s = "<a href=\"<a href=\"http://www.freeformatter.com/xml-formatter.html#ad-output\" target=\"_blank\">http://www.freeformatter.com/xml-formatter.html#ad-output</a>\">Links</a>";

(You'll need to store it as a String in your program somehow.)

Then:

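// Remove only the outer (first) <a href=" ; replace() would remove both occurrences
s = s.replaceFirst("<a href=\"", "");
String[] pcs = s.split("http://www.freeformatter.com/xml-formatter.html#ad-output</a>\">");
s = pcs[0] + pcs[1];
// Drop the attribute that belonged to the inner tag
s = s.replace(" target=\"_blank\"", "");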

After all this processing, s holds the corrected link.

1 Comment

Oh, sorry, I wasn't clear. It could be www.xxx; it doesn't have to be freeformatter.com.
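Because the code above hard-codes the freeformatter.com URL in the split delimiter, it only works for that one site. A sketch of the same remove-and-join idea that reads the URL out of the string instead, using only indexOf and substring, is below. It assumes the exact broken shape from the question (outer <a href=" immediately followed by the inner tag); the class and method names are hypothetical, and anything that doesn't fit the shape is returned untouched.

```java
public class NestedAnchorIndexOf {
    private static final String OPEN = "<a href=\"";
    private static final String INNER_END = "</a>\">";

    // Hypothetical helper. Assumes the shape from the question:
    //   <a href="<a href="URL" ...>URL</a>">TEXT</a>
    // Any other input is returned unchanged.
    static String fix(String s) {
        if (!s.startsWith(OPEN + OPEN)) {
            return s;
        }
        // Drop only the outer <a href=" so the inner tag is at the front.
        String rest = s.substring(OPEN.length());
        // The URL runs from just after the inner href=" to the next quote.
        int urlEnd = rest.indexOf('"', OPEN.length());
        // The outer link text starts right after the inner </a>">
        int innerEnd = rest.indexOf(INNER_END);
        if (urlEnd < 0 || innerEnd < 0) {
            return s;
        }
        String url = rest.substring(OPEN.length(), urlEnd);
        String tail = rest.substring(innerEnd + INNER_END.length()); // TEXT</a>
        return OPEN + url + "\">" + tail;
    }

    public static void main(String[] args) {
        String broken = "<a href=\"<a href=\"http://www.freeformatter.com/xml-formatter.html#ad-output\""
                + " target=\"_blank\">http://www.freeformatter.com/xml-formatter.html#ad-output</a>\">Links</a>";
        System.out.println(fix(broken));
    }
}
```

Unlike the split version, this doesn't care whether the inner tag has a target attribute or any other attributes, since it only looks for the first quote after the inner href=" and for the inner </a>">.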

Grab the leading <a href=" with .substring(0, 9), then use .split("\">", 2) and take the resulting array at index 1. In that piece, replace the inner </a>" with a plain " and put the prefix back in front.
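A minimal sketch of that substring/split approach, assuming the input has the shape from the question. The prefix <a href=" is 9 characters, and the limit of 2 makes split stop at the first "> so the rest of the string stays in one piece. SubstringSplitFix and fix are hypothetical names.

```java
public class SubstringSplitFix {
    private static final String OPEN = "<a href=\""; // 9 characters

    // Hypothetical helper following the substring/split idea; assumes the shape
    //   <a href="<a href="URL" ...>URL</a>">TEXT</a>
    static String fix(String s) {
        if (!s.startsWith(OPEN + OPEN)) {
            return s; // not the broken shape: leave it alone
        }
        String prefix = s.substring(0, 9);      // the outer <a href="
        String[] parts = s.split("\">", 2);     // limit 2: split only at the first ">
        if (parts.length < 2) {
            return s;
        }
        // parts[1] is URL</a>">TEXT</a>; turn the inner </a>" back into a closing quote
        return prefix + parts[1].replaceFirst("</a>\"", "\"");
    }

    public static void main(String[] args) {
        String broken = "<a href=\"<a href=\"http://www.freeformatter.com/xml-formatter.html#ad-output\""
                + " target=\"_blank\">http://www.freeformatter.com/xml-formatter.html#ad-output</a>\">Links</a>";
        System.out.println(fix(broken));
    }
}
```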


Solution 1

Use the grouping feature of regex: whatever a pair of parentheses () captures can be retrieved with the Matcher.group(int) method.

Find every occurrence of text between > and <, then combine the pieces as needed.

Here is the regex pattern: >([^\">].*?)<. Have a look at the demos on debuggex and regex101.

Pattern description:

.       Any character (may or may not match line terminators)
[^abc]  Any character except a, b, or c (negation)
X*?     X, zero or more times (Reluctant quantifiers)
(X)     X, as a capturing group
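To see why the negated class [^">] is in the pattern, it helps to run the pattern with and without it on the question's string. The sketch below (class and helper names are hypothetical) collects every group(1) capture. Without the class, the match that starts at the > of the inner </a> drags the stray quote into the result; with it, a > that is directly followed by " or > cannot start a match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // Collects every group(1) capture of the given regex in the input.
    static List<String> captures(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "<a href=\"<a href=\"http://www.freeformatter.com/xml-formatter.html#ad-output\""
                + " target=\"_blank\">http://www.freeformatter.com/xml-formatter.html#ad-output</a>\">Links</a>";
        // prints [http://www.freeformatter.com/xml-formatter.html#ad-output, ">Links]
        System.out.println(captures(">(.*?)<", s));
        // prints [http://www.freeformatter.com/xml-formatter.html#ad-output, Links]
        System.out.println(captures(">([^\">].*?)<", s));
    }
}
```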


Sample code:

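import java.util.regex.Matcher;
import java.util.regex.Pattern;

String string = "<a href=\"<a href=\"http://www.freeformatter.com/xml-formatter.html#ad-output\" target=\"_blank\">http://www.freeformatter.com/xml-formatter.html#ad-output</a>\">Links</a>";

// Capture non-empty text between '>' and the next '<'; the first captured character may not be '"' or '>'
Pattern p = Pattern.compile(">([^\">].*?)<");
Matcher m = p.matcher(string);

while (m.find()) {
    System.out.println(m.group(1));
}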

output:

http://www.freeformatter.com/xml-formatter.html#ad-output
Links
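The two captured pieces can then be combined into the repaired link. A minimal sketch, with hypothetical names, is below. It assumes exactly two captures come back and that the visible text of the inner anchor is the same as its href (true in the question's example, but not guaranteed in general); any other result leaves the input unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CombineCaptures {
    private static final Pattern BETWEEN = Pattern.compile(">([^\">].*?)<");

    // Assumes two captures: the inner link text (used as the URL), then the outer link text.
    static String rebuild(String html) {
        List<String> pieces = new ArrayList<>();
        Matcher m = BETWEEN.matcher(html);
        while (m.find()) {
            pieces.add(m.group(1));
        }
        if (pieces.size() != 2) {
            return html; // unexpected shape: leave the input unchanged
        }
        return "<a href=\"" + pieces.get(0) + "\">" + pieces.get(1) + "</a>";
    }

    public static void main(String[] args) {
        String broken = "<a href=\"<a href=\"http://www.freeformatter.com/xml-formatter.html#ad-output\""
                + " target=\"_blank\">http://www.freeformatter.com/xml-formatter.html#ad-output</a>\">Links</a>";
        System.out.println(rebuild(broken));
    }
}
```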

Solution 2

Try the String#replaceAll() method with the regex pattern (</a>)[^$]|([^^]<a(.*?)>).

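The pattern removes the inner tags. (</a>)[^$] matches a </a> followed by one more character, so the final </a> at the very end is left alone; ([^^]<a(.*?)>) matches an <a ...> tag preceded by one character, so the opening tag at the very start is left alone. Inside square brackets, $ and the second ^ are literal characters rather than anchors, so in this input each alternative also consumes the neighbouring double quote. That is why each match is replaced with a double quote instead of an empty string.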

See the demos on regex101 and debuggex.


Sample code:

String string = "<a href=\"<a href=\"http://www.freeformatter.com/xml-formatter.html#ad-output\" target=\"_blank\">http://www.freeformatter.com/xml-formatter.html#ad-output</a>\">Links</a>";

System.out.println(string.replaceAll("(</a>)[^$]|([^^]<a(.*?)>)", "\""));

output:

<a href="http://www.freeformatter.com/xml-formatter.html#ad-output">Links</a>
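The intent described for Solution 2 ("not at the end", "not at the beginning") can also be written with zero-width assertions, which check the position without consuming any characters, so the replacement becomes an empty string. This is a swapped-in variant, not the answer's pattern; the names are hypothetical. Like the original, it assumes the string holds only this one link, since it deletes every other anchor tag it finds.

```java
public class LookaroundFix {
    // (?!^)<a[^>]*>  : an <a ...> opening tag anywhere except at the very start of the input
    // </a>(?!$)      : a closing </a> anywhere except at the very end of the input
    // Both checks are zero-width, so nothing around the tags is consumed.
    private static final String INNER_TAGS = "(?!^)<a[^>]*>|</a>(?!$)";

    static String fix(String html) {
        return html.replaceAll(INNER_TAGS, "");
    }

    public static void main(String[] args) {
        String broken = "<a href=\"<a href=\"http://www.freeformatter.com/xml-formatter.html#ad-output\""
                + " target=\"_blank\">http://www.freeformatter.com/xml-formatter.html#ad-output</a>\">Links</a>";
        // prints <a href="http://www.freeformatter.com/xml-formatter.html#ad-output">Links</a>
        System.out.println(fix(broken));
    }
}
```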
