I want to delete HTML tags (that are defined in an array) from a string. My approach:
public String cleanHTML(String unsafe, String[] blacklist) {
    String safe = "";
    for (String s : blacklist) {
        safe = unsafe.replaceAll("\\<.{0,1}" + s + ".*?>", "");
    }
    return safe;
}
To test my function I use the following main method:
public static void main(String[] args) {
    StringParser sp = new StringParser();
    String[] blacklist = new String[]{"img", "a"};
    System.out.println(sp.cleanHTML("<p class='p1'>paragraph</p><img></img>< this is not html > <A HREF='#'>Link</A><a link=''>another link</a> <![CDATA[<sender>John Doe</sender>]]>", blacklist));
}
Output:
<p class='p1'>paragraph</p><img></img>< this is not html > <A HREF='#'>Link</A>another link <![CDATA[<sender>John Doe</sender>]]>
As you can see, only the tags around "another link" are removed. So I basically have two questions:
1.) How can I get my regex to replace every <a> tag, regardless of whether it is lowercase or uppercase?
2.) How can I get my code to delete every blacklisted tag, not only the last one in the array?
Thanks in advance.
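unsafe.replaceAll does not modify unsafe. Strings are immutable, so every loop iteration starts again from the original input and only the result of the last iteration ends up in safe.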
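Building on that, here is a minimal sketch of one way to address both questions. It assumes a regex is good enough for your input; a real HTML parser such as jsoup would be more robust. The class name HtmlTagCleaner and the method name cleanHtml are made up for this example. The loop feeds the result of each replacement back into the next one, and Pattern.CASE_INSENSITIVE takes care of <A> vs. <a>. The \b after the tag name is an extra assumption on my part, added so that blacklisting "a" does not also strip tags like <abbr>.

```java
import java.util.regex.Pattern;

public class HtmlTagCleaner {

    // Removes the opening and closing tags of every blacklisted element.
    // The text between the tags is kept.
    public static String cleanHtml(String unsafe, String[] blacklist) {
        String safe = unsafe; // start from the input, not from ""
        for (String tag : blacklist) {
            // "</?"   optional slash, so closing tags match too
            // "\\b"   word boundary, so "a" does not match "<abbr>"
            // "[^>]*" any attributes up to the closing '>'
            Pattern p = Pattern.compile(
                    "</?" + Pattern.quote(tag) + "\\b[^>]*>",
                    Pattern.CASE_INSENSITIVE);
            // Reassign so the next iteration works on the already cleaned string.
            safe = p.matcher(safe).replaceAll("");
        }
        return safe;
    }

    public static void main(String[] args) {
        String[] blacklist = {"img", "a"};
        String input = "<p class='p1'>paragraph</p><img></img>< this is not html > "
                + "<A HREF='#'>Link</A><a link=''>another link</a> "
                + "<![CDATA[<sender>John Doe</sender>]]>";
        System.out.println(cleanHtml(input, blacklist));
    }
}
```

With the test input from the question this should print:
<p class='p1'>paragraph</p>< this is not html > Linkanother link <![CDATA[<sender>John Doe</sender>]]>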