Basically, I want to do the same as here, which is done in Python: I'd like to replace all self-closed elements with the long syntax.

Example

    <iframe src="http://example.com/thing"/>

becomes

    <iframe src="http://example.com/thing"></iframe>

Full example:

    <html>
    <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <link rel="stylesheet" type="text/css" href="/sample.css">
      <title></title>
      <script type="text/javascript" src="/swfobject.js">
        //void
      </script>
      <script type="text/javascript" language="JavaScript" src="/generate.js">
        //void
      </script>
      <script type="text/javascript" language="JavaScript" src="/prototype.js">
        //void
      </script>
    </head>
    <body id="mediaPlayer" style="margin:0;padding:0;">
    <script type="text/javascript">
      swfobject.registerObject('id_G12564763');

      function getFlashObject() {
        var object;
        if (navigator.appName == 'Microsoft Internet Explorer' || navigator.userAgent.indexOf("Chrome")!=-1)
        {
          object = document.getElementById('id_G12564763');
        }
        else
        {
          object = document['flash_id_G12564763'];
        }
        return object;
      }
    </script>
    </body>
    </html>

  • Note that stackoverflow.com/questions/1732348/… describes the inverse of this operation. Commented Aug 12, 2010 at 15:12
  • Borealid, I am aware of that. Do you know how to fix this with a parser? As far as I remember, the XML has to be well-formed before it can be parsed with a parser. That is exactly what I need to do. I have tried Tidy, but that did not work and the project is no longer maintained. This is a small HTML output that will simply have a series of JavaScript includes and the object embed tag (Flash). Commented Aug 13, 2010 at 7:26

3 Answers

1

This replaces a single tag (JavaScript):

    var becomes = "<iframe src='http://example.com/thing'/>".replace(/<(\w*) (.*)\//, '<$1 $2></$1');

The same in Java:

    String becomes = "<iframe src=\"http://example.com/thing\"/>".replaceFirst("<(\\w*) (.*)\\/", "<$1 $2></$1");
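
As a rough illustration of why this pattern is suited to one tag at a time: `(.*)` is greedy, so when two self-closed tags share a line the match runs to the last `/` in the input. The sketch below wraps the Java call from this answer in a helper; the class name `SingleTagDemo` and the method `expandFirst` are hypothetical names used only for illustration.

```java
final class SingleTagDemo {
    // Same pattern and replacement as the Java one-liner in this answer.
    // "(.*)" is greedy, so it extends to the LAST '/' found in the input.
    static String expandFirst(String html) {
        return html.replaceFirst("<(\\w*) (.*)\\/", "<$1 $2></$1");
    }

    public static void main(String[] args) {
        // One tag: the self-closed form becomes the long form.
        // prints <iframe src="http://example.com/thing"></iframe>
        System.out.println(expandFirst("<iframe src=\"http://example.com/thing\"/>"));

        // Two tags on one line: the match swallows the first "/>" and only
        // the final slash is rewritten, which is not the intended result.
        // prints <a x="1"/><b y="2"></a>
        System.out.println(expandFirst("<a x=\"1\"/><b y=\"2\"/>"));
    }
}
```

So for input with several self-closed elements, a pattern that cannot cross tag boundaries is probably needed.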

1

OK, I found a workaround. I hooked the output method to xml where this HTML comes from, and the XSLT engine takes care of closing those open tags for me. Thanks for the answers, but if you happen to have a solution to the problem, please leave your answer and I will mark it as accepted. This could be useful for others.
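
The post does not show the stylesheet or the exact setting, so the following is only a sketch of the general idea of letting a serializer write the end tags. It assumes the JDK's built-in identity transformer and the html output method (both are assumptions, not necessarily the setup described above), and `SerializerDemo` and `serializeAsHtml` are hypothetical names. The input has to be well-formed XML already.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SerializerDemo {
    // Parses well-formed XML and re-serializes it with the "html" output
    // method, which writes explicit end tags for non-void elements.
    static String serializeAsHtml(String wellFormedXml) {
        try {
            // newTransformer() without a stylesheet is an identity transform.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.METHOD, "html");
            identity.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(wellFormedXml)),
                               new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Assumption: callers treat malformed input as a programming error.
            throw new IllegalStateException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializeAsHtml("<iframe src=\"http://example.com/thing\"/>"));
    }
}
```

This only helps when the markup can be parsed as XML in the first place, which matches the concern raised in the comments on the question.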

1

    String resultHtml = inputHtml.replaceAll("(?six)<(\\w+)([^<]*?)/>", "<$1$2></$1>");

This will also properly handle tags that are not terminated, such as <hr> and <img>.
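
A minimal sketch of applying the same pattern while leaving self-closed void elements such as `<br/>` as they are, since void elements have no end tag in HTML. The skip list, the class name `SelfClosingExpander`, and the method `expand` are assumptions added for illustration and are not part of the answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SelfClosingExpander {
    // Same pattern as the one-liner in this answer.
    private static final Pattern SELF_CLOSED =
            Pattern.compile("(?six)<(\\w+)([^<]*?)/>");

    // Assumption: a small, non-exhaustive list of void elements to keep as-is.
    private static final Set<String> VOID_ELEMENTS = new HashSet<>(
            Arrays.asList("br", "hr", "img", "input", "link", "meta"));

    static String expand(String html) {
        Matcher m = SELF_CLOSED.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String replacement = VOID_ELEMENTS.contains(name.toLowerCase(Locale.ROOT))
                    ? m.group()                                        // keep <br/> unchanged
                    : "<" + name + m.group(2) + "></" + name + ">";   // long syntax
            // quoteReplacement keeps '$' and '\' in attribute values literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "<p><iframe src=\"http://example.com/thing\"/><br/><hr><img src=\"a.png\"></p>";
        // prints <p><iframe src="http://example.com/thing"></iframe><br/><hr><img src="a.png"></p>
        System.out.println(expand(in));
    }
}
```

Like any regex approach, this does not understand script contents or comments, so it is best kept to small, predictable output like the page in the question.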

1 Comment

Hmm, this seems to work for me, although your example has nothing that matches the regexp I provided (i.e., it has no self-closed elements). I adjusted the modifier to work more correctly with multi-line input; that might help.
