Deleting XML Lines in VBScript

Question

I have the following piece of code in VBScript:

Set fso = CreateObject("Scripting.FileSystemObject")
Set xmlDoc = CreateObject("Msxml2.DOMDocument.6.0")
xmlDoc.Async = False
xmlDoc.setProperty "SelectionLanguage", "XPath"

For Each f In fso.GetFolder("C:\Users\Admin\Folder").Files
    If LCase(fso.GetExtensionName(f)) = "xml" Then
        xmlDoc.Load f.Path

        If xmlDoc.ParseError = 0 Then
            'Some code in here
        Else
            WScript.Echo "Parsing error! '" & f.Path & "': " & xmlDoc.ParseError.Reason
        End If
    End If
Next

I'm doing some operations on the XML files inside that directory, but I need to do one thing with all these XML files first: delete lines. Something like:

@EDIT (Now with NODE1 being the real sample):

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
    <tadig-raex-21:TADIGRAEXIR21 xmlns="https://XXX" xmlns:tadig-raex-21="https://XXX" xmlns:tadig-gen="https://YYY" xmlns:xsi="ZZZ" xsi:schemaLocation="https://XXX tadig-raex-ir21-8.2.xsd">
      <NODE2.1>       
        <NODE2.1.1> Information1 </NODE2.1.1> 
        <NODE2.1.2> Information2 </NODE2.1.2> 
        <NODE2.1.3> Information3 </NODE2.1.3>
      </NODE2.1>
      <NODE2.2>
        <NODE2.2.1>XXX</NODE2.2.1>
      </NODE2.2>
   </tadig-raex-21:TADIGRAEXIR21>

Turning into:

<?xml version="1.0" encoding="UTF-8"?>
      <NODE2.2>
        <NODE 2.2.1> XXX </NODE 2.2.1>
      </NODE2.1>

The XMLs always have 6 lines between the "xml version" node and NODE2.2. What I intend to do is delete these lines (including the ""), and the last line of the file, which is always the root element's closing tag.

I've tried deleting nodes, as some posts on this site suggest, but XPaths don't work on the file if I don't delete these lines. That's why I need to think in terms of "lines" to delete... Otherwise, it's impossible. I really don't know what is so horrible in these lines that keeps my program from finding my paths, but when I exclude them, it works.

I think now I have made myself a little bit more clear...

Can someone please help me?

What do you mean by "Delete lines"? Delete lines of XML in those xml files?
– Nico, Commented Dec 4, 2013 at 19:17
@CharlieVelez - if you want help with a curious concept like "deleting xml lines", you should provide a (short!) before/input vs. after/desired output sample (instead of code that is irrelevant to the problem).
– Ekkehard.Horner, Commented Dec 4, 2013 at 19:23
XML is not line-oriented. You can have several XML elements on one line as well as elements spanning multiple lines. Deleting lines from an XML file just means asking for trouble.
– Ansgar Wiechers, Commented Dec 4, 2013 at 21:31
@Ekkehard.Horner, I edited the post. Maybe you can help me out now?
– user3045856, Commented Dec 5, 2013 at 15:59

Ekkehard.Horner · Accepted Answer · 2013-12-05 19:55:18Z

If you started your XML-related scripts with a skeleton like:

  Dim goFS   : Set goFS  = CreateObject("Scripting.FileSystemObject")
  Dim sFSpec : sFSpec    = goFS.GetAbsolutePathName("..\testdata\xml\20383899.xml")
  Dim oXDoc  : Set oXDoc = CreateObject("Msxml2.DOMDocument.6.0")
  oXDoc.async = False
  oXDoc.load sFSpec

  If 0 = oXDoc.ParseError Then
     WScript.Echo "ready to process"
  Else
     WScript.Echo oXDoc.parseError.reason
  End If

you'd immediately see that your .XML is not well-formed: "NODE 1.2.3" isn't a name, the NODE2.1 nodes aren't closed, and NODE2.2 can't be closed with /NODE2.1.

So your .XML should look like:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
<NODE1>
  <NODE2.1>
    <NODE2.1.1/>
    <NODE2.1.2/>
    <NODE2.1.3/>
  </NODE2.1>
  <NODE2.2>
    <NODE2.2.1> XXX </NODE2.2.1>
  </NODE2.2>
</NODE1>

I'm confident that such well-formed .XML can be modified to your desired result, but I don't understand your specs: should the NODE1 be 'deleted'/the XML reduced to NODE2.2?

Added to eat my pudding:

A bit cheating, but if this code fragment is inserted in the skeleton:

  If 0 = oXDoc.ParseError Then
     WScript.Echo "ready to process"
     Dim sXPath : sXPath    = "/NODE1/NODE2.2"
     Dim ndFnd  : Set ndFnd = oXDoc.SelectSingleNode(sXPath)
     If ndFnd Is Nothing Then
        WScript.Echo sXpath, "not found"
     Else
        Set oXDoc.documentElement = ndFnd
        WScript.Echo oXDoc.xml
     End If
  Else

the result:

<?xml version="1.0"?>
<!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
<NODE2.2>
        <NODE2.2.1> XXX </NODE2.2.1>
</NODE2.2>

conforms to (one interpretation of) your specs. If you can't force the XML's author to obey the standards, you should pre-process the bad XML using text/string ops (RegExp, Replace, ...) and then do the transformations in the usual way. (I admit to having no idea what a RegExp that corrects arbitrary 'wrong tag used to close' blunders would look like.)

Update I:

To show the feasibility of the strategy "transform the garbage to valid XML and process that", I wrote this ad hoc script:

Option Explicit

Dim goFS   : Set goFS  = CreateObject("Scripting.FileSystemObject")
Dim sFSpec : sFSpec    = goFS.GetAbsolutePathName("..\testdata\xml\20383899.org.xml")
Dim sAll   : sAll      = goFS.OpenTextFile(sFSpec).ReadAll()
WScript.Echo "-------------------- garbage in"
WScript.Echo sAll

Dim reZapBlanks : Set reZapBlanks = New RegExp
reZapBlanks.Global     = True
reZapBlanks.Pattern    = "(NODE)(\s+)(\d)"
sAll = reZapBlanks.Replace(sAll, "$1$3")
Dim reAddClose : Set reAddClose = New RegExp
reAddClose.Global     = True
reAddClose.Pattern    = "(<NODE2\.1\.\d+)(>)"
sAll = reAddClose.Replace(sAll, "$1/$2")
Dim reVoodoo : Set reVoodoo = New RegExp
reVoodoo.Global     = False
reVoodoo.Pattern    = "(</NODE2\.1>[\s\S]+)(</NODE2\.1>)"
sAll = reVoodoo.Replace(sAll, "$1</NODE2.2>")
WScript.Echo "-------------------- nice XML out"
WScript.Echo sAll

Dim oXDoc  : Set oXDoc = CreateObject("Msxml2.DOMDocument.6.0")
oXDoc.setProperty "SelectionLanguage", "XPath"
oXDoc.async = False
oXDoc.loadxml sAll ' <-- clean XML

If 0 = oXDoc.ParseError Then
   WScript.Echo "ready to process"
   Dim sXPath : sXPath    = "/NODE1/NODE2.2"
   Dim ndFnd  : Set ndFnd = oXDoc.SelectSingleNode(sXPath)
   If ndFnd Is Nothing Then
      WScript.Echo sXpath, "not found"
   Else
      Set oXDoc.documentElement = ndFnd
      WScript.Echo "-------------------- condensed using std XML methods"
      sAll = oXDoc.xml
      WScript.Echo sAll
      oXDoc.loadxml sAll ' <-- condensed XML
      WScript.Echo "-------------------- sanity check"
      WScript.Echo "Error:", oXDoc.ParseError.errorCode
   End If
Else
   WScript.Echo oXDoc.parseError.reason
End If

output:

cscript 20383899.vbs
-------------------- garbage in
<?xml version="1.0" encoding="UTF-8"?>
<!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
<NODE1>
  <NODE2.1>
    <NODE 2.1.1>
    <NODE 2.1.2>
    <NODE 2.1.3>
  </NODE2.1>
  <NODE2.2>
    <NODE 2.2.1> XXX </NODE 2.2.1>
  </NODE2.1>
</NODE1>

-------------------- nice XML out
<?xml version="1.0" encoding="UTF-8"?>
<!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
<NODE1>
  <NODE2.1>
    <NODE2.1.1/>
    <NODE2.1.2/>
    <NODE2.1.3/>
  </NODE2.1>
  <NODE2.2>
    <NODE2.2.1> XXX </NODE2.2.1>
  </NODE2.2>
</NODE1>

ready to process
-------------------- condensed using std XML methods
<?xml version="1.0"?>
<!-- Created on 2013-11-19T12:00:57+01:00 with ROAMSYS RMS // www.roamsys.com -->
<NODE2.2>
        <NODE2.2.1> XXX </NODE2.2.1>
</NODE2.2>

-------------------- sanity check
Error: 0

The RegExps are tailored to this specific garbage; I don't claim that the next bad XML can be cleaned in a similar way.

Update II:

The last version of @Charlie's XML input is well-formed. So it can be processed using XML methods (XPATH to find the NODE2.2 node and assignment to .documentElement to reduce/condense the .XML file to that node). So all the above rigmarole isn't needed.
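For the namespaced sample in the question's edit, one plausible reason why an unprefixed path such as /NODE1/NODE2.2 finds nothing is the default namespace declared on the root element: in XPath 1.0 an unprefixed name only matches elements in no namespace. The following is a minimal sketch, not a tested solution. It assumes that both the root element and NODE2.2 live in the placeholder namespace https://XXX, exactly as the sample is posted; the prefix "r" and the file name sample.xml are made up for illustration.

```vbscript
Option Explicit

' Assumption: the namespace URI is the placeholder shown in the question;
' the real files will declare a different URI, which must be copied exactly.
Const NS_URI = "https://XXX"

Dim oXDoc : Set oXDoc = CreateObject("Msxml2.DOMDocument.6.0")
oXDoc.async = False
oXDoc.setProperty "SelectionLanguage", "XPath"
' Bind a prefix of our own choosing ("r") to that URI for use in XPath.
oXDoc.setProperty "SelectionNamespaces", "xmlns:r='" & NS_URI & "'"
oXDoc.load "C:\Users\Admin\Folder\sample.xml" ' hypothetical file name

If 0 = oXDoc.parseError.errorCode Then
   ' Every step carries the prefix, because unprefixed names would only
   ' match elements that are in no namespace at all.
   Dim ndFnd : Set ndFnd = oXDoc.selectSingleNode("/r:TADIGRAEXIR21/r:NODE2.2")
   If ndFnd Is Nothing Then
      WScript.Echo "NODE2.2 not found"
   Else
      ' Same condensing step as above: make NODE2.2 the document element.
      Set oXDoc.documentElement = ndFnd
      WScript.Echo oXDoc.xml
   End If
Else
   WScript.Echo oXDoc.parseError.reason
End If
```

If the real files bind the default namespace and the tadig-raex-21 prefix to different URIs, each URI would need its own prefix in the SelectionNamespaces string, separated by a space.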

I hope that the history of this question will make everybody think twice when the uncouth concept of "deleting lines from XML" rears its ugly head.

My mistake, I wrote it wrong, as you noticed... Let me rearrange it. @EDIT: So, you got XPath to find NODE2.2... I CAN'T use an XPath, it does not work! There is something arbitrary in these beginning lines that won't let NODE2.2 (or its child nodes) be found, because XPath seems not to understand lines 2-6!
@CharlieVelez - in your edits you removed the offending spaces, but neither closed those three nodes nor fixed the wrong close. Does that mean the current input version is what you have to work with?
It's all set now. I just copied it wrong... I'm coding here and left it wrong without noticing, sorry for the inconvenience.
I think it is clearer now. I really need to get rid of those lines :/
Thank you so much for your patience. I'm almost there. Your code works to transform the XML, but now I see why I couldn't find some of the XML's nodes with my own code. I edited the post with the only thing that changes in the XML. Clients always have something they are not telling us, and as I could not really open the XML, they gave me a sample of code. Now I don't know what to change in the code I posted at the very beginning so that the XPaths work...
