
I have code that loads an XML document, runs $xmlDoc.SelectNodes($XPath), and then, in a foreach($node in $nodes) loop, inserts each node's XML as a string into a table.

This code works fine on files of ca. 100KB with 10 records.

However, I have a file that is ca. 100MB with ca. 50k records, and the code just hangs at $xmlDoc =[xml](gc $xmlpath) (and uses all available system memory). Is there a better way to generate my array $nodes without first parsing the entire XML document?

# Loads xml document
$xmlpath = $filepath
$xmlDoc =[xml](gc $xmlpath)
$nodes = $xmlDoc.SelectNodes('//root') #One element per record in SQL

...

$SqlQuery = @"
INSERT INTO {0} VALUES ({1})
"@

....

foreach ($node in $nodes)
{
    # Serialize the node back to a string
    $StringWriter = New-Object System.IO.StringWriter
    $XmlWriter = New-Object System.Xml.XmlTextWriter $StringWriter
    $XmlWriter.Formatting = "None"
    $node.WriteTo($XmlWriter)
    $XmlWriter.Flush()
    $StringWriter.Flush()

    # Data content (single-quoted for the INSERT)
    $Pxml = "'" + $StringWriter.ToString() + "'"

    # Write to database
    $SqlCmd = New-Object System.Data.SqlClient.SqlCommand
    $SqlCmd.CommandText = [string]::Format($SqlQuery, $tableName, $Pxml)
    $SqlCmd.Connection = $SqlConnection
    $null = $SqlCmd.ExecuteNonQuery()
}
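As an aside, building the INSERT text by string concatenation breaks as soon as a record's XML contains a single quote (a parameterized SqlCommand would avoid the problem entirely). A minimal sketch of the escaping needed for the string-building approach, using a made-up record:

```powershell
# Hypothetical record containing an embedded single quote.
$recordXml = "<root><name>O'Brien</name></root>"

# Double embedded single quotes so the value survives as a T-SQL string literal.
$Pxml = "'" + $recordXml.Replace("'", "''") + "'"
$Pxml   # → '<root><name>O''Brien</name></root>'
```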

The XML document has the structure:

<xml>
  <root>
   ...
  </root>
  <root>
   ...
  </root>
</xml>

and the resultant strings are of form:

<root>
 ...
</root>

2 Answers


Using this link as a basis, try the code below. When the closing tag is reached, the $object array will contain the lines of the current root element.

$object = @()
type "$filepath" | %{
  if ($_.Trim() -eq "<root>") {
    $object = @()
    $object += $_
  }
  elseif ($_.Trim() -eq "</root>") {
    $object += $_
    # call the code within your foreach ($node in $nodes) {} section here
  }
  else {
    $object += $_
  }
}
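At the processing point marked in the loop above, the collected lines can be joined and cast back to [xml], so the existing per-node code keeps working. A small self-contained sketch (the record content here is made up):

```powershell
# One record's lines, as collected by the loop above.
$object = @('<root>', '  <id>1</id>', '</root>')

# Join and cast back to an XML document for further processing.
$nodeDoc = [xml]($object -join "`n")
$nodeDoc.root.id   # → 1
```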

1 Comment

This produced the desired result almost straight out of the can.

As far as I know, parsing into an [xml] document requires the complete file to be in memory, so there is no way to build $nodes without loading it all first. You can at least use a more efficient .NET method for reading the content. The following should run a lot faster and may use less memory, because it reads the file into a single string instead of the object array of strings that Get-Content produces.

# Loads xml document

# Get absolute path (the .NET file APIs resolve relative paths against the
# process working directory, not PowerShell's current location)
$xmlpath = (Resolve-Path $filepath).Path
# Get xml
$xmlDoc = [xml][IO.File]::ReadAllText($xmlpath)

An even faster solution would be to drop the cast to [xml] altogether and parse the file as plain text. I would still avoid Get-Content, since it is pretty slow. Something like this could work:

# Get absolute path
$xmlpath = (Resolve-Path $filepath).Path

# Get a stream reader
$reader = [IO.File]::OpenText($xmlpath)
# A List is cheaper to grow and clear than a PowerShell array built with +=
$currentroot = New-Object System.Collections.Generic.List[string]

# Read every line
while ($null -ne ($line = $reader.ReadLine())) {
    if ($line.Trim() -eq "<root>") {
        $currentroot.Clear()
        $currentroot.Add($line)
    } elseif ($line.Trim() -eq "</root>") {
        $currentroot.Add($line)

        #process root element (by extracting the info from the strings in $currentroot)

        $currentroot.Clear()
    } else {
        $currentroot.Add($line)
    }
}
$reader.Close()
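The line-by-line approach above relies on each root tag sitting alone on its own line. An alternative that streams the file regardless of formatting, without ever loading the whole document, is System.Xml.XmlReader. A self-contained sketch (the temp file and element names mirror the question's structure; note that ReadOuterXml already advances the reader, so the loop only calls Read() in the other branches):

```powershell
# Sample file matching the question's structure (stand-in for the real 100MB file).
$xmlpath = Join-Path ([IO.Path]::GetTempPath()) 'sample.xml'
"<xml><root><id>1</id></root><root><id>2</id></root></xml>" |
    Set-Content -Path $xmlpath

$records = @()
$reader = [System.Xml.XmlReader]::Create($xmlpath)
try {
    [void]$reader.MoveToContent()   # position on the <xml> document element
    [void]$reader.Read()            # move to its first child
    while (-not $reader.EOF) {
        if ($reader.NodeType -eq [System.Xml.XmlNodeType]::Element -and
            $reader.Name -eq 'root') {
            # ReadOuterXml returns "<root>...</root>" and advances past it.
            $records += $reader.ReadOuterXml()
            # insert the record into the database here
        } else {
            [void]$reader.Read()
        }
    }
} finally {
    $reader.Close()
}
$records.Count   # → 2
```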

