
I have code that loads an XML document, runs $xmlDoc.SelectNodes($XPath), and then, in a foreach($node in $nodes) loop, inserts each node's XML as a string into a table.

This code works fine on files of ca. 100KB with 10 records.

However, I have a file that is ca. 100MB with ca. 50k records, and the code just hangs at $xmlDoc =[xml](gc $xmlpath) (and uses all available system memory). Is there a better way to generate my array $nodes without first parsing the entire XML document?

# Loads xml document
$xmlpath = $filepath
$xmlDoc =[xml](gc $xmlpath)
$nodes = $xmlDoc.SelectNodes('//root') #One element per record in SQL

...

$SqlQuery = @"
INSERT INTO {0} VALUES ({1})
"@

....

foreach ($node in $nodes)
{
    # Serialize the node back to a string
    $StringWriter = New-Object System.IO.StringWriter
    $XmlWriter = New-Object System.Xml.XmlTextWriter $StringWriter
    $XmlWriter.Formatting = "None"
    $node.WriteTo($XmlWriter)
    $XmlWriter.Flush()
    $StringWriter.Flush()

    # Data content (single-quoted for the INSERT)
    $Pxml = "'" + $StringWriter.ToString() + "'"

    # Write to database
    $SqlCmd = New-Object System.Data.SqlClient.SqlCommand
    $SqlCmd.CommandText = [string]::Format($SqlQuery, $tableName, $Pxml)
    $SqlCmd.Connection = $SqlConnection
    $null = $SqlCmd.ExecuteNonQuery()
}
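As an aside, building the INSERT text by string concatenation breaks as soon as a record's XML contains a single quote (a parameterized SqlCommand would avoid the problem entirely). A minimal sketch of the escaping needed for the string-building approach, using a made-up record:

```powershell
# Hypothetical record containing an embedded single quote.
$recordXml = "<root><name>O'Brien</name></root>"

# Double embedded single quotes so the value survives as a T-SQL string literal.
$Pxml = "'" + $recordXml.Replace("'", "''") + "'"
$Pxml   # → '<root><name>O''Brien</name></root>'
```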

The XML document has the structure:

<xml>
  <root>
   ...
  </root>
  <root>
   ...
  </root>
</xml>

and the resultant strings are of form:

<root>
 ...
</root>

2 Answers


Using this link as a basis, try the code below. When the closing tag is reached, the $object array will contain the lines of the current root element.

$object = @()
type "$filepath" | %{
  if ($_.Trim() -eq "<root>") {
    $object = @()
    $object += $_
  }
  elseif ($_.Trim() -eq "</root>") {
    $object += $_
    # call the code within your foreach ($node in $nodes) {} section here
  }
  else {
    $object += $_
  }
}
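At the processing point marked in the loop above, the collected lines can be joined and cast back to [xml], so the existing per-node code keeps working. A small self-contained sketch (the record content here is made up):

```powershell
# One record's lines, as collected by the loop above.
$object = @('<root>', '  <id>1</id>', '</root>')

# Join and cast back to an XML document for further processing.
$nodeDoc = [xml]($object -join "`n")
$nodeDoc.root.id   # → 1
```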

1 Comment

This produced the desired result almost straight out of the can.

As far as I know, parsing into an [xml] document requires the complete file to be in memory, so there is no way to build $nodes without loading it all first. You can at least use a more efficient .NET method for reading the content. The following should run a lot faster and may use less memory, because it reads the file into a single string instead of the object array of strings that Get-Content produces.

# Loads xml document

# Get absolute path (the .NET file APIs resolve relative paths against the
# process working directory, not PowerShell's current location)
$xmlpath = (Resolve-Path $filepath).Path
# Get xml
$xmlDoc = [xml][IO.File]::ReadAllText($xmlpath)

An even faster solution would be to drop the cast to [xml] altogether and parse the file as plain text. I would still avoid Get-Content, since it is pretty slow. Something like this could work:

# Get absolute path
$xmlpath = (Resolve-Path $filepath).Path

# Get a stream reader
$reader = [IO.File]::OpenText($xmlpath)
# A List is cheaper to grow and clear than a PowerShell array built with +=
$currentroot = New-Object System.Collections.Generic.List[string]

# Read every line
while ($null -ne ($line = $reader.ReadLine())) {
    if ($line.Trim() -eq "<root>") {
        $currentroot.Clear()
        $currentroot.Add($line)
    } elseif ($line.Trim() -eq "</root>") {
        $currentroot.Add($line)

        #process root element (by extracting the info from the strings in $currentroot)

        $currentroot.Clear()
    } else {
        $currentroot.Add($line)
    }
}
$reader.Close()
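The line-by-line approach above relies on each root tag sitting alone on its own line. An alternative that streams the file regardless of formatting, without ever loading the whole document, is System.Xml.XmlReader. A self-contained sketch (the temp file and element names mirror the question's structure; note that ReadOuterXml already advances the reader, so the loop only calls Read() in the other branches):

```powershell
# Sample file matching the question's structure (stand-in for the real 100MB file).
$xmlpath = Join-Path ([IO.Path]::GetTempPath()) 'sample.xml'
"<xml><root><id>1</id></root><root><id>2</id></root></xml>" |
    Set-Content -Path $xmlpath

$records = @()
$reader = [System.Xml.XmlReader]::Create($xmlpath)
try {
    [void]$reader.MoveToContent()   # position on the <xml> document element
    [void]$reader.Read()            # move to its first child
    while (-not $reader.EOF) {
        if ($reader.NodeType -eq [System.Xml.XmlNodeType]::Element -and
            $reader.Name -eq 'root') {
            # ReadOuterXml returns "<root>...</root>" and advances past it.
            $records += $reader.ReadOuterXml()
            # insert the record into the database here
        } else {
            [void]$reader.Read()
        }
    }
} finally {
    $reader.Close()
}
$records.Count   # → 2
```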

