1

Getting a memory exception while running this code. Is there a way to filter one file at a time, write the output, and append after processing each file? The code below seems to load everything into memory.

$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
Get-ChildItem $inputFolder -File -Filter '*.csv' |
    ForEach-Object { Import-Csv $_.FullName } |
    Where-Object { $_.machine_type -eq 'workstations' } |
    Export-Csv $outputFile -NoType

3 Comments

  • CSVs are just text. The Import-Csv cmdlet is useful for manipulating CSV data in PowerShell, but if all you want to do is append one CSV onto another (assuming they're the same width), you can just read them as if they were text: get-content *.csv | set-content combined.csv. Seems like it should work. Might have to mess with line endings? Commented Nov 1, 2019 at 14:34
  • @Joe: I need to filter the combined.csv with machine_type -eq "workstation" only. Commented Nov 1, 2019 at 14:57
  • Do your CSVs always have the same columns, in the same order? Commented Nov 7, 2019 at 3:34

5 Answers

1

Maybe you can filter your files one by one and append each result to your output file, like this:

$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"

Remove-Item $outputFile -Force -ErrorAction SilentlyContinue

Get-ChildItem $inputFolder -File -Filter '*.csv' | ForEach-Object {
    Import-Csv $_.FullName |
        Where-Object machine_type -eq 'workstations' |
        Export-Csv $outputFile -Append -NoTypeInformation
}

4 Comments

Thanks for this. Even though it still throws a memory error, it manages to create the CSV file and append the data. I have added -ErrorAction SilentlyContinue to the Export-Csv line as well, so it can continue processing the data.
@Enigma: An out-of-memory error is a statement-terminating error, so it's likely that data will be missing from your output CSV file.
mklement0 is right; if it still throws the error, you can't use this solution.
@Enigma: After the Export-Csv command, you can place a command that forces garbage collection to see if that gets rid of the error: [gc]::Collect(); [gc]::WaitForPendingFinalizers(); also, I've found issues on GitHub relating to the issue - see my updated answer (speaking of: I still don't understand why the switch-based workaround doesn't work for you).
1

Note: The reason for not using Get-ChildItem ... | Import-Csv ... - i.e., for not directly piping Get-ChildItem to Import-Csv and instead having to call Import-Csv from the script block ({ ... }) of an auxiliary ForEach-Object call - is a bug in Windows PowerShell that has since been fixed in PowerShell (Core) 7 - see the bottom section for a more concise workaround.

However, even output from ForEach-Object script blocks should stream to the remaining pipeline commands, so you shouldn't run out of memory - after all, a salient feature of the PowerShell pipeline is object-by-object processing, which keeps memory use constant, irrespective of the size of the (streaming) input collection.

You've since confirmed that avoiding the auxiliary ForEach-Object call does not solve the problem, so we still don't know what causes your out-of-memory exception.
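As an aside, the constant-memory, object-by-object processing described above isn't PowerShell-specific. Here is a minimal Python sketch of the same row-streaming idea applied to the question's task (the function name is made up for illustration; all input files are assumed to share the same columns):

```python
import csv
import glob
import os

def filter_csv_files(input_glob, output_path, column, value):
    """Append matching rows from each input CSV to the output, one row at a time.

    Only one row is held in memory at any moment; the header is written once,
    taken from the first file. The output file is skipped if it happens to
    match the input glob. Illustrative sketch only.
    """
    out_abs = os.path.abspath(output_path)
    writer = None
    with open(output_path, "w", newline="") as out:
        for path in sorted(glob.glob(input_glob)):
            if os.path.abspath(path) == out_abs:
                continue  # never read the file we are writing
            with open(path, newline="") as f:
                reader = csv.DictReader(f)
                if writer is None:
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:  # rows stream one by one
                    if row[column] == value:
                        writer.writerow(row)
```

Because rows are read and written one at a time, memory use stays flat no matter how many files or rows there are - which is exactly the behavior the PowerShell pipeline is supposed to provide.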

Update:

  • GitHub issue #7603 contains clues as to the reason for excessive memory use, especially with many properties that contain small amounts of data.

  • GitHub feature request #8862 proposes using strongly typed output objects to help with the issue.

The following workaround, which uses the switch statement to process the files as text files, may help:

$header = ''
Get-ChildItem $inputFolder -Filter *.csv | ForEach-Object {
  $i = 0
  switch -Wildcard -File $_.FullName {
    '*workstations*' {
      # NOTE: If no other columns contain the word `workstations`, you can 
      # simplify and speed up the command by omitting the `ConvertFrom-Csv` call 
      # (you can make the wildcard matching more robust with something 
      # like '*,workstations,*')
      if ((ConvertFrom-Csv "$header`n$_").machine_type -ne 'workstations') { continue }
      $_ # row whose 'machine_type' column value equals 'workstations'
    }
    default {
      if ($i++ -eq 0) {
        if ($header) { continue } # header already written
        else { $header = $_; $_ } # header row of 1st file
      }
    }
  }
} | Set-Content $outputFile

Here's a workaround for the bug of not being able to pipe Get-ChildItem output directly to Import-Csv, by passing it as an argument instead:

Import-Csv -LiteralPath (Get-ChildItem $inputFolder -File -Filter *.csv) |
    Where-Object { $_.machine_type -eq 'workstations' } |
    Export-Csv $outputFile -NoType

Note that in PowerShell 7 you could more naturally write:

Get-ChildItem $inputFolder -File -Filter *.csv | Import-Csv |
  Where-Object { $_.machine_type -eq 'workstations' } |
    Export-Csv $outputFile -NoType

6 Comments

Much appreciate your help with this, but I'm still getting the memory exception:

PS C:\change\2019\October> $inputFolder = "C:\Change\2019\October"
PS C:\change\2019\October> $outputFile = "C:\Change\2019\output.csv"
PS C:\change\2019\October> Import-Csv -LiteralPath (Get-ChildItem $inputFolder -File -Filter *.csv) |
>> Where-Object { $_.machine_type -eq 'workstation' } |
>> Export-Csv $outputFile -NoType
Exception of type 'System.OutOfMemoryException' was thrown. At line:1 char:1, OutOfMemoryException + FullyQualifiedErrorId : System.OutOfMemoryException
PS C:\WINDOWS\system32> $PSVersionTable

Name                       Value
----                       -----
PSVersion                  5.1.17763.771
PSEdition                  Desktop
PSCompatibleVersions       {1.0, 2.0, 3.0, 4.0...}
BuildVersion               10.0.17763.771
CLRVersion                 4.0.30319.42000
WSManStackVersion          3.0
PSRemotingProtocolVersion  2.3
SerializationVersion       1.1.0.1
@Enigma: Please see my update that shows another workaround. If that doesn't help, we could try forcing periodic garbage collection to force previously allocated objects to be released.
@Enigma: If no other columns contain the word workstations, you can simplify and speed up the command by omitting the ConvertFrom-Csv call (you can make the wildcard matching more robust with something like '*,workstations,*')
The workaround works, but the output CSV doesn't seem to break lines properly: the data is populated horizontally after the table-header row instead of on separate lines.
0

Solution 2:

$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8  # modify encoding if necessary
$Delimiter=','

#find the header => take the first row of the first non-empty file
$Header = Get-ChildItem -Path $inputFolder -Filter *.csv | Where-Object Length -gt 0 | Select-Object -First 1 | Get-Content -TotalCount 1

#if no header was found, there is no non-empty file => quit
if (!$Header) { return }

#split the header into an array of column names
$HeaderArray = $Header -split $Delimiter -replace '"', ''

#open the output file
$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)

#write the header
$w.WriteLine($Header)


#loop over the CSV files
Get-ChildItem $inputFolder -File -Filter "*.csv" | ForEach-Object {

    #open the file for reading
    $r = New-Object System.IO.StreamReader($_.FullName, $encoding)
    $skiprow = $true

    # note: comparing against $null (rather than `while ($line = ...)`) keeps
    # the loop from stopping early at the first empty line
    while (($line = $r.ReadLine()) -ne $null)
    {
        #exclude header
        if ($skiprow) 
        {
            $skiprow = $false
            continue
        }

        #convert the current row to an object, using the header found above
        $Object = $line | ConvertFrom-Csv -Header $HeaderArray -Delimiter $Delimiter

        #write the row to the output file if it matches your condition
        if ($Object.machine_type -eq 'workstations') { $w.WriteLine($line) }

    }

    $r.Close()
    $r.Dispose()

}

$w.close()
$w.Dispose()

1 Comment

Have you tried my second proposition?
-1

You have to read and write to the .csv files one row at a time, using StreamReader and StreamWriter:

$filepath = "C:\Change\2019\October"
$outputfile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8

$files = Get-ChildItem -Path $filePath -Filter *.csv | 
         Where-Object { $_.machine_type -eq 'workstations' }

$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)

$skiprow = $false
foreach ($file in $files)
{
    $r = New-Object System.IO.StreamReader($file.fullname, $encoding)
    while (($line = $r.ReadLine()) -ne $null) 
    {
        if (!$skiprow)
        {
            $w.WriteLine($line)
        }
        $skiprow = $false
    }
    $r.Close()
    $r.Dispose()
    $skiprow = $true
}

$w.close()
$w.Dispose()

1 Comment

Are we missing Import-Csv in the line $files = Get-ChildItem -Path $filePath -Filter *.csv | Where-Object { $_.machine_type -eq 'workstations' }? As written, Where-Object filters FileInfo objects, which have no machine_type property.
-2

get-content *.csv | add-content combined.csv

Make sure combined.csv doesn't exist when you run this, or it's going to go full Ouroboros.
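The self-append risk is easy to guard against by excluding the output file from the input set. Here is a minimal Python sketch of that idea (the function name is made up; like the one-liner above, it neither filters rows nor deduplicates headers):

```python
import glob
import os

def combine_csvs(pattern, output_path):
    """Concatenate every file matching `pattern` into `output_path`.

    Skipping the output file itself avoids the self-append loop described
    above: if combined.csv matches *.csv, a naive concatenation would read
    back its own output. Illustrative sketch only.
    """
    out_abs = os.path.abspath(output_path)
    with open(output_path, "w") as out:
        for path in sorted(glob.glob(pattern)):
            if os.path.abspath(path) == out_abs:
                continue  # don't read the file we are writing
            with open(path) as f:
                for line in f:  # stream line by line, constant memory
                    out.write(line)
```

With the guard in place it no longer matters whether combined.csv already exists when you run it.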

3 Comments

This will duplicate header rows in the output file, and it also doesn't address the filter requirement. (Aside from that, Set-Content should be used).
Won't Set-Content overwrite the already-set contents of combined.csv, leaving it a duplicate of the last CSV file that was picked up?
No, whatever Set-Content receives via the pipeline all goes into the target file.
