0

I'm trying to create a PowerShell script that transpose a CSV file from column to rows.

I found examples of doing the opposite (converting row based CSV to column) but I found nothing on column to rows. My problem being that I don't know exactly how many column I'll have. I tried adapting the row to column to column to rows but unsuccessfully.

$a = Import-Csv "input.csv"
$a | FT -AutoSize

$b = @()
foreach ($Property in $a.Property | Select -Unique) {
    $Props = [ordered]@{ Property = $Property }
    foreach ($Server in $a.Server | Select -Unique){ 
        $Value = ($a.where({ $_.Server -eq $Server -and 
                    $_.Property -eq $Property })).Value
        $Props += @{ $Server = $Value }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}

$b | FT -AutoSize
$b | Out-GridView
$b | Export-Csv "output.csv" -NoTypeInformation 

For example my CSV can look like this:

"ID","DATA1"
"12345","11111"
"54321","11111"
"23456","44444"

or this (number of column can vary):

"ID","DATA1","DATA2","DATA3"
"12345","11111","22222","33333"
"54321","11111",,
"23456","44444","55555",

and I would like the script to convert it like this:

"ID","DATA"
"12345","11111"
"12345","22222"
"12345","33333"
"54321","11111"
"23456","44444"
"23456","55555"
4
  • Hmmm...so what you are asking for is not exactly what you say you want. Converting the columns to rows would not give the output that you show. What you want is to convert one row per ID to multiple rows per ID. There are a few ways to go about this. Will the ID always be the first coulmn? Commented Jun 12, 2018 at 15:11
  • Yes, it will always be the first column Commented Jun 12, 2018 at 15:16
  • Also, Is there any chance at all that the data will ever have a comma in a field? Commented Jun 12, 2018 at 15:21
  • No, there won't be any comma in a field. Commented Jun 12, 2018 at 15:23

4 Answers 4

1

The trick is to query the members of the table to get the column names. Once you do that then the rest is straightforward:

function Flip-Table ($Table) {

    Process {

        $Row = $_

        # Get all the columns names, excluding the ID field.
        $Columns = ($Row | Get-Member -Type NoteProperty | Where-Object Name -ne ID).Name

        foreach ($Column in $Columns) {

            if ($Row.$Column) {

                $Properties = [Ordered] @{
                    "ID"   = $Row.ID
                    "DATA" = $Row.$Column
                }

                New-Object PSObject -Property $Properties
            }
        }

        # Garbage collection won't kick in until the end of the script, so
        # invoke it every 100 input rows.

        $Count++;

        if (($Count % 100) -eq 0) {
            [System.GC]::GetTotalMemory('forceFullCollection') | out-null
        }
    }
}

Import-Csv input.csv | Flip-Table | Export-Csv -NoTypeInformation output.csv
Sign up to request clarification or add additional context in comments.

6 Comments

Thanks again, this works perfectly. But I do have one of the files I process with this that is very big, resulting in a huge RAM usage by the script (actually had to kill some other process to avoid the script crashing). Is it possible to adapt this to write some of the lines in the output file then free some memory before resuming. I guess this would slow down the process but avoid needing high amount of memory when working with huge files and risking a crash.
@FrancoisA I've changed the function so that it streams the data. That should, as far as I know, fix the RAM issue. Note that it's called in a different way in the last line now.
Unfortunately even with that new version the high memory usage issue persisted Exception of type 'System.OutOfMemoryException' was thrown. At C:\Script\transposeRowsGroups.ps1:19 char:17 + if ($Row.$Column) { + + CategoryInfo : OperationStopped: (:) [], OutOfMemoryException + FullyQualifiedErrorID : System.OutOfMemoryException
OK - I've just read up on garbage collection in PowerShell and added a manual garbage collection step. I'm hoping that should sort out the memory allocation problem.
Thanks a lot. Ok, it's doesn't work. But it is because there is an issue with the garbage collector when using a pipe call, I had to change [System.GC]::Collect() to [System.GC]::GetTotalMemory(‘forcefullcollection’) | out-null so now it works. Thanks for your help ! GC Issue Work-Around
|
1

Well, here is mine. I'm not as fancy as the rest:

$in = Get-Content input.csv | Select -Skip 1
$out = New-Object System.Collections.ArrayList
foreach($row in $in){
    $parts = $row.Split(',')
    $id = $parts[0]
    foreach($data in $parts[1..$parts.Count]){
        if($data -ne '' -AND $data -ne $null){
            $temp = New-Object PSCustomObject -Property @{'ID' = $id;
                                                        'Data' = $data}
            $out.Add($temp) | Out-Null
        }
    }

}
$out | Export-CSV output.csv -NoTypeInformation

Comments

0

You can do something like this

# Convert csv to object
$csv = ConvertFrom-Csv @"
"ID","DATA1","DATA2","DATA3"
"12345","11111","22222","33333"
"54321","11111",,
"23456","44444","55555"
"@

# Ignore common members and the ID property
$excludedMembers = @(
    'GetHashCode',
    'GetType',
    'ToString',
    'Equals',
    'ID'
)

$results = @()

# Iterate around each csv row
foreach ($row in $csv) {
    $members =  $row | Get-Member
    # Iterate around each member from the 'row object' apart from our 
    # exclusions and empty values
    foreach ($member in $members | 
        Where { $excludedMembers -notcontains $_.Name -and $row.($_.Name)}) {
        # add to array of objects
        $results += @{ ID=$row.ID; DATA=$row.($member.Name)}
    }
}

# Write the csv string
$outstring = "ID,DATA"
$results | foreach { $outstring += "`n$($_.ID),$($_.DATA)" }

# New csv object
$csv = $outstring | ConvertFrom-Csv

Probably not the most elegant solution, but should do what you need

I left some comments explaining what it does

Comments

0

If you only want to accept a limited number DATA columns (e.g. 5), you could do:

ForEach ($i in 1..5) {$CSV | ? {$_."Data$i"} | Select ID, @{N='Data'; E={$_."Data$i"}}}

And if you have a potential unlimited number of DATA columns:

ForEach ($Data in ($CSV | Select "Data*" -First 1).PSObject.Properties.Name) {
    $CSV | ? {$_.$Data} | Select ID, @{N='Data'; E={$_.$Data}}
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.