
I need to extract two columns (columns 147 and 148) from a huge pipe-delimited file (100 MB to 3 GB). I cannot open it in Excel, or if I do, it takes forever. Once the columns are extracted, I need to write the top N rows to another text file, formatted so they can be pasted into SQL to query a table. However, the two extracted columns (LOGICAL_KEY_CONCAT_FIELD_CLM, LOGICAL_KEY_CONCAT_FIELD_CLM2) can each hold many concatenated fields separated by commas, and their format can vary from file to file. I need this to be a repeatable process.

For example, the pipe-delimited file looks like:

|||…|LOGICAL_KEY_CONCAT_FIELD_CLM|LOGICAL_KEY_CONCAT_FIELD_CLM2|||||…
|||…|CLM,KEY,1|CLM,FINANCIAL,KEY,1,2018-11-30|||…
|||…|CLM,KEY,2|CLM,FINANCIAL,KEY,2,2018-11-30|||…
|||…|CLM,KEY,3|CLM,FINANCIAL,KEY,3,2018-11-30|||…
.
.
.

OUTPUT:

(LOGICAL_KEY_CONCAT_FIELD_CLM = 'CLM,KEY,1' AND 
LOGICAL_KEY_CONCAT_FIELD_CLM2 = 'CLM,FINANCIAL,KEY,1,2018-11-30')
OR (LOGICAL_KEY_CONCAT_FIELD_CLM = 'CLM,KEY,2' AND 
LOGICAL_KEY_CONCAT_FIELD_CLM2 = 'CLM,FINANCIAL,KEY,2,2018-11-30')
OR (LOGICAL_KEY_CONCAT_FIELD_CLM = 'CLM,KEY,3' AND 
LOGICAL_KEY_CONCAT_FIELD_CLM2 = 'CLM,FINANCIAL,KEY,3,2018-11-30')

Basically, I want to run a query using the data from the columns in the file, and I want to ensure that each combination of LOGICAL_KEY_CONCAT_FIELD_CLM and LOGICAL_KEY_CONCAT_FIELD_CLM2 is in the table. I don’t need the SELECT * FROM Table WHERE part because the table varies depending on the incoming source.

I am using the following code but it doesn’t print the columns in the output file, just the headers:

Get-Content "\\LocationOfFile\CLAIM_20190103T17053920.txt" | select-object LOGICAL_KEY_CONCAT_FIELD_CLM,LOGICAL_KEY_CONCAT_FIELD_CLM2 -First 10 | Out-File "P:\PDS_QA\TestFile\Output.txt"
  • It's unclear what you are asking. With Select-Object you select properties (columns), but Get-Content returns plain text, not objects with properties. From your sample data it's also unclear whether all the headers are unique. In general, use Import-Csv yourfile.csv -Delimiter '|' to read the file in (or, to cut it down to the top n+1 lines first, read it with Get-Content yourfile.csv | Select-Object -First 11 | ConvertFrom-Csv -Delimiter '|'). Commented Jan 24, 2019 at 17:11
  • Do you mean something like this: Import-Csv .\claim.txt -Delimiter "|" | Select-Object LOGICAL_KEY_CONCAT_FIELD_CLM,LOGICAL_KEY_CONCAT_FIELD_CLM2 -First 2 | Export-Csv .\test.txt -NoTypeInformation? What do you want to do with the output? Commented Jan 24, 2019 at 17:13
  • Thanks @Razorfen, this is part of what I needed. I plan to take the output into SQL where I can enter the SELECT|FROM|WHERE clause and then just paste the output of the PS script to the WHERE clause. Do you know how I can add OR (LOGICAL_KEY_CONCAT_FIELD_CLM = 'OutputFromFirstColumn' AND LOGICAL_KEY_CONCAT_FIELD_CLM2 = 'OutputFromSecondColumn') in the output file? Commented Jan 24, 2019 at 19:20
  • @jujulalu Razorfen is correct, but the performance could be somewhat bad for such large files. You should try that out. Look at my updated answer and also my new answer, which explains the pure PS solution. Commented Jan 25, 2019 at 9:48

2 Answers


This is probably the fastest option(?)

Since your file is very large, it uses some .NET streams for optimal performance:

$outstream = New-Object System.IO.StreamWriter "P:\PDS_QA\TestFile\Output.txt"
try {
    $outstream.WriteLine("SELECT * FROM Table WHERE 1=0")
    $firstLine = $true
    foreach ($line in [System.IO.File]::ReadLines("\\LocationOfFile\CLAIM_20190103T17053920.txt")) {
        if ($firstLine) {
            # skip the header of the file
            $firstLine = $false
            continue
        }
        $values = $line.Split("|")
        # zero-based indexes 146 and 147 are columns 147 and 148 (assuming your column numbers are one-based)
        $clm, $clm2 = $values[146, 147]
        $line = "OR (LOGICAL_KEY_CONCAT_FIELD_CLM = '{0}' AND LOGICAL_KEY_CONCAT_FIELD_CLM2 = '{1}')" -f $clm, $clm2
        $outstream.WriteLine($line)
    }
}
finally {
    $outstream.Dispose()
}
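The question also asks for only the top N rows, and the values are pasted into SQL string literals. As a rough sketch of how the same streaming approach could cover both points, the variant below stops reading after N data rows and doubles any single quotes in the values. The function name Convert-ToWhereClause and its parameters (-Path, -First, -Index1, -Index2) are hypothetical names made up for this illustration, and it emits the first clause without a leading OR so the output matches the format shown in the question:

```powershell
# Hypothetical helper; the function and parameter names are illustrative only.
function Convert-ToWhereClause {
    param(
        [Parameter(Mandatory)] [string] $Path,   # full path; .NET resolves relative paths against the process directory
        [int] $First  = 10,                      # number of data rows to emit
        [int] $Index1 = 146,                     # zero-based index of column 147
        [int] $Index2 = 147                      # zero-based index of column 148
    )
    $maxIndex = [Math]::Max($Index1, $Index2)
    $written  = 0
    $isHeader = $true
    # ReadLines enumerates lazily, so breaking out early avoids reading the rest of the file
    foreach ($line in [System.IO.File]::ReadLines($Path)) {
        if ($isHeader) { $isHeader = $false; continue }   # skip the header row
        if ($written -ge $First) { break }                # stop after N data rows
        $values = $line.Split('|')
        if ($values.Length -le $maxIndex) { continue }    # skip rows with too few columns
        # Double single quotes so the values remain valid SQL string literals
        $clm  = $values[$Index1].Replace("'", "''")
        $clm2 = $values[$Index2].Replace("'", "''")
        $prefix = if ($written -eq 0) { '' } else { 'OR ' }
        "{0}(LOGICAL_KEY_CONCAT_FIELD_CLM = '{1}' AND LOGICAL_KEY_CONCAT_FIELD_CLM2 = '{2}')" -f $prefix, $clm, $clm2
        $written++
    }
}
```

It could be called as Convert-ToWhereClause -Path "\\LocationOfFile\CLAIM_20190103T17053920.txt" -First 10 | Set-Content "P:\PDS_QA\TestFile\Output.txt". Because the function writes plain strings to the pipeline, the caller decides where the output goes.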

2 Comments

I'd shorten the split to $CLM,$CLM2 = $line.Split('|')[146..147] (I hate ultra long variable names) (+1)
@LotPings Agreed. Actually, I had used the short names before, but replaced them with the longer names when I posted the answer.

Alternative solution, with pure built-in PowerShell cmdlets. I can't say how good the performance is for large files. Probably worse than my other answer. You should try it out.

Also, this requires the input CSV to have a header with unique names for every column.

$infile = "\\LocationOfFile\CLAIM_20190103T17053920.txt"
$outfile = "P:\PDS_QA\TestFile\Output.txt"
"SELECT * FROM Table WHERE 1=0" | Out-File $outfile
Import-Csv $infile -Delimiter "|" | foreach {
    "OR (LOGICAL_KEY_CONCAT_FIELD_CLM = '{0}' AND LOGICAL_KEY_CONCAT_FIELD_CLM2 = '{1}')" -f (
    $_.LOGICAL_KEY_CONCAT_FIELD_CLM,
    $_.LOGICAL_KEY_CONCAT_FIELD_CLM2)
} | Out-File $outfile -Append
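Since the question says the column names can differ from file to file and only the top N rows are needed, here is a minimal sketch of how the cmdlet-based approach might be parameterized. The function name New-KeyFilter and its parameters (-Path, -Column1, -Column2, -First) are hypothetical names chosen for this example. It relies on Select-Object -First stopping the upstream pipeline once it has enough rows (PowerShell 3.0 and later), so Import-Csv should not need to parse the whole file:

```powershell
# Hypothetical helper; the function and parameter names are illustrative only.
function New-KeyFilter {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Column1,   # header name of the first key column
        [Parameter(Mandatory)] [string] $Column2,   # header name of the second key column
        [int] $First = 1000                         # number of data rows to use
    )
    # Build one "(col1 = '...' AND col2 = '...')" clause per row
    $clauses = @(
        Import-Csv -Path $Path -Delimiter '|' |
            Select-Object -First $First |
            ForEach-Object {
                # Double single quotes so the values remain valid SQL string literals
                $v1 = ([string]$_.$Column1).Replace("'", "''")
                $v2 = ([string]$_.$Column2).Replace("'", "''")
                "({0} = '{1}' AND {2} = '{3}')" -f $Column1, $v1, $Column2, $v2
            }
    )
    # Join the clauses with OR, one clause per line
    $clauses -join ([Environment]::NewLine + 'OR ')
}
```

For example, New-KeyFilter -Path $infile -Column1 LOGICAL_KEY_CONCAT_FIELD_CLM -Column2 LOGICAL_KEY_CONCAT_FIELD_CLM2 -First 10 | Set-Content $outfile would write the condition without the SELECT part, ready to paste after a WHERE. As with the code above, this assumes the header row has a unique name for every column.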

1 Comment

Thanks @marsze, I had to add a line of code before the foreach loop, but it is working as I would like. I added the SELECT stage: Import-Csv $infile -Delimiter "|" | SELECT LOGICAL_KEY_CONCAT_FIELD_RX, LOGICAL_KEY_CONCAT_FIELD_RXFIN -First 1000 | foreach
