
I have a big text file (4,000 lines) that I need to parse for a specific string. Once I hit that string, I need to continue down until it matches again, then take all of the text in between and save it into its own file. How can I match across multiple lines instead of just one individual line?

I have tried to use Select-String but I cannot get that to work in my specific instance and I am stuck.

Example text file:

SOF
I need this from here
sample text 
sample text 
sample text 
sample text 
sample text 
sample text 
To here
I need this from here
sample text 
sample text 
sample text 
sample text 
sample text 
To here
.
.
.
.
.
.
EOF
  • How big is "massive"? Commented Mar 12, 2021 at 18:26
  • How massive? Can it all be read into memory at once? Do we need to try and read it one line at a time with a stream reader or something? If it's just several megs, you could read it all into memory as a large multi-line string and split it on newline characters with a regex lookahead for the 'I need this from here' string. If it's gigs in size, you need a stream reader and more logic. Commented Mar 12, 2021 at 18:27
  • 4000 lines, I guess in the realm of things that's not terribly big. 1KB Commented Mar 12, 2021 at 18:32

2 Answers


A 1KB file is pretty small, and easy to read into memory all at once. You could totally read it in as one multi-line string, and split it to output chunks.

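# Read the whole file as one string instead of an array of lines
$RawText = Get-Content C:\Path\To\File.txt -Raw
# Split on line breaks that are immediately followed by the marker line;
# the lookahead keeps the marker at the start of each chunk
$Records = $RawText -split '[\r\n]+(?=I need this from here)'
For($i=0;$i -lt $Records.count;$i++){
    $Records[$i] | Set-Content "C:\Path\To\FileSplit-$i.txt"
}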

That would give you (with the sample text you provided) 3 files:

FileSplit-0.txt

SOF

FileSplit-1.txt

I need this from here
sample text 
sample text 
sample text 
sample text 
sample text 
sample text 
To here

FileSplit-2.txt

I need this from here
sample text 
sample text 
sample text 
sample text 
sample text 
To here
.
.
.
.
.
.
EOF
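
As the listing shows, the first chunk holds whatever precedes the first marker (here just SOF). If that chunk is not wanted, one option is to filter the split result before writing anything. The following is only a sketch: the inline `$RawText` string is made-up sample data standing in for the `Get-Content ... -Raw` call, and `$Marker` is a name introduced for illustration.

```powershell
# Assumed sample data standing in for: Get-Content C:\Path\To\File.txt -Raw
$RawText = "SOF`r`nI need this from here`r`nsample text`r`nTo here`r`n" +
           "I need this from here`r`nsample text`r`nTo here`r`nEOF"

# Hypothetical variable holding the marker line
$Marker = 'I need this from here'

# Same split as above, then keep only the chunks that begin with the marker.
# @() forces an array even when only one chunk survives the filter.
$Records = @($RawText -split "[\r\n]+(?=$([regex]::Escape($Marker)))" |
    Where-Object { $_.StartsWith($Marker) })

$Records.Count   # 2: the leading "SOF" chunk is dropped
```

Note that the last chunk still carries everything after the final "To here" (the EOF line in this sample), because the split only looks at the start marker.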

4 Comments

I am getting an error 'Set-Content : The input object cannot be bound to any parameters for the command either because the command does not take pipeline input or the input and its properties do not match any of the parameters that take pipeline input'
But I do believe it is creating the specific files, it is just not placing content inside of them and placing them in my specified directory
@GarrettStarkey Did I misunderstand? I thought you said you wanted the text between the two lines, not all the text just split at a couple of points?
Sorry for the slow response, did you figure out the issue? I'm not able to replicate the error given the sample text given and the code I provided.
1

Since it's small enough to read into memory all at once, another viable solution would be to use a regex pattern with the [regex] class's Matches() method.

I've updated your sample text to clearly show the appropriate lines are extracted.

$file = New-TemporaryFile

@'
SOF
I need this from here
1 sample text 
2 sample text 
3 sample text 
4 sample text 
5 sample text 
6 sample text 
To here
I need this from here
7 sample text 
8 sample text 
9 sample text 
10 sample text 
11 sample text 
To here
.
.
.
.
.
.
EOF
'@ | Set-Content $file -Encoding UTF8

$text = Get-Content $file -raw

[regex]$regex = '(?s)(?<=I need this from here).+?(?=\r?\nTo here)'

$regex.Matches($text) | ForEach-Object {$_.value}

Regex details

  • (?s) - Single-line mode: . matches every character, including newlines. It is needed here because each match spans several lines; the -Raw parameter of Get-Content only controls how the text is read and does not change what . matches.
  • (?<=) - Positive lookbehind.
  • (?=) - Positive lookahead.
  • .+? - Match any character, one or more times, as few as possible.
  • \r?\n - Match a line break (a line feed, optionally preceded by a carriage return), so the final line break is left out of the matched text.
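
To see what (?s) contributes, the same pattern can be run with and without it. This is a small sketch on a made-up three-line string (not the temporary file used above); the variable names are illustrative only.

```powershell
# Assumed inline sample; `n is a line feed
$text = "I need this from here`n1 sample text`n2 sample text`nTo here"

# With (?s), . also matches line feeds, so the match can span several lines
$withS = [regex]::Matches($text, '(?s)(?<=I need this from here).+?(?=\r?\nTo here)')

# Without (?s), . stops at the line feed right after the start marker
$withoutS = [regex]::Matches($text, '(?<=I need this from here).+?(?=\r?\nTo here)')

$withS.Count      # 1
$withoutS.Count   # 0

# The lookbehind ends before the line break that follows the start marker,
# so the match value begins with that line break; Trim() removes it
$withS[0].Value.Trim()
```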

You can write each match's $_.Value to its own file as desired.

Perhaps something like this

$text = Get-Content $file -raw

[regex]$regex = '(?s)(?<=I need this from here).+?(?=\r?\nTo here)'

$newfiles = $regex.Matches($text) | ForEach-Object {
    $tempfile = New-TemporaryFile
    Set-Content -Path $tempfile -Value $_.Value
    Write-Host "Output file: $($tempfile.FullName)"
    # Emit the file object so that $newfiles actually collects it
    $tempfile
}

Or this

$text = Get-Content $file -raw

[regex]$regex = '(?s)(?<=I need this from here).+?(?=\r?\nTo here)'

$matchedtext = $regex.Matches($text)

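# The match collection is zero-indexed, so loop from 0 to Count - 1
for($i = 0; $i -lt $matchedtext.Count; $i++){
    # Number the output files starting at 1
    $outfile = Join-Path c:\temp "SplitText$($i + 1).txt"
    Set-Content -Path $outfile -Value $matchedtext[$i].Value
    Write-Host "Output file: $outfile"
}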
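
Both approaches read the whole file into memory. For a file too large for that, which was the concern raised in the comments on the question, the text can be walked one line at a time instead. What follows is a rough sketch, not a tuned implementation: the temporary input file, the output directory and the SplitText names are all assumptions made so that the snippet runs on its own.

```powershell
# Assumed setup: a small sample file and a scratch output directory
$inFile = New-TemporaryFile
$outDir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $outDir | Out-Null
@('SOF', 'I need this from here', '1 sample text', 'To here',
  'I need this from here', '2 sample text', '3 sample text', 'To here', 'EOF') |
    Set-Content $inFile

$n = 0
$outfile = $null   # non-null only while we are between the two markers

# ReadLines yields one line at a time instead of loading the whole file
foreach ($line in [System.IO.File]::ReadLines($inFile.FullName)) {
    if ($line.TrimEnd() -eq 'I need this from here') {
        # Start marker: switch to a new output file
        $n++
        $outfile = Join-Path $outDir "SplitText$n.txt"
    }
    elseif ($line.TrimEnd() -eq 'To here') {
        # End marker: stop collecting
        $outfile = $null
    }
    elseif ($outfile) {
        # Inside a block: append the line to the current output file
        Add-Content -Path $outfile -Value $line
    }
}
```

Calling Add-Content once per line keeps the sketch short but reopens the file each time; for very large inputs a System.IO.StreamWriter per block would likely be faster.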

5 Comments

It runs and creates the files but it does not put any content in the files.
If it makes any difference, 'From here' and 'To here' are going to be the same word. I need everything in between them. Basically it's a sales order, and I'm just splitting them up into different files instead of one big one.
The code I put in here will absolutely create the two files with contents. Now, if you have different input files with different words to match, it will be up to you to craft the pattern. I only have what you provide to work with.
I reviewed your answer some more and I see where I went wrong! I appreciate all of your help
Happy to help! Have a great day.
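
For the case described in the comments, where the start and end markers are the same word, splitting on the marker line is one possible way to adapt the pattern. This is only a sketch under assumptions: 'ORDER' is a hypothetical marker and the sample text is invented, since the real sales-order file was never shown.

```powershell
# 'ORDER' is a hypothetical delimiter; the sample text is made up
$marker = 'ORDER'
$text = "ORDER`nitem A`nitem B`nORDER`nitem C`nORDER`nitem D`nitem E"

# (?m) lets ^ match at the start of every line, so this splits on lines
# that consist of the marker only. Empty pieces are dropped and the
# trailing line break of each chunk is trimmed.
$chunks = @($text -split "(?m)^$([regex]::Escape($marker))\r?\n" |
    Where-Object { $_.Trim() } |
    ForEach-Object { $_.TrimEnd() })

for ($i = 0; $i -lt $chunks.Count; $i++) {
    "--- chunk $($i + 1) ---"
    $chunks[$i]
}
```

Each element of $chunks could then be passed to Set-Content in the same way as the matches above.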
