3

I have a file that looks like this:

SPECIMEN: Procedure: xxxx1 A) Location: yyyy2
Major zzz B) Location: something
text here C) more


CLINICAL DIAGNOSIS: xyz

Where the newlines are CR then LF.

I'm trying to write a regex that captures everything from the end of Procedure: up to the start of CLINICAL DIAGNOSIS, but I'm having trouble matching across multiple lines.

Here's what I have:

$input_file = 'c:\Path\0240188.txt'
$regex = '(?m)^SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:'
select-string -Path $input_file -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value }

Which doesn't return anything.

If I change the line to:

$regex = '(?m)^SPECIMEN: Procedure: (.*)'

It grabs the first line, but not the rest. I assumed (?m) was supposed to grab multiple lines for me.

Any tips?

1
  • Any way to slurp that whole file into a variable? Commented Sep 11, 2014 at 22:57

5 Answers

1

Try this:

$regex = '(?ms).*SPECIMEN: Procedure:(.+)CLINICAL DIAGNOSIS: '

Get-Content $input_file -Delimiter 'CLINICAL DIAGNOSIS: '|
 foreach {@($_) -match 'CLINICAL DIAGNOSIS: ' -replace $regex,'$1'}

Using 'CLINICAL DIAGNOSIS: ' as the delimiter makes Get-Content return the file one record at a time, so there is no need to read all the data at once or to capture multiple matches in a single pass.

1

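It seems that Select-String -Path only reads the file line by line, so the pattern is never tested against more than one line at a time, which doesn't help you here. Read the whole file into a single string first.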

Try:

$fileContent = [io.file]::ReadAllText("C:\file.txt")

Or

$fileContent = Get-Content c:\file.txt -Raw

Taken from another post here.
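
To make the difference concrete, here is a minimal sketch. It writes made-up sample data (modelled on the question) to a throwaway temp file rather than using the original path, and the variable names are only for illustration. Without -Raw, Get-Content returns an array of lines; with -Raw (PowerShell 3.0 and later) it returns one string, which is what a cross-line pattern needs.

```powershell
# Hypothetical sample data written to a temp file, with CRLF line endings.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'specimen-sample.txt'
$text = "SPECIMEN: Procedure: xxxx1 A) Location: yyyy2`r`nMajor zzz B) Location: something`r`ntext here C) more`r`n`r`n`r`nCLINICAL DIAGNOSIS: xyz`r`n"
[System.IO.File]::WriteAllText($path, $text)

$lines = Get-Content -Path $path        # array of strings, one per line
$whole = Get-Content -Path $path -Raw   # a single string (PowerShell 3.0+)

$pattern = '(?s)SPECIMEN: Procedure: (.*?)CLINICAL DIAGNOSIS:'

# -match against an array filters its elements; no single line holds both markers.
$lineHits = @($lines -match $pattern).Count
# Against the single string, the pattern can span the line breaks.
$wholeHit = $whole -match $pattern

Remove-Item -Path $path
```

Here $lineHits should come out as 0 and $wholeHit as $true, which is the same reason the pattern in the question finds nothing when Select-String is pointed at the file.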

1

(?m) only makes the ^ and $ anchors match at the beginning and end of each line; it does not let a match span lines. You want the inline (?s) modifier, which makes the dot match every character, including line breaks.

$regex = '(?s)SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:'
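
A small sketch of the two modifiers side by side, run against an in-memory string instead of a file (the sample text is made up, and [regex]::Match is used here only to keep the comparison short). Note that this only helps once the text is a single string; applied line by line, neither pattern can see both markers.

```powershell
# Made-up sample text with CRLF line endings, held in one string.
$text = "SPECIMEN: Procedure: xxxx1 A) Location: yyyy2`r`nMajor zzz`r`n`r`nCLINICAL DIAGNOSIS: xyz"

# (?m) only changes ^ and $; the dot still stops at the line feed.
$multiline  = [regex]::Match($text, '(?m)^SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:')
# (?s) lets the dot cross line breaks, so the match can reach the second marker.
$singleline = [regex]::Match($text, '(?s)SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:')

$multiline.Success     # False
$singleline.Success    # True

# Group 1 holds the text between the markers; Trim() drops the trailing CRLFs.
$between = $singleline.Groups[1].Value.Trim()
```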

1 Comment

This didn't give me any results. If I took out the CLINICAL DIAGNOSIS: in your line, it only ended up returning SPECIMEN: Procedure: ; the ? seems to be part of the issue causing this?

0

Try with this:

$input_file = gc 'c:\Path\0240188.txt' | out-string
# or: gc c:\path\xxxxx.txt -raw  #with v3+
$regex = '(?s)\bSPECIMEN: Procedure: (.*?)CLINICAL DIAGNOSIS:'
$input_file | select-string -Pattern $regex -AllMatches | % { $_.Matches }
# or: [regex]::matches($input_file, $regex) # much faster
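
As a rough illustration of why the lazy (.*?) matters, assume the file holds more than one record (the two records below are invented). With a greedy (.*) the match runs to the last CLINICAL DIAGNOSIS: in the text; the lazy form stops at the first one after each SPECIMEN line. Capture group 1 then gives just the text between the markers.

```powershell
# Two made-up records in a single string, CRLF line endings.
$text = "SPECIMEN: Procedure: first`r`nline two`r`nCLINICAL DIAGNOSIS: a`r`n" +
        "SPECIMEN: Procedure: second`r`nCLINICAL DIAGNOSIS: b`r`n"

$greedy = [regex]::Matches($text, '(?s)\bSPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:')
$lazy   = [regex]::Matches($text, '(?s)\bSPECIMEN: Procedure: (.*?)CLINICAL DIAGNOSIS:')

$greedy.Count   # 1: one match that swallows both records
$lazy.Count     # 2: one match per record

# Pull out only the captured text, trimming the trailing line breaks.
$captured = foreach ($m in $lazy) { $m.Groups[1].Value.Trim() }
```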

0

You could use a little regex trick like this:

Procedure:([\S\s]+)CLINICAL DIAGNOSIS

Working demo

Since . matches everything except newlines by default, you can use [\S\s]+ to match any character, line breaks included, and capture the text with the capturing group (...). This trick works if you want to avoid using the single-line flag.
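
A quick sketch to check that the character-class trick and the (?s) flag capture the same text. The sample string is made up, and it must already be one string, not lines read one at a time.

```powershell
# Made-up sample text with CRLF line endings.
$text = "Procedure: xxxx1`r`nMajor zzz`r`nCLINICAL DIAGNOSIS: xyz"

# [\S\s] means "whitespace or non-whitespace", i.e. any character at all.
$classTrick = [regex]::Match($text, 'Procedure:([\S\s]+)CLINICAL DIAGNOSIS')
# The same pattern written with the single-line flag instead.
$dotAll     = [regex]::Match($text, '(?s)Procedure:(.+)CLINICAL DIAGNOSIS')

# Both forms should capture identical text between the markers.
$same = $classTrick.Groups[1].Value -eq $dotAll.Groups[1].Value
```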
