Sorry, I've edited this for clarity. (I've also tried to remove the bold later in the post, but it won't go away; the asterisks in the source file are throwing the formatting off.)
I'm parsing medical claim files and need to find any instance where one value matches another, but only if the match occurs before a terminating string is reached.
I want to search between DTP*431 and REF*6R. I'm including DTP*431 in the match because that segment will be deleted under certain circumstances.
I need the regex to return a match if the 8 digits immediately following DTP*431*D8* exactly match the 8 digits immediately following the next instance of DTP*472*RD8* in the file. It must not continue searching past that next instance of DTP*472*RD8*.
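To pin down the rule without any regex, here is a minimal sketch in plain Python. The function name `find_matching_dates` and the two constants are made up for illustration, and it assumes the whole file has already been read into one string:

```python
START = "DTP*431*D8*"   # marker that precedes the first date
END = "DTP*472*RD8*"    # marker that precedes the date to compare against

def find_matching_dates(text):
    """Return the date of every DTP*431*D8* segment whose 8 digits equal
    the 8 digits following the NEXT DTP*472*RD8* (and only the next one)."""
    matches = []
    pos = text.find(START)
    while pos != -1:
        date_431 = text[pos + len(START): pos + len(START) + 8]
        nxt = text.find(END, pos)          # only the very next 472 counts
        if nxt == -1:
            break                          # no later 472, so nothing to compare
        date_472 = text[nxt + len(END): nxt + len(END) + 8]
        if len(date_431) == 8 and date_431.isdigit() and date_431 == date_472:
            matches.append(date_431)
        pos = text.find(START, pos + 1)    # move on to the next 431 segment
    return matches
```

Each DTP*431*D8* is compared against exactly one DTP*472*RD8*, which is the behavior I want the regex to reproduce.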
This example should not return a match, because the 8 digits immediately after DTP*431*D8* (20150101) do not match the 8 digits immediately following the next instance of DTP*472*RD8* (20150102):
DTP*431*D8*20150101~
[variable text in between]
LX*1~
DTP*472*RD8*20150102-20150102~
REF*6R*[more information]~
[variable text in between]
DTP*431*D8*20141231~
[variable text in between]
LX*1~
DTP*472*RD8*20150103-20150103~
REF*6R*[more information]~
This example should return a match, but only one, because the 8 digits immediately following the first DTP*431*D8* (20150101) exactly match the 8 digits immediately following the next instance of DTP*472*RD8* (20150101); the second DTP*431*D8* (20141231) does not match its next DTP*472*RD8* (20150102):
DTP*431*D8*20150101~
[variable text in between]
LX*1~
DTP*472*RD8*20150101-20150101~
REF*6R*[more information]~
[variable text in between]
DTP*431*D8*20141231~
[variable text in between]
LX*1~
DTP*472*RD8*20150102-20150102~
REF*6R*[more information]~
This example should not return a match, because even though there is a match between the first 8 digits after an instance of DTP*431*D8* and an instance of DTP*472*RD8* (20150101), it is not the next instance of DTP*472*RD8* that provides that match:
DTP*431*D8*20150101~
[variable text in between]
LX*1~
DTP*472*RD8*20150103-20150103~
REF*6R*[more information]~
[variable text in between]
DTP*431*D8*20141231~
[variable text in between]
LX*1~
DTP*472*RD8*20150101-20150101~
REF*6R*[more information]~
Here's what I have so far:
(DTP\*431\*D8\*)(?<=\DTP\*431\*D8\*)([0-9]{8})(.*?)(?=\DTP\*472)(DTP\*472\*RD8\*)(\2)
...but if the date in the first instance of DTP*472 doesn't match, it continues past the first DTP*472 and keeps looking until it finds any DTP*472 with a date that matches the \2 element (the captured 8 digits).
So if I searched the following text, it would (undesirably) match everything from the first DTP*431*D8*20150101 through DTP*472*RD8*20150101 in the second block:
DTP*431*D8*20150101~
[variable text in between]
LX*1~
DTP*472*RD8*20150102-20150102~
REF*6R*[more information]~
[variable text in between]
DTP*431*D8*20141231~
[variable text in between]
LX*1~
DTP*472*RD8*20150101-20150101~
REF*6R*[more information]~
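One direction that might stop the search at the first DTP*472*RD8* is to replace the lazy `(.*?)` with a "tempered" dot that refuses to step over the start of a DTP*472*RD8* segment. This is only a sketch, assuming Python's `re` module (other engines that support lookaheads and backreferences should behave similarly); the name `PATTERN` is just for illustration:

```python
import re

PATTERN = re.compile(
    r"DTP\*431\*D8\*(\d{8})"        # group 1: the 8 digits after DTP*431*D8*
    r"(?:(?!DTP\*472\*RD8\*).)*"    # any characters, but never across a DTP*472*RD8*
    r"DTP\*472\*RD8\*\1",           # the very next 472 must begin with the same date
    re.DOTALL,                      # let "." cross line breaks between segments
)

undesired = (
    "DTP*431*D8*20150101~\nLX*1~\nDTP*472*RD8*20150102-20150102~\nREF*6R*x~\n"
    "DTP*431*D8*20141231~\nLX*1~\nDTP*472*RD8*20150101-20150101~\nREF*6R*x~\n"
)
desired = "DTP*431*D8*20150101~\nLX*1~\nDTP*472*RD8*20150101-20150101~\nREF*6R*x~\n"

print(PATTERN.search(undesired))         # None: the first 472 blocks the search
print(PATTERN.search(desired).group(1))  # 20150101
```

Because the middle part can never consume the text DTP*472*RD8*, the backreference is only ever tested against the first DTP*472*RD8* after each DTP*431*D8*.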
What I'm trying to do is this: delete the entire DTP*431*D8* line if the 8 digits immediately following DTP*431*D8* exactly match the 8 digits immediately following the very next instance of DTP*472*RD8*.
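As a rough sketch of that deletion, again assuming Python's `re` module: the pattern consumes only the DTP*431*D8* line itself and checks the following DTP*472*RD8* inside a lookahead, so nothing else is removed. The names `DROP_431` and `drop_redundant_431` are hypothetical, and it assumes the date is followed directly by `~` and an optional line break:

```python
import re

DROP_431 = re.compile(
    r"DTP\*431\*D8\*(\d{8})~\r?\n?"   # the whole 431 segment plus its line break
    r"(?="                            # lookahead: check, but do not consume
    r"(?:(?!DTP\*472\*RD8\*).)*"      # text up to (not across) the next 472
    r"DTP\*472\*RD8\*\1"              # next 472 must start with the same 8 digits
    r")",
    re.DOTALL,
)

def drop_redundant_431(text):
    """Remove each DTP*431*D8* line whose date equals the start date of the
    next DTP*472*RD8* segment; leave everything else untouched."""
    return DROP_431.sub("", text)

claim = (
    "CLM*1~\nDTP*431*D8*20150101~\nREF*D9*1~\nLX*1~\n"
    "DTP*472*RD8*20150101-20150201~\nREF*6R*000001~\n"
)
print(drop_redundant_431(claim))
```

If the dates differ, the lookahead fails and the text comes back unchanged.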
Here is a sample I got off the net and modified (I can't use actual data). This item should produce a match:
ISA*00* *01*SECRET *ZZ*SUBMITTERS.ID *ZZ*RECEIVERS.ID *150101*0001*^*00501*00000001*1*T*:~
GS*HC*SENDER CODE*RECEIVER CODE*0*0001*1*X*005010X222~
ST*837*0021*005010X222~
BHT*0019*00*244579*20150101*1023*CH~
NM1*41*2*BILLING SERVICE*****46*9999999~
PER*IC*DOE*JOHN*3055552222*EX*111~
NM1*40*2*ABC INSURANCE COMPANY*****46*1111111~
HL*1**20*1~
PRV*BI*PXC*1234597890~
NM1*85*2*DOCTOR OFFICE*****XX*9876543210~
N3*1234 MAIN ST~
N4*LOS ANGELES*CA*11111~
REF*EI*222222222~
NM1*87*2~
N3*2345 MAIN ST~
N4*LOS ANGELES*CA*11111~
HL*2*1*22*1~
SBR*P********CI~
NM1*IL*1*DOE*JANE****MI*11332255~
DMG*D8*10000101*O~
NM1*PR*2*DEF INSURANCE COMPANY*****PI*999996666~
HL*3*2*23*0~
PAT*19~
NM1*QC*1*JONES*JOHN~
N3*111 N MAIN ST~
N4*LOS ANGELES*CA*22222~
DMG*D8*10000202*O~
CLM*888888*1***11:B:1*Y*A*Y*I~
DTP*431*D8*20150101~
REF*D9*1~
HI*BK:9999*BF:V999~
LX*1~
SV1*HC:99999*1*UN*1***1~
DTP*472*RD8*20150101-20150201~
REF*6R*000001~
SE*33*0021~
GE*1*1~
IEA*1*000000001~
Using this, if my match is found, I'd delete the DTP*431*D8 line entirely and the file would then appear as below:
ISA*00* *01*SECRET *ZZ*SUBMITTERS.ID *ZZ*RECEIVERS.ID *150101*0001*^*00501*00000001*1*T*:~
GS*HC*SENDER CODE*RECEIVER CODE*0*0001*1*X*005010X222~
ST*837*0021*005010X222~
BHT*0019*00*244579*20150101*1023*CH~
NM1*41*2*BILLING SERVICE*****46*9999999~
PER*IC*DOE*JOHN*3055552222*EX*111~
NM1*40*2*ABC INSURANCE COMPANY*****46*1111111~
HL*1**20*1~
PRV*BI*PXC*1234597890~
NM1*85*2*DOCTOR OFFICE*****XX*9876543210~
N3*1234 MAIN ST~
N4*LOS ANGELES*CA*11111~
REF*EI*222222222~
NM1*87*2~
N3*2345 MAIN ST~
N4*LOS ANGELES*CA*11111~
HL*2*1*22*1~
SBR*P********CI~
NM1*IL*1*DOE*JANE****MI*11332255~
DMG*D8*10000101*O~
NM1*PR*2*DEF INSURANCE COMPANY*****PI*999996666~
HL*3*2*23*0~
PAT*19~
NM1*QC*1*JONES*JOHN~
N3*111 N MAIN ST~
N4*LOS ANGELES*CA*22222~
DMG*D8*10000202*O~
CLM*888888*1***11:B:1*Y*A*Y*I~
REF*D9*1~
HI*BK:9999*BF:V999~
LX*1~
SV1*HC:99999*1*UN*1***1~
DTP*472*RD8*20150101-20150201~
REF*6R*000001~
SE*33*0021~
GE*1*1~
IEA*1*000000001~
Conversely, this item would not get a match and the file would be left as is:
ISA*00* *01*SECRET *ZZ*SUBMITTERS.ID *ZZ*RECEIVERS.ID *150101*0001*^*00501*00000001*1*T*:~
GS*HC*SENDER CODE*RECEIVER CODE*0*0001*1*X*005010X222~
ST*837*0021*005010X222~
BHT*0019*00*244579*20150101*1023*CH~
NM1*41*2*BILLING SERVICE*****46*9999999~
PER*IC*DOE*JOHN*3055552222*EX*111~
NM1*40*2*ABC INSURANCE COMPANY*****46*1111111~
HL*1**20*1~
PRV*BI*PXC*1234597890~
NM1*85*2*DOCTOR OFFICE*****XX*9876543210~
N3*1234 MAIN ST~
N4*LOS ANGELES*CA*11111~
REF*EI*222222222~
NM1*87*2~
N3*2345 MAIN ST~
N4*LOS ANGELES*CA*11111~
HL*2*1*22*1~
SBR*P********CI~
NM1*IL*1*DOE*JANE****MI*11332255~
DMG*D8*10000101*O~
NM1*PR*2*DEF INSURANCE COMPANY*****PI*999996666~
HL*3*2*23*0~
PAT*19~
NM1*QC*1*JONES*JOHN~
N3*111 N MAIN ST~
N4*LOS ANGELES*CA*22222~
DMG*D8*10000202*O~
CLM*888888*1***11:B:1*Y*A*Y*I~
DTP*431*D8*20150201~
REF*D9*1~
HI*BK:9999*BF:V999~
LX*1~
SV1*HC:99999*1*UN*1***1~
DTP*472*RD8*20150102-20150202~
REF*6R*000001~
SE*33*0021~
GE*1*1~
IEA*1*000000001~
