I'm trying to extract the PROCEDURE sections from the CLAIM, EOB and COB blocks of a text file and build an object like this:

claim : [{PROCEDURE1}, {PROCEDURE2}, {PROCEDURE3}],
eob : [{PROCEDURE1}, {PROCEDURE2}, {PROCEDURE3}],
cob: [{PROCEDURE1}, {PROCEDURE2}, {PROCEDURE3}]

let data = `    SEND CLAIM {
       PREFIX="9403        "
       PROCEDURE { /* #1  */
          PROCEDURE_LINE="1"
          PROCEDURE_CODE="01201"
        
       }
       PROCEDURE { /* #2  */
          PROCEDURE_LINE="2"
          PROCEDURE_CODE="02102"
         
       }
       PROCEDURE { /* #3  */
          PROCEDURE_LINE="3"
          PROCEDURE_CODE="21222"
       
       }
    }
    
    SEND EOB {
          PREFIX="9403        "
          OFFICE_SEQUENCE="000721"
          PROCEDURE { /* #1 */
             PROCEDURE_LINE="1"
             ELIGIBLE="002750"
          }
          PROCEDURE { /* #2 */
             PROCEDURE_LINE="2"
             ELIGIBLE="008725"
          }
          PROCEDURE { /* #3 */
             PROCEDURE_LINE="3"
             ELIGIBLE="010200"
          }
    }
    
    SEND COB {
       PREFIX="TEST4       "
       OFFICE_SEQUENCE="000721"
       PROCEDURE { /* #1  */
          PROCEDURE_LINE="1"
          PROCEDURE_CODE="01201"
        
       }
       PROCEDURE { /* #2  */
          PROCEDURE_LINE="2"
          PROCEDURE_CODE="02102"
       }
       PROCEDURE { /* #3  */
          PROCEDURE_LINE="3"
          PROCEDURE_CODE="21222"
          DATE="19990104"
       }
       PRIME_EOB=SEND EOB {
          PREFIX="9403        "
          OFFICE_SEQUENCE="000721"
          PROCEDURE { /* #1 */
             PROCEDURE_LINE="1"
             ELIGIBLE="002750"
          }
          PROCEDURE { /* #2 */
             PROCEDURE_LINE="2"
             ELIGIBLE="008725"
          }
          PROCEDURE { /* #3 */
             PROCEDURE_LINE="3"
             ELIGIBLE="010200"
          }
    
       }
    }`
    
    let re = /(^\s+PROCEDURE\s\{)([\S\s]*?)(?:})/gm
    
    console.log(data.match(re));

Here is what I have tried so far: (^\s+PROCEDURE\s\{)([\S\s]*?)(?:}). It matches every PROCEDURE, but I can't figure out how to match only the PROCEDUREs that follow a given key such as CLAIM or EOB.

  • Is the section PRIME_EOB=SEND EOB to be skipped? Commented Jul 11, 2020 at 5:18
  • @CarySwoveland, it should not be skipped; it should be included in the results object. My current strategy is a two-step process: first match the CLAIM section, then parse its PROCEDUREs and add them to the results object, and so on for the other keys. If you have a cleaner or better idea, that would be great. Thanks. Commented Jul 11, 2020 at 6:14
  • The reason for my confusion is that "SEND EOB" appears twice, at two different "levels". Commented Jul 11, 2020 at 6:16
  • You'll see that my answer disregards the PRIME_EOB=SEND EOB, as I am not clear as to how it should be dealt with. Commented Jul 11, 2020 at 6:55

3 Answers

For "claim", you could match the following regular expression.

/(?<=^ *SEND CLAIM +\{\r?\n(?:^(?! *SEND EOB *\{)(?! *SEND COB *\{).*\r?\n)*^ *PROCEDURE *)\{[^\}]*\}/gm

This matches the following strings, which I assume can be easily saved to an array with a sprinkling of Javascript code.

         { /* CLAIM #1  */  
   PROCEDURE_LINE="1"
   PROCEDURE_CODE="01201"
    
}

          { /* CLAIM #2  */
   PROCEDURE_LINE="2"
   PROCEDURE_CODE="02102"
  
}

          { /* CLAIM #3  */
   PROCEDURE_LINE="3"
   PROCEDURE_CODE="21222"
   
}

Javascript's regex engine performs the following operations.

(?<=                 : begin positive lookbehind
  ^                  : match beginning of line
  \ *SEND CLAIM\ +   : match 'SEND CLAIM' preceded by 0+ spaces and followed by 1+ spaces
  \{\r?\n            : match '{' then line terminators
  (?:                : begin non-capture group
    ^                : match beginning of line
    (?!              : begin negative lookahead
      \ *SEND EOB\ * : match 'SEND EOB' surrounded by 0+ spaces
      \{             : match '{'
    )                : end negative lookahead
    (?!              : begin negative lookahead
      \ *SEND COB\ * : match 'SEND COB' surrounded by 0+ spaces
      \{             : match '{'
    )                : end negative lookahead
    .*\r?\n          : match line including terminators
  )                  : end non-capture group
  *                  : execute non-capture group 0+ times
  ^                  : match beginning of line
  \ *PROCEDURE\ *    : match 'PROCEDURE' surrounded by 0+ spaces 
)                    : end positive lookbehind
\{[^\}]*\}           : match '{', 0+ characters other than '}', '}' 

I've escaped space characters above to improve readability.
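
Collecting the matches into an array could look roughly like the sketch below. The shortened `sample` string and the names `claimRe` and `claimBlocks` are made up for illustration. The pattern is the CLAIM regex from above with the `g` and `m` flags set, so that `^` anchors at each line start and every block is returned. It assumes an engine with variable-length lookbehind support, such as V8.

```javascript
// Shortened stand-in for the question's data (an assumption for illustration).
const sample = `    SEND CLAIM {
       PREFIX="9403        "
       PROCEDURE { /* #1 */
          PROCEDURE_LINE="1"
       }
       PROCEDURE { /* #2 */
          PROCEDURE_LINE="2"
       }
    }

    SEND EOB {
       PROCEDURE { /* #1 */
          ELIGIBLE="002750"
       }
    }`;

// The CLAIM regex, with g (all matches) and m (^ matches at line starts).
const claimRe = /(?<=^ *SEND CLAIM +\{\r?\n(?:^(?! *SEND EOB *\{)(?! *SEND COB *\{).*\r?\n)*^ *PROCEDURE *)\{[^\}]*\}/gm;

// match() returns null when nothing matches, so fall back to an empty array.
const claimBlocks = sample.match(claimRe) || [];

console.log(claimBlocks.length); // number of PROCEDURE bodies found under SEND CLAIM
console.log(claimBlocks);
```

Each element is the brace-delimited body of one PROCEDURE under SEND CLAIM; the PROCEDURE under SEND EOB is not included.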

For "eob", use the slightly-modified regex:

/(?<=^ *SEND EOB +\{\r?\n(?:^(?! *SEND CLAIM *\{)(?! *SEND COB *\{).*\r?\n)*^ *PROCEDURE *)\{[^\}]*\}/gm

I've made no attempt to do the same for "cob" as that part has a different structure than "claim" and "eob" and it is not clear to me how it is to be treated.

A final note, should it not be obvious: it would be far easier to extract the strings of interest using conventional code with loops and, possibly, simple regular expressions, but I hope some readers may find my answer instructive about some elements of regular expressions.
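
As a rough illustration of that loop-based route, here is a minimal sketch. The function name `extractProcedures` is hypothetical, and it rests on assumptions not settled in the question: every `{` ends its line, every `}` sits on its own line, properties look like `KEY="value"`, and PROCEDUREs inside a nested block such as PRIME_EOB are filed under the enclosing top-level section.

```javascript
// Hypothetical helper: groups PROCEDURE blocks by their top-level SEND section.
function extractProcedures(text) {
  const result = {};
  let section = null;   // current top-level key, e.g. "claim"
  let procedure = null; // PROCEDURE object currently being filled
  let depth = 0;        // brace nesting depth

  for (const raw of text.split(/\r?\n/)) {
    // Drop /* ... */ comments and surrounding whitespace.
    const line = raw.replace(/\/\*.*?\*\//g, "").trim();
    if (line === "") continue;

    if (line.endsWith("{")) {
      const top = /^SEND (\w+)/.exec(line);
      if (depth === 0 && top) {
        // A new top-level section such as SEND CLAIM.
        section = top[1].toLowerCase();
        result[section] = [];
      } else if (/^PROCEDURE\b/.test(line) && section !== null) {
        // Assumption: nested PROCEDUREs go to the enclosing top-level section.
        procedure = {};
        result[section].push(procedure);
      }
      depth += 1;
    } else if (line === "}") {
      depth -= 1;
      procedure = null;
      if (depth === 0) section = null;
    } else if (procedure !== null) {
      // Only KEY="value" lines inside a PROCEDURE are recorded.
      const pair = /^(\w+)="([^"]*)"$/.exec(line);
      if (pair) procedure[pair[1]] = pair[2];
    }
  }
  return result;
}

// Shortened stand-in for the question's data (an assumption for illustration).
const sampleText = `SEND CLAIM {
   PREFIX="9403        "
   PROCEDURE { /* #1 */
      PROCEDURE_LINE="1"
      PROCEDURE_CODE="01201"
   }
   PROCEDURE { /* #2 */
      PROCEDURE_LINE="2"
      PROCEDURE_CODE="02102"
   }
}
SEND EOB {
   PROCEDURE { /* #1 */
      PROCEDURE_LINE="1"
      ELIGIBLE="002750"
   }
}`;

const procedures = extractProcedures(sampleText);
console.log(procedures);
```

The result has one lower-cased key per top-level section, each holding an array of plain objects, which is close to the shape the question asks for.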

2 Comments

I think Javascript does not support \G
@Thefourthbird, I too thought JavaScript didn’t support \G but I tried it at regex101 with JavaScript’s engine and it worked! I just double-checked and, oh no, PCRE! What a waste of time! I've revised my answer to take advantage of JS’s support for variable-length lookbehinds. Thanks for letting me know.

Will CLAIM, EOB and COB always be in the same order? If so, you can split the text before using the regex you already have:

const procRegex = /(^\s+PROCEDURE\s\{)([\S\s]*?)(?:})/gm;

let claimData = data.split("EOB")[0];
let claimProcedures = claimData.match(procRegex);

let eobData = data.split("COB")[0].split("EOB")[1];
let eobProcedures = eobData.match(procRegex);

let cobData = data.split("COB")[1];
let cobProcedures = cobData.match(procRegex);

// If you want to leave out the PRIME_EOB, you can split COB again
cobData = cobData.split("EOB")[0];
cobProcedures = cobData.match(procRegex);

console.log(claimProcedures);

Output:

[
  '       PROCEDURE { /* #1  */\n' +
    '          PROCEDURE_LINE="1"\n' +
    '          PROCEDURE_CODE="01201"\n' +
    '        \n' +
    '       }',
  '       PROCEDURE { /* #2  */\n' +
    '          PROCEDURE_LINE="2"\n' +
    '          PROCEDURE_CODE="02102"\n' +
    '         \n' +
    '       }',
  '       PROCEDURE { /* #3  */\n' +
    '          PROCEDURE_LINE="3"\n' +
    '          PROCEDURE_CODE="21222"\n' +
    '       \n' +
    '       }'
]
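
The matches are still raw strings at this point. One possible way to turn each of them into an object is sketched below; `blockToObject` is a hypothetical helper, and it assumes every property sits on its own line in the form `KEY="value"`.

```javascript
// Hypothetical helper: converts one matched PROCEDURE string into an object.
function blockToObject(block) {
  const obj = {};
  // Each KEY="value" line contributes one property; other lines are ignored.
  for (const [, key, value] of block.matchAll(/^[ \t]*(\w+)="([^"]*)"[ \t]*$/gm)) {
    obj[key] = value;
  }
  return obj;
}

// One match in the same shape as the output shown above.
const block =
  '       PROCEDURE { /* #1  */\n' +
  '          PROCEDURE_LINE="1"\n' +
  '          PROCEDURE_CODE="01201"\n' +
  '        \n' +
  '       }';

const procedureObject = blockToObject(block);
console.log(procedureObject);
```

Mapping this helper over `claimProcedures`, `eobProcedures` and `cobProcedures` would give arrays of objects instead of arrays of strings.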

As an alternate method, your data is not terribly far away from valid JSON, so you could run with that. The code below translates the data into JSON, then parses it into a Javascript object that you can use however you want.

/* data cannot have Javascript comments in it for this to work, or you need
   another regex to remove them */

data = data.replace(/=/g, ":") // replace = with :
  .replace(/\s?{/g, ": {") // replace { with : {
  .replace(/SEND/g, "") // remove "SEND"
  .replace(/\"\s*$(?!\s*\})/gm, "\",") // add commas after object properties
  .replace(/}(?=\s*\w)/g, "},") // add commas after objects
  .replace(/(?<!\}),\s*PROCEDURE: /g, ",\nPROCEDURES: [") // start procedures list
  .replace(/(PROCEDURE:[\S\s]*?\})\s*(?!,\s*PROCEDURE)/g, "$1]\n") // end list
  .replace(/PROCEDURE: /g, "") // remove "PROCEDURE"
  .replace("PRIME_EOB: EOB:", "PRIME_EOB:") // replace double key with single key. Is this the behavior you want?
  .replace(/(\S*):/g, "\"$1\":") // put quotes around object key names

let dataObj = JSON.parse("{" + data + "}");

console.log(dataObj.CLAIM.PROCEDURES);

Output:

[ { PROCEDURE_LINE: '1', PROCEDURE_CODE: '01201' },
  { PROCEDURE_LINE: '2', PROCEDURE_CODE: '02102' },
  { PROCEDURE_LINE: '3', PROCEDURE_CODE: '21222' } ]


What you are trying to do is write a parser for the syntax used in your text file.
The syntax looks much like JSON.
I would recommend modifying the syntax with regexes to get valid JSON and then parsing it with the JavaScript JSON parser. The parser is able to handle recursion. At the end you will have a JavaScript object that lets you remove or add whatever you need. In addition, the hierarchy of the source will be preserved.

This code does the job for the provided example:

let data = `    SEND CLAIM {
// your text file contents
}`;

// handle PRIME_EOB=SEND EOB {
var regex = /(\w+)=\w+.*{/gm;
var replace = data.replace(regex, "$1 {");

// append double quotes in lines like PROCEDURE_LINE="1"
var regex = /(\w+)=/g;
var replace = replace.replace(regex, "\"$1\": ");

// append double quotes in lines like PROCEDURE {
var regex = /(\w+.*)\s{/g;
var replace = replace.replace(regex, "\"$1\": {");

// remove comments: /* */
var regex = /\/\*.*?\*\//g;
var replace = replace.replace(regex, "");

// append commas to lines i.e. "PROCEDURE_LINE": "2"
var regex = /(\".*\":\s*\".*\")/gm;
var replace = replace.replace(regex, "$1,");

// append commas to '}'
var regex = /^.*}.*$/gm;
var replace = replace.replace(regex, "},");

// remove trailing commas
var regex = /\,(?!\s*?[\{\[\"\'\w])/g;
var replace = replace.replace(regex, "");

// surround with {}
replace = "{" + replace + "}";

console.log(replace);
var obj = JSON.parse(replace);
console.log(obj);

The JSON looks like this snippet:

{    "SEND CLAIM": {
       "PREFIX": "9403        ",
       "PROCEDURE": { 
          "PROCEDURE_LINE": "1",
          "PROCEDURE_CODE": "01201"
        
},
       "PROCEDURE": { 
          "PROCEDURE_LINE": "2",
          "PROCEDURE_CODE": "02102"

The final object can then be inspected in the debugger.

It is not completely clear to me what your final array or object should look like, but from here it should take only a little effort to produce what you need.
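
One caveat worth checking: the generated JSON repeats the "PROCEDURE" key within each section, and JSON.parse keeps only the last value for a duplicated key. A possible workaround, sketched below, is to number the keys before parsing and collect them afterwards. The shortened `json` string, the `PROCEDURE#n` naming and the `collectProcedures` helper are all assumptions made for illustration.

```javascript
// Shortened stand-in for the converted JSON text (an assumption for illustration).
const json =
  '{"SEND CLAIM": {"PREFIX": "9403", ' +
  '"PROCEDURE": {"PROCEDURE_LINE": "1"}, ' +
  '"PROCEDURE": {"PROCEDURE_LINE": "2"}}}';

// Parsed as-is, only the last of the duplicate keys survives.
const lossy = JSON.parse(json);

// Give each "PROCEDURE" key a unique suffix so no block is overwritten.
// The closing quote in the pattern keeps "PROCEDURE_LINE" untouched.
let counter = 0;
const numbered = json.replace(/"PROCEDURE":/g, () => `"PROCEDURE#${++counter}":`);
const parsed = JSON.parse(numbered);

// Hypothetical helper: gathers the numbered blocks of one section into an array.
function collectProcedures(section) {
  return Object.keys(section)
    .filter((key) => key.startsWith("PROCEDURE#"))
    .map((key) => section[key]);
}

const claimList = collectProcedures(parsed["SEND CLAIM"]);
console.log(lossy["SEND CLAIM"].PROCEDURE); // the last PROCEDURE only
console.log(claimList);                     // every PROCEDURE, in source order
```

The same replacement could be applied to the full converted text before the JSON.parse call, so that each section keeps all of its PROCEDURE blocks.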
