Extract Multiple Matching Patterns from Text File

Question

I need to extract all K[A-Z]{3} and US[CW][0-9]{8} values from every line in a text file.

I am using the code below, but I need to extract these values only when both are present on a given line (e.g., the last three lines of the data below).

Attempted Code:

#Filters out any values matching K[A-Z]{4}
grep -Po '"\K[A-Z]{4}\b' usc.matched > out.1

#Filters out any values matching US[C,W][0-9]{8}
grep -Po '\bUS\w*' usc.matched > out.2

#Pastes two datasets together, separated by a comma
paste -d',' out.1 out.2 > stations.filtered

#Removes any lines that do not lead with "K"
sed -i '/^[^K]/d' stations.filtered

JSON Data:

{"sids": ["94737 1", "RUT 3", "KRUT 5"], "name": "RUTLAND STATE AP"},
{"sids": ["54740 1", "VSF 3", "KVSF 5", "USW00054740 6"], "name": "SPRINGFIELD HARTNESS AP"},
{"sids": ["94601 1", "RKD 3", "KRKD 5"], "name": "ROCKLAND KNOX CO RGNL AP"},
{"sids": ["20B 3"], "name": "ROCKLAND STN"},
{"sids": ["177250 2", "USC00177250 6"], "name": "ROCKLAND"},
{"sids": ["177255 2", "USC00177255 6", "RCKM1 7"], "name": "ROCKLAND"},
{"sids": ["177260 2"], "name": "ROCKLAND MOORING LBS"},
{"sids": [], "name": "ROCKLAND"},
{"sids": ["14612 1"], "name": "ROCKLAND"},
{"sids": ["274380 2", "USC00274380 6"], "name": "KEARSARGE"},
{"sids": ["192770 2", "USC00192770 6"], "name": "FISKDALE"},
{"sids": ["US1CTNL0005 6", "CTNL0005 10"], "name": "OAKDALE 2.6 WNW"},
{"sids": ["063989 2", "USC00063989 6"], "name": "LAKE KONOMOC"},
{"sids": ["14740 1", "14721 1", "063456 2", "069704 2", "BDL 3", "72508 4", "KBDL 5", "USW00014740 6", "BDL 7"], "name": "HARTFORD-BRADLEY INTL AP"},
{"sids": ["94702 1", "060806 2", "BDR 3", "72504 4", "KBDR 5", "USW00094702 6", "BDR 7"], "name": "IGOR I SIKORSKY MEMORI AP"},
{"sids": ["54734 1", "DXR 3", "KDXR 5", "USW00054734 6"], "name": "DANBURY MUNI AP"},

Current Output:

KRUT,
KVSF,USW00054740
KRKD

USC00177250
USC00177255


USC00274380
USC00192770
US1CTNL0005
USC00063989
KBDL,USW00014740
KBDR,USW00094702
KDXR,USW00054734

Expected Output:

KVSF,USW00054740
KBDL,USW00014740
KBDR,USW00094702
KDXR,USW00054734

Yes, I have jq installed. I have also appended Current Output and Expected Output to help further define what I am trying to do.
– arnpry, Commented Jul 11, 2017 at 10:08

anubhava · Accepted Answer · 2017-07-11 10:28:50Z

2

You can use:

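awk -F '[][" \t{},:]+' '{
   a = b = ""                        # reset both IDs for each line
   for (i = 2; i <= NF; i++)         # $1 is empty: every line starts with separators
      if ($i ~ /^K[A-Z]{3}$/)
         a = $i
      else if ($i ~ /^US[CW][0-9]+/)
         b = $i
   if (a != "" && b != "")           # print only when both were found
      print a, b
}' OFS=, file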

KVSF,USW00054740
KBDL,USW00014740
KBDR,USW00094702
KDXR,USW00054734
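
To see why the loop can compare whole fields against anchored patterns, it may help to look at how that field separator splits one record. The following is only a sketch: show_fields is a hypothetical helper name, and the sample line is a shortened version of one line from the question's data.

```shell
# Hypothetical helper: list the non-empty fields awk sees for one input line,
# using the same field separator as the answer above.
show_fields() {
    printf '%s\n' "$1" | awk -F '[][" \t{},:]+' '{
        for (i = 1; i <= NF; i++)
            if ($i != "") print i ": " $i
    }'
}

# Sample line shortened from the question data (an assumption for illustration).
show_fields '{"sids": ["54740 1", "KVSF 5", "USW00054740 6"], "name": "SPRINGFIELD HARTNESS AP"},'
```

Because the line begins with separator characters, field 1 is empty and field 2 is sids, so starting the loop at 2 skips nothing useful. Each station ID ends up alone in its own field, which is what lets ^ and $ anchor against the whole ID.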


Sundeep · Answer · 2017-07-11 10:35:44Z

2

If Perl is okay (this assumes the K string precedes the US string on the same line):

$ perl -lne 'print "$1,$2" if /"(K[A-Z]{3})\b.*"(US[CW]\d{8}\b)/' usc.matched 
KVSF,USW00054740
KBDL,USW00014740
KBDR,USW00094702
KDXR,USW00054734

- if /"(K[A-Z]{3})\b.*"(US[CW]\d{8}\b)/ prints only if this condition matches
- print "$1,$2" prints the two captured groups
- "(K[A-Z]{3})\b matches K followed by three uppercase letters, only if preceded by " and followed by a word boundary
- "(US[CW]\d{8}\b) matches US followed by C or W and eight digits, only if preceded by " and followed by a word boundary
- See http://perldoc.perl.org/perlrun.html#Command-Switches for details on the -lne options


James Brown · Answer · 2017-07-11 10:16:38Z

1

In awk. Tune the regexen to your liking:

$ awk -v OFS=, '
/K[A-Z]{3} / && /US[CW][0-9]{8}/ {           # only lines containing both IDs
    b=""
    while(match($0,/K[A-Z]{3} |US[CW][0-9]{8}/)) {
        s=substr($0,RSTART,RLENGTH)
        sub(/ $/,"",s)                        # drop the space matched after the K ID
        b=b (b==""?"":OFS) s
        $0=substr($0,RSTART+RLENGTH)          # continue scanning after this match
    }
    print b
}' file
KVSF,USW00054740
KBDL,USW00014740
KBDR,USW00094702
KDXR,USW00054734
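
The while loop above relies on awk's match(), which sets RSTART and RLENGTH to the position and length of the leftmost match, and then cuts the scanned text down to what follows that match. A stripped-down sketch of the same idea, with a hypothetical helper name (all_matches) and a simplified pattern chosen only for illustration:

```shell
# Hypothetical helper: print every match of an extended regex, one per line.
all_matches() {
    awk -v re="$1" '{
        s = $0
        while (match(s, re)) {                  # sets RSTART and RLENGTH
            print substr(s, RSTART, RLENGTH)
            s = substr(s, RSTART + RLENGTH)     # resume after this match
        }
    }'
}

# Simplified pattern (an assumption): no length limits on the letters or digits.
printf '%s\n' '"KBDL 5", "USW00014740 6", "BDL 7"' |
  all_matches 'US[CW][0-9]+|K[A-Z]+'
```

Matches come out in the order they appear in the text, so with this approach the K value is printed first only because it precedes the US value in the data.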
