
I have a file with lines that look like this:

LINEID1:FIELD1=ABCD,&FIELD2-0&FIELD3-1&FIELD4-0&FIELD9-0;
LINEID2:FIELD1=ABCD,&FIELD5-1&FIELD6-0;
LINEID3:FIELD1=ABCD,&FIELD7-0&FIELD8-0;

LINEID1:FIELD1=XYZ,&FIELD2-0&FIELD3-1&FIELD9-0
LINEID3:FIELD1=XYZ,&FIELD7-0&FIELD8-0;

LINEID1:FIELD1=PQRS,&FIELD3-1&FIELD4-0&FIELD9-0;
LINEID2:FIELD1=PQRS,&FIELD5-1&FIELD6-0;
LINEID3:FIELD1=PQRS,&FIELD7-0&FIELD8-0;

I'm interested only in the lines that begin with LINEID1, and only in some of their elements (FIELD1, FIELD2, FIELD4 and FIELD9). The output should look like this (no & signs; they can be replaced with |):

FIELD1=ABCD|FIELD2-0|FIELD4-0|FIELD9-0;
FIELD1=XYZ|FIELD2-0|FIELD9-0;
FIELD1=PQRS|FIELD4-0|FIELD9-0;

If additional information is required, let me know and I'll add it in an edit. Thanks!!

  • Looks like you want FIELD9, not FIELD5? And XYZ in data doesn't match WXYZ in output. Commented Aug 30, 2014 at 7:57
  • Sorry yes you're right. Will correct. Thanks! Commented Aug 30, 2014 at 8:12

3 Answers


This is not exactly what you asked for, but no one else has answered yet and it should be close enough to get you started!

awk -F'[&:]' '/^LINEID1:/{print $2,$3,$5,$6}' OFS='|' file

Output

FIELD1=ABCD,|FIELD2-0|FIELD4-0|FIELD9-0;
FIELD1=XYZ,|FIELD2-0|FIELD9-0|
FIELD1=PQRS,|FIELD3-1|FIELD9-0;|

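The -F sets the input field separator to either a colon or an ampersand. awk then looks for lines starting with LINEID1: and prints fields 2, 3, 5 and 6. These fields are picked by position, not by name, so on the XYZ and PQRS lines, where some fields are absent, the wrong fields (or an empty one, giving a trailing |) come out, and FIELD1 keeps its comma. The OFS sets the output field separator to the pipe symbol |.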
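To see why the XYZ and PQRS lines come out wrong, here is a rough Ruby sketch (not part of the answer above) that imitates awk's splitting on [&:] and picks fields by position. It assumes awk's $n corresponds to element n-1 of Ruby's split result; a field past the end comes back as nil, which join turns into an empty string, much like referencing an unset awk field.

```ruby
# Approximate awk -F'[&:]' '{print $2,$3,$5,$6}' OFS='|' for one line.
# awk's $n is element n-1 here; limit -1 keeps trailing empty fields.
def awk_positional(line, positions = [2, 3, 5, 6])
  fields = line.chomp.split(/[&:]/, -1)
  # values_at gives nil past the end (like an unset awk field); join turns nil into "".
  fields.values_at(*positions.map { |n| n - 1 }).join("|")
end

[
  "LINEID1:FIELD1=ABCD,&FIELD2-0&FIELD3-1&FIELD4-0&FIELD9-0;",
  "LINEID1:FIELD1=XYZ,&FIELD2-0&FIELD3-1&FIELD9-0",
  "LINEID1:FIELD1=PQRS,&FIELD3-1&FIELD4-0&FIELD9-0;",
].each { |l| puts awk_positional(l) }
```

Since positions 5 and 6 hold different fields on each line, only a selection by field name, rather than by position, can produce the requested output for every line.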


3 Comments

Thanks! A sed to replace the commas and the file is good to go!
You can make the semicolons disappear by telling awk they are field separators too: change -F'[&:]' to -F'[&:;]'.
Oh! Thanks for that. My knowledge of UNIX is negligible; I used sed to get rid of them :D

Pure awk:

awk -F ":" ' /LINEID1[^0-9]/{gsub(/FIELD[^1249]+[-=][A-Z0-9]+/,"",$2); gsub(/,*&+/,"|",$2); print $2} ' file

Updated to produce the requested format and to skip LINEID11, LINEID100, etc.

Output:

FIELD1=ABCD|FIELD2-0|FIELD4-0|FIELD9-0;
FIELD1=XYZ|FIELD2-0|FIELD9-0
FIELD1=PQRS|FIELD4-0|FIELD9-0;

Explanation:

awk -F ":" - split each line at the colon into LHS ($1) and RHS ($2), since the output only needs the RHS

/LINEID1[^0-9]/ - select only lines containing LINEID1 followed by a non-digit, which skips LINEID11, LINEID100, etc.

gsub(/FIELD[^1249]+[-=][A-Z0-9]+/,"",$2) - remove every field on the RHS whose number does not start with 1, 2, 4 or 9 (in the sample, only FIELD3)

gsub(/,*&+/,"|",$2) - replace the leftover delimiters on the RHS (",&", "&&" or a single "&") with |
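The two substitutions are easy to try outside awk. The Ruby sketch below applies the same regular expressions to a single line; strip_unwanted is a made-up name, and this is an illustration rather than the answerer's code. Ruby's regex engine is not POSIX leftmost-longest like awk's, but for the sample lines both remove the same text.

```ruby
# Hypothetical helper mirroring the awk one-liner:
#   awk -F ":" '/LINEID1[^0-9]/{gsub(...,"",$2); gsub(/,*&+/,"|",$2); print $2}'
def strip_unwanted(line)
  return nil unless line =~ /LINEID1[^0-9]/          # same line filter as the awk pattern
  rhs = line.chomp.split(":", -1)[1].to_s            # awk's $2 with FS=":"
  rhs = rhs.gsub(/FIELD[^1249]+[-=][A-Z0-9]+/, "")   # delete fields not numbered 1/2/4/9
  rhs.gsub(/,*&+/, "|")                              # collapse ",&", "&&", "&" into "|"
end

puts strip_unwanted("LINEID1:FIELD1=PQRS,&FIELD3-1&FIELD4-0&FIELD9-0;")
p strip_unwanted("LINEID11:FIELD1=ABCD,&FIELD2-0")   # filtered out, so nil
```

Note that [^1249]+ only constrains the first character after FIELD, so a field such as FIELD10 would be kept. The pattern fits this sample rather than serving as a general field filter.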

1 Comment

This will also find lines with LINEID10, LINEID11, LINEID199, etc.
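The comment presumably refers to an earlier, unanchored version of the pattern. A quick Ruby check (illustrative only; the IDs below are made-up examples) shows which IDs three candidate patterns accept:

```ruby
# Which line IDs does each candidate pattern accept?
CANDIDATES = {
  "LINEID1"       => /LINEID1/,        # plain substring: also hits LINEID10, LINEID199
  "LINEID1[^0-9]" => /LINEID1[^0-9]/,  # requires a non-digit right after the 1
  "^LINEID1:"     => /^LINEID1:/,      # anchored, and requires the ':' separator
}

def accepted_ids(pattern, ids)
  ids.select { |id| id =~ pattern }
end

ids = ["LINEID1:", "LINEID10:", "LINEID11:", "LINEID199:"]
CANDIDATES.each do |label, re|
  puts "#{label.ljust(14)} #{accepted_ids(re, ids).join(' ')}"
end
```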

To select rows from data on the Unix command line, use grep, awk, perl, python, or ruby (in increasing order of power and possible complexity).

To select columns from data, use cut, awk, or one of the previously mentioned scripting languages.

First, let's get only the lines with LINEID1 (assuming the input is in a file called input).

grep '^LINEID1:' input

outputs only the lines beginning with LINEID1: (including the colon keeps LINEID10, LINEID11, etc. out).

Next, extract the columns we care about:

grep '^LINEID1:' input |   # extract lines starting with LINEID1:
cut -d: -f2            |   # extract column 2 (after ':')
tr ',&' '\n\n'         |   # turn ',' and '&' into newlines
grep -E 'FIELD[1249]'  |   # keep only fields FIELD1, FIELD2, FIELD4, FIELD9
tr '\n' '|'            |   # turn newlines into '|'
sed -e $'s/|\\(FIELD1\\)/\\\n\\1/g' -e 's/|$//'

The last line replaces the '|' in front of each FIELD1 with a newline, and removes the trailing '|'.

That last sed command is the trickiest part. Putting a newline into the replacement portably requires a backslash followed by a literal newline (GNU sed also accepts \n, but BSD/macOS sed does not). Bash's $'...' quoting is used to produce that newline, which in turn means every other backslash in the string has to be doubled. The '|' is deliberately left unescaped: in a basic regular expression a plain '|' is a literal character, whereas GNU sed treats '\|' as alternation.

Here's the output from the above command:

FIELD1=ABCD|FIELD2-0|FIELD4-0|FIELD9-0;
FIELD1=XYZ|FIELD2-0|FIELD9-0
FIELD1=PQRS|FIELD4-0|FIELD9-0;
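To make each stage of the pipeline concrete, here is a Ruby sketch (not from the original answer) that mirrors it one stage at a time. The method name pipeline and the inline sample are illustrative; Enumerable#slice_before stands in for the tr/sed step that starts a new line at each FIELD1.

```ruby
# Mirror of: grep | cut -d: -f2 | tr ',&' '\n\n' | grep -E | tr '\n' '|' | sed
def pipeline(text)
  lines  = text.lines.grep(/^LINEID1:/)                      # grep '^LINEID1:' input
  rhs    = lines.map { |l| l.chomp.split(":", -1)[1].to_s }   # cut -d: -f2
  tokens = rhs.flat_map { |r| r.split(/[,&]/) }               # tr ',&' '\n\n'
  wanted = tokens.grep(/FIELD[1249]/)                         # grep -E 'FIELD[1249]'
  # tr '\n' '|' plus sed: start a new output line at every FIELD1 token
  wanted.slice_before { |t| t.start_with?("FIELD1") }.map { |g| g.join("|") }
end

sample = <<~DATA
  LINEID1:FIELD1=ABCD,&FIELD2-0&FIELD3-1&FIELD4-0&FIELD9-0;
  LINEID2:FIELD1=ABCD,&FIELD5-1&FIELD6-0;
  LINEID1:FIELD1=XYZ,&FIELD2-0&FIELD3-1&FIELD9-0
  LINEID3:FIELD1=XYZ,&FIELD7-0&FIELD8-0;
  LINEID1:FIELD1=PQRS,&FIELD3-1&FIELD4-0&FIELD9-0;
DATA

puts pipeline(sample)
```

As with the shell version, 'FIELD[1249]' and the FIELD1 test are prefix matches, so a FIELD10 or FIELD12 would slip through.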

This command took only a couple of minutes to cobble together.

Even so, it's bordering on the complexity threshold where I would shift to perl or ruby because of their excellent string processing.

The same script in ruby might look like:

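#!/usr/bin/env ruby
# Print FIELD1, FIELD2, FIELD4 and FIELD9 from each LINEID1 line, joined by '|'.
while line = gets do
  if line.chomp =~ /^LINEID1:(.*)$/
    f1, others = $1.split(',', 2)                        # "FIELD1=ABCD" and the '&'-separated rest
    fields = others.to_s.split('&').grep(/FIELD[1249]/)  # keep only the wanted fields
    puts [f1, *fields].join("|")
  end
end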

Running this script on the same input file produces the same output as above:

$ ./parse-fields.rb < input
FIELD1=ABCD|FIELD2-0|FIELD4-0|FIELD9-0;
FIELD1=XYZ|FIELD2-0|FIELD9-0
FIELD1=PQRS|FIELD4-0|FIELD9-0;
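One caveat of the /FIELD[1249]/ test in the script above is that it also matches FIELD10, FIELD12 and so on. Comparing exact field names avoids that. The sketch below is hypothetical (select_fields and WANTED are illustrative names) and assumes every token has the NAME=VALUE or NAME-VALUE shape seen in the sample.

```ruby
require "set"

# Exact field names to keep (assumed from the question).
WANTED = Set["FIELD1", "FIELD2", "FIELD4", "FIELD9"]

# Hypothetical helper: returns the '|'-joined wanted fields, or nil for other lines.
def select_fields(line, id: "LINEID1", wanted: WANTED)
  prefix, rest = line.chomp.split(":", 2)
  return nil unless prefix == id && rest             # exact ID match skips LINEID10, LINEID11
  tokens = rest.split(/[,&]/).reject(&:empty?)       # ',' follows FIELD1, '&' separates the rest
  kept = tokens.select { |t| wanted.include?(t[/\A[^-=]+/]) }  # name = text before '-' or '='
  kept.join("|")
end

puts select_fields("LINEID1:FIELD1=ABCD,&FIELD2-0&FIELD3-1&FIELD4-0&FIELD9-0;")
puts select_fields("LINEID1:FIELD1=A,&FIELD12-0&FIELD2-0")   # FIELD12 is not kept
```

To use it as a filter, it could be called from a loop such as `ARGF.each_line { |l| r = select_fields(l); puts r if r }`.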
