0

How do I prepare a given txt file in bash for csv import? The given structure is like this:

Salutation
Name
Surname
Telephone
E-Mail
Street
City
Gender
Employment
Income
*****
Salutation
Name
Surname
Telephone
E-Mail
Street
City
Gender
Employment
Income
*****
Salutation
Name
Surname
E-Mail
Street
City
Gender
Employment
Income
*****

As you can see the second record doesn't have an E-Mail address. Any other variation of a not given value is also possible. The values are given line by line and the records are separated by five stars.

I tried to use awk and grep in order to write the records into a separated file for csv import. How do I put the given multiply lines of a record in one line for csv and how do I keep the order if an item e.g. the telephone number is not given?

Many thanks in advance.

1
  • What happens when a field is missing ? Do you want to output an empty field or just skip it entirely ? Could you show us a sample output & the awk code you tried so far ? Commented Nov 25, 2016 at 10:01

2 Answers 2

1

Typically a .csv file has fixed records - and fields that are not included are left empty. So your first example might be:

"Mr","John","Smith","555-1212","jsmith@foo","1 St","New York","M","CSV Wrangler","5"

and your second might be:

"Mrs","Mary","Brown",,"mbrown@foo","5 St","Ottawa","F","CSV Wrangler","5"

There is no way in your input file to detect which field is missing, though. That means you won't be able to reliably create a .csv file. You need to know field names and field values to do that, unless you infer fields based on content ("This contains an @ sign so must be an email address", etc.) Even that will fail if you have a record like:

****
Homer
Springfield
****

Is that first name and surname, or first name and city? You don't have a way to tell.

Sign up to request clarification or add additional context in comments.

5 Comments

Thx for your answer. Imagine there are identifier like Surname: John, Name: Smith, Telephone: (no entry) and so on. How do i get these multiply lines in one line separated by ";" for csv and having a linebreak whenever the five stars occur?
@bashy then this can easily be done with awk or sed. Could you provide a sample input file with the correct fields & dummy values ?
@Aserre Thx 4 answering. I tried using sed like this code sed 's/^[^:]*: *([+0-9.]\+) .*/\1/' /tmp/INFILE.txt | sed -e :a -e '$!N; s/\n/,/; ta' >> /tmp/OUTFILE.txt code
My INFILE looks like this: S1 Mr S2 John S3 Doe S4 015100498127 S5 [email protected] S6 11 Elmstreet S7 Somewhere 61504 S8 01-01-2017 S9 two adults S10 Cat S11 Employee S12 2000 ***** S1 Mrs S2 Ruth S3 Butterfinger S4 015283055471 S5 [email protected] S6 123 Sample Drive S7 Rockford 61109 S8 02-01-2017 S9 one Family S10 Dog S11 Student S12 1500 *****
Sorry, there is no linebreak. every record which starts with S1, S2, S3 ... is a separate line.
0

To make the problem more clearly:
I do get email requests which contain lots of unneeded stuff.
So I export the *.eml files to /tmp directory.
I collect the needed information like this into one file called Input.txt.
My code looks like this:

#!/bin/bash
touch /tmp/Input.txt
OUTFILE=/tmp/Input.txt
cat /dev/null > "$OUTFILE"
FILES=/tmp/*.eml
for f in $FILES
do
	grep 'Salutation         :' "$f" | sed 's/^.*: //' | perl -ne 'print "S1 $_"' >> "$OUTFILE"
  	grep 'Surname            :' "$f" | sed 's/^.*: //' | perl -ne 'print "S2 $_"' >> "$OUTFILE"
  	grep 'Name               :' "$f" | sed 's/^.*: //' | perl -ne 'print "S3 $_"' >> "$OUTFILE"
  	grep 'Telephone          :' "$f" | sed 's/^.*: //' | perl -ne 'print "S4 $_"' >> "$OUTFILE"
  	grep 'E-Mail             :' "$f" | sed 's/^.*: //' | perl -ne 'print "S5 $_"' >> "$OUTFILE"
  	grep 'Street             :' "$f" | sed 's/^.*: //' | perl -ne 'print "S6 $_"' >> "$OUTFILE"
  	grep 'City               :' "$f" | sed 's/^.*: //' | perl -ne 'print "S7 $_"' >> "$OUTFILE"
  	grep 'Date               :' "$f" | sed 's/^.*: //' | perl -ne 'print "S8 $_"' >> "$OUTFILE"
  	grep 'Size               :' "$f" | sed 's/^.*: //' | perl -ne 'print "S9 $_"' >> "$OUTFILE"
  	grep 'Animals            :' "$f" | sed 's/^.*: //' | perl -ne 'print "S10 $_"' >> "$OUTFILE"
  	grep 'Employment         :' "$f" | sed 's/^.*: //' | perl -ne 'print "S11 $_"' >> "$OUTFILE"
  	grep 'Income             :' "$f" | sed 's/^.*: //' | perl -ne 'print "S12 $_"' >> "$OUTFILE"
  	echo "*****" >> "$OUTFILE"
done

Finally I get the OUTFILE Input.txt like this:

S1 Mr
S2 John
S3 Smith
S4 1514009855
S5 [email protected]
S6 11 Elm Street
S7 Denver
S8 05/21/2016
S9 66
S10 Cat
S11 Officer
S12 20
*****
S1 Mrs
S2 Mary
S3 Wood
S4 65223457
S5 [email protected]
S6 60 Taft Ave.
S7 Boston
S8 04/26/2016
S10 Dog
S11 Secretary
S12 10
*****
S1 Mrs
S2 Lori
S3 White
S4 56325478
S6 730 Vista del Playa
S7 Anaheim
S8 01/22/2016
S10 Fish
S11 Teacher
S12 80
*****


So the first record is complete S1 till S12.
In the second record there is S9 missing and in the third one there is S5 and S9 missing.
The aim is to get these records read out from Input.txt and put them into a csv-file.
The csv should look like this, considering the missing items:
Salutation,Surname,Name,Telephone,E-Mail,Street,City,Date,Size,Animals,Employment,Income
Mr;John;Smith;1514009855;[email protected];11ElmStreet;Denver;05/21/2016;66;Cat;Officer;20
Mrs;Mary;Wood;65223457;[email protected];60TaftAve.;Boston;04/26/2016;;Dog;Secretary;10
Mrs;Lori;White;56325478;;730VistadelPlaya;Anaheim;01/22/2016;;Fish;Teacher;80

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.