I created a text file from multiple email messages.

Each of the three tuples below was written to the text file from a different email message and sender.

Cusip     NAME              Original Current Cashflow Collat Offering
362341D71 GSAA 2005-15 2A2   10,000   8,783  FCF       5/25  65.000
026932AC7 AHM 2007-1 GA1C    9,867    7,250  Spr Snr   OA    56.250 

Name            O/F    C/F    Cpn  FICO CAL WALB  60+    Notes             Offer
CSMC 06-9 7A1   25.00  11.97  L+45  728  26  578  35.21  FLT,AS,0.0%       50-00
LXS 07-10H 2A1  68.26  34.01  L+16  744   6  125  33.98  SS,9.57%          39-00

CUSIP      Name               BID   x Off       SIZE   C/E    60++  WAL   ARM  CFLW
86360KAA6  SAMI 06-AR3 11A1   57-00 x 59-00     73+MM  46.9%  67.0%  65   POA  SSPT
86361HAQ7  SAMI 06-AR7 A12    19-08 x 21-08     32+MM  15.4%  61.1%  61   POA SRMEZ

For each 'Name', I need a way to pull out the price info (the data under the column headers 'Offering', 'Offer', and 'Off'). This process will be repeated over the whole text file, and the extracted data ('Name' and 'Price') will be written to an Excel file via xlwt. Note that the format of the price data varies from tuple to tuple.

  • When you write the data, are you just writing a raw text file? Or is there a format, such as tab delimiting? You don't have to use the csv library for that (though it's a good idea), but unless the file has a rigid format, you will only be able to use regular expressions to pull out data. Commented Sep 16, 2011 at 20:39
  • That is the challenge: there is no rigid format. So everything I've done so far has been with regular expressions. Commented Sep 16, 2011 at 21:10

2 Answers

The formatting makes this a little tricky: your names can contain spaces, which makes csv difficult to use. One way around this is to run a regex over the first line (the header row) to find the position and width of the columns you are interested in, then slice every data row at those same offsets. You can try something like this:

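from __future__ import print_function  # print() works the same on Python 2 and 3
import re

for email in emails:
    print(email)
    lines = email.split('\n')
    # Locate the name and price columns in the header line. Each match spans
    # the header word plus the whitespace that pads the column.
    name = re.search(r'name\s*', lines[0], re.I)
    price = re.search(r'off(er(ing)?)?\s*', lines[0], re.I)
    for line in lines[1:]:
        # Slice each data row at the same character offsets as the header.
        n = line[name.start():name.end()].strip()
        p = line[price.start():price.end()].strip()
        print((n, p))
    print()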

This assumes that emails is a list where each entry is an email. Here is the output:

Cusip     NAME              Original Current Cashflow Collat Offering
362341D71 GSAA 2005-15 2A2   10,000   8,783  FCF       5/25  65.000
026932AC7 AHM 2007-1 GA1C    9,867    7,250  Spr Snr   OA    56.250 
('GSAA 2005-15 2A2', '65.000')
('AHM 2007-1 GA1C', '56.250')

Name            O/F    C/F    Cpn  FICO CAL WALB  60+    Notes             Offer
CSMC 06-9 7A1   25.00  11.97  L+45  728  26  578  35.21  FLT,AS,0.0%       50-00
LXS 07-10H 2A1  68.26  34.01  L+16  744   6  125  33.98  SS,9.57%          39-00
('CSMC 06-9 7A1', '50-00')
('LXS 07-10H 2A1', '39-00')

CUSIP      Name               BID   x Off       SIZE   C/E    60++  WAL   ARM  CFLW
86360KAA6  SAMI 06-AR3 11A1   57-00 x 59-00     73+MM  46.9%  67.0%  65   POA  SSPT
86361HAQ7  SAMI 06-AR7 A12    19-08 x 21-08     32+MM  15.4%  61.1%  61   POA SRMEZ
('SAMI 06-AR3 11A1', '59-00')
('SAMI 06-AR7 A12', '21-08')
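
If you want the pairs back as data rather than printed (for example, to hand them to xlwt later), the same column-slicing idea can be wrapped in a function. This is only a sketch: the name extract_name_price is made up for illustration, and it assumes each block starts with its header line and that the data rows are aligned under the header with spaces, not tabs. It also adds two guards that the loop above does not have: it skips blank lines, and it returns nothing for a block whose header lacks a name or price column.

```python
import re


def extract_name_price(block):
    """Return (name, price) pairs from one header-plus-rows text block.

    Hypothetical helper: assumes the first non-blank line is the header and
    that rows are padded with spaces so columns line up with the header.
    """
    lines = [ln for ln in block.split('\n') if ln.strip()]
    if not lines:
        return []
    header = lines[0]
    name = re.search(r'name\s*', header, re.I)
    price = re.search(r'off(er(ing)?)?\s*', header, re.I)
    if name is None or price is None:
        return []  # header has no name or price column; skip the block
    # If the price column is the last one, read to the end of each row so a
    # value wider than the header word is not cut off.
    p_end = price.end() if price.end() < len(header) else None
    pairs = []
    for line in lines[1:]:
        n = line[name.start():name.end()].strip()
        p = line[price.start():p_end].strip()
        pairs.append((n, p))
    return pairs


# Small aligned sample modeled on the second email in the question.
sample = (
    "Name            Cpn   Offer\n"
    "CSMC 06-9 7A1   L+45  50-00\n"
    "LXS 07-10H 2A1  L+16  39-00\n"
)
pairs = extract_name_price(sample)
```

Each pair is a plain tuple of strings, so writing it out is a matter of looping over the list and writing one row per pair.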

2 Comments

Unfortunately, I described each email message as a tuple, but I actually added all of the emails to one text file and am reading them line by line, not email by email. This has made it difficult for me to apply your code. Any adjustment to account for that would be appreciated.
Is each email separated by multiple newlines? If so, you could do something like open('emails.txt').read().split('\n\n') to get an emails list similar to my example. Instead of '\n\n' you may need '\r\n\r\n', or to be safe you could use os.linesep*2.
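
A slightly more forgiving version of that split, as a rough sketch: it assumes the emails in the file really are separated by one or more blank lines, and the function name split_into_blocks is made up for illustration. It normalizes Windows line endings first, so the same pattern covers both '\n\n' and '\r\n\r\n', and it treats lines containing only spaces or tabs as blank. Only leading and trailing newlines are stripped, so any indentation on the header line is kept and column offsets stay intact.

```python
import re


def split_into_blocks(text):
    """Split text into blocks separated by one or more blank lines.

    Hypothetical helper: assumes blank lines only occur between emails.
    """
    text = text.replace('\r\n', '\n')  # normalize Windows line endings
    # A separator is a newline followed by one or more blank
    # (empty or whitespace-only) lines.
    blocks = re.split(r'\n(?:[ \t]*\n)+', text.strip('\n'))
    return [b for b in blocks if b.strip()]
```

Something like emails = split_into_blocks(open('emails.txt').read()) would then give a list you can loop over the same way as in the answer.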
0

Just use the csv module, and use consistent formatting for your numbers.
