
I'm having trouble with the output of my program. I have a text file (https://www.py4e.com/code3/mbox.txt), and I want Python to first print how many email addresses it contains and then print each address on its own line. A sample of my current output looks like this:

Received: (from apache@localhost)

There were 22003 email addresses in mbox.txt
    for [email protected]; Thu, 18 Oct 2007 11:31:49 -0400

There were 22004 email addresses in mbox.txt

X-Authentication-Warning: nakamura.uits.iupui.edu: apache set sender to [email protected] using -f

There were 22005 email addresses in mbox.txt

What am I doing wrong here? Here's my code:

fhand = open('mbox.txt')
count = 0
for line in fhand:
    line = line.rstrip()
    if '@' in line:
        count = count + 1
        print('There were', count, 'email addresses in mbox.txt')
    if '@' in line:
        print(line)

2 Answers

The following modifies your code to use a regular expression to find emails in text lines.

import re

# Pattern for email
# (see https://www.geeksforgeeks.org/extracting-email-addresses-using-regular-expressions-python/)
pattern = re.compile(r'\S+@\S+')

with open('mbox.txt') as fhand:
    emails = []
    for line in fhand:
        # Detect all emails in line using regex pattern
        found_emails = pattern.findall(line)
        if found_emails:
            emails.extend(found_emails)

print('There were', len(emails), 'email addresses in mbox.txt')
if emails:
    print(*emails, sep="\n")

Output

There were 44018 email addresses in mbox.txt
[email protected]
<[email protected]>
<[email protected]>
<[email protected]>;
<[email protected]>;
<[email protected]>;
apache@localhost)
[email protected];
[email protected]
[email protected]
....
....
...etc...
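
Note that \S+@\S+ matches any run of non-whitespace characters around the @, which is why brackets, semicolons and parentheses show up in the output above. Here is a minimal sketch of one way to trim them, assuming that stripping a fixed set of punctuation from both ends is good enough for this file. The helper name clean_emails and the sample line are made up for illustration and are not taken from mbox.txt.

```python
import re

pattern = re.compile(r'\S+@\S+')

def clean_emails(line):
    """Return the '@' tokens found in line, trimmed of surrounding punctuation.

    clean_emails is a hypothetical helper, not part of any library.
    """
    # str.strip() with an argument removes any of these characters
    # from both ends of each match, and leaves the middle untouched.
    return [token.strip('<>();,"\'') for token in pattern.findall(line)]

# Made-up sample line for illustration only
sample = 'Received: from <alice@example.com>; (from apache@localhost)'
print(clean_emails(sample))
```

This does not validate the addresses; it only tidies up what the pattern already matched.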

1 Comment

Thanks so much for this. Darryl u a G

Can you make it clearer what your expected output is compared to your actual output?

You have two "if '@' in line:" statements that should be combined; there's no reason to test the same condition twice.

You count the lines that contain an @ symbol, and every time you find one you print the running count. That is why the "There were ..." message repeats throughout your output.

If you want to only print the count once, then put it outside (after) your for loop.

If you want to print the email addresses and not the whole lines that contain them, then you'll need to do some more string processing to extract the email from the line.

Don't forget to close your file when you've finished with it.
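
Putting those points together, a rough sketch might look like the following. It assumes you still want whole lines rather than extracted addresses, and the function names lines_with_at and report are hypothetical, chosen only for this example.

```python
def lines_with_at(lines):
    """Collect the right-stripped lines that contain an '@' symbol."""
    matches = []
    for line in lines:
        line = line.rstrip()
        if '@' in line:  # one test per line instead of two
            matches.append(line)
    return matches

def report(path):
    # 'with' closes the file automatically when the block ends
    with open(path) as fhand:
        matches = lines_with_at(fhand)
    # The count is printed once, after the loop has finished
    print('There were', len(matches), 'email addresses in', path)
    for line in matches:
        print(line)

# Small in-memory demo with made-up lines; call report('mbox.txt') for the real file.
demo = [
    'From: alice@example.com\n',
    'Subject: hello\n',
    'Received: (from apache@localhost)\n',
]
print(lines_with_at(demo))
```

Collecting the matches in a list first is what lets the count come out before the lines themselves.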

4 Comments

Sorry, I am aiming for the output to be: "There were __ email addresses in mbox.txt" then print each email address on subsequent lines
Do update your question to state this. Currently, it says "My whole output", but I think it's not your whole output. My answer matches what you want, I believe.
Thanks for that, I have updated it. I put both of the print lines outside of the loop, but now I am only getting the number of addresses in the output. You also said I would need to do some more string processing to extract the email from the line. What does this look like?
One way to work on something like this is to break it down and work on one piece at a time (to help you focus). So, rather than getting the whole loop to work, get the line processing to work for one line, like: line = "Received: (from apache@localhost)" then do what you can to get that to work. Note that just because a line has an @ doesn't mean it's a valid email address. This example from your output isn't a valid public email.
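
As a sketch of that one-line approach, using plain string methods rather than a regular expression: the function names extract_at_words and looks_public are made up for this example, and the "has a dot in the domain" test is only a crude assumption about what a public address looks like, not a real validity check.

```python
line = "Received: (from apache@localhost)"

def extract_at_words(line):
    """Return the whitespace-separated words containing '@', trimmed of punctuation."""
    found = []
    for word in line.split():
        if '@' in word:
            # Remove wrapping characters such as '(' ... ')' or '<' ... '>;'
            found.append(word.strip('<>();,'))
    return found

def looks_public(address):
    # Crude assumption: a public address has a dot somewhere in its domain part
    local, _, domain = address.partition('@')
    return bool(local) and '.' in domain

words = extract_at_words(line)
print(words)                             # ['apache@localhost']
print([looks_public(w) for w in words])  # [False]
```

Once this works for one line, the same function can be called for each line inside the loop.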
