Write a program to prompt for a file name, and then read through the file and look for lines of the form:

X-DSPAM-Confidence: 0.8475

When you encounter a line that starts with “X-DSPAM-Confidence:”, pull apart the line to extract the floating-point number on the line. Count these lines and then compute the total of the spam confidence values from these lines. When you reach the end of the file, print out the average spam confidence.

Enter the file name: mbox.txt
Average spam confidence: 0.894128046745

Enter the file name: mbox-short.txt
Average spam confidence: 0.750718518519

Test your file on the mbox.txt and mbox-short.txt files.

So far I have:

 fname = raw_input("Enter file name: ")
 fh = open(fname)
 for line in fh:
     pos  = fh.find(':0.750718518519')
     x = float(fh[pos:])
     print x

What is wrong with this code?

2 Answers

It sounds like they're asking you to average all the 'X-DSPAM-Confidence' numbers, rather than find 0.750718518519.

Personally, I'd find the word you're looking for, extract the number, then put all these numbers into a list and average them at the end.

Something like this:

# Get the filename from the user
filename = raw_input("Enter file name: ")

# An empty list to contain all our floats
spamflts = []

# Open the file to read ('r'), and loop through each line
for line in open(filename, 'r'):

    # If the line starts with the text we want (with all whitespace stripped)
    if line.strip().startswith('X-DSPAM-Confidence'):

        # Then extract the number from the second half of the line
        # "text:number".split(':') will give you ['text', 'number']
        # So you use [1] to get the second half
        # Then we use .strip() to remove whitespace, and convert to a float
        flt = float(line.split(':')[1].strip())

        print flt

        # We then add the number to our list
        spamflts.append(flt)

print spamflts
# At the end of the loop, we work out the average - the sum divided by the length
average = sum(spamflts)/len(spamflts)

print average
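
If you are on Python 3, where raw_input and the print statement no longer exist, the same idea can be wrapped in a function. This is only a sketch: the name average_confidence and the choice to return None when no line matches (instead of dividing by zero) are my own assumptions, not part of the assignment.

```python
def average_confidence(lines, prefix="X-DSPAM-Confidence:"):
    """Return the mean of the numbers that follow `prefix`, or None if no line matches.

    `lines` can be any iterable of strings, including an open file handle.
    The function name and the None fallback are illustrative assumptions.
    """
    values = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(prefix):
            # Everything after the prefix is the number; float() ignores the padding.
            values.append(float(stripped[len(prefix):]))
    if not values:
        # Avoid a ZeroDivisionError when the file has no matching lines.
        return None
    return sum(values) / len(values)


sample = [
    "X-DSPAM-Confidence: 1",
    "X-DSPAM-Confidence: 5",
    "Nothing on this line",
    "X-DSPAM-Confidence: 4",
]
print(average_confidence(sample))
```

With a real file you would pass the handle directly, for example inside a "with open(filename) as fh:" block, using input() for the prompt on Python 3.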

>>> lines = """X-DSPAM-Confidence: 1
X-DSPAM-Confidence: 5
Nothing on this line
X-DSPAM-Confidence: 4"""

>>> for line in lines.splitlines():
    print line


X-DSPAM-Confidence: 1
X-DSPAM-Confidence: 5
Nothing on this line
X-DSPAM-Confidence: 4

Using find:

>>> for line in lines.splitlines():
    pos = line.find('X-DSPAM-Confidence:')
    print pos

0
0
-1
0

We can see that find() just gives us the position of 'X-DSPAM-Confidence:' in each line, not the position of the number after it.
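
If you do want to stick with find(), one way is to add the length of the prefix to the position it returns, so the slice starts right after the colon. A rough sketch, written so it runs on Python 3; PREFIX and number_after_prefix are names I made up for illustration:

```python
PREFIX = "X-DSPAM-Confidence:"


def number_after_prefix(line):
    """Return the float after PREFIX, or None if the line does not start with it."""
    pos = line.find(PREFIX)
    # find() returns -1 when the text is missing and 0 when the line starts with it.
    if pos != 0:
        return None
    # Skip over the prefix itself; float() tolerates the leading space.
    return float(line[pos + len(PREFIX):])


for text in ["X-DSPAM-Confidence: 5", "Nothing on this line"]:
    print(number_after_prefix(text))
```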

It's easier to check whether a line starts with 'X-DSPAM-Confidence:' and then extract just the number, like this:

>>> for line in lines.splitlines():
    print line.startswith('X-DSPAM-Confidence')


True
True
False
True

>>> for line in lines.splitlines():
    if line.startswith('X-DSPAM-Confidence'):
        print line.split(':')


['X-DSPAM-Confidence', ' 1']
['X-DSPAM-Confidence', ' 5']
['X-DSPAM-Confidence', ' 4']

>>> for line in lines.splitlines():
    if line.startswith('X-DSPAM-Confidence'):
        print float(line.split(':')[1])


1.0
5.0
4.0
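
The assignment text actually asks you to count the lines and keep a total, so a list is not strictly needed. A minimal sketch of that variant, in Python 3 syntax; running_average is a hypothetical name, and returning None for "no matches" is my assumption:

```python
def running_average(lines, prefix="X-DSPAM-Confidence:"):
    """Average the numbers after `prefix` using a counter and a total, no list."""
    count = 0
    total = 0.0
    for line in lines:
        if line.startswith(prefix):
            # "text: number".split(":") gives ["text", " number"]; take the second part.
            total += float(line.split(":")[1])
            count += 1
    # Guard against a file with no matching lines.
    return total / count if count else None


lines = """X-DSPAM-Confidence: 1
X-DSPAM-Confidence: 5
Nothing on this line
X-DSPAM-Confidence: 4"""

print(running_average(lines.splitlines()))
```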

14 Comments

heh you just did his assignment for him practically :P (still +1 for giving a right answer)
@JoranBeasley Yeah, I guess so. It's probably not the best way to learn, but hopefully he'll read through and try and understand it. (There's no homework tag any more is there?)
@JoranBeasley I've added a few comments that should help him understand it
Still a little confused. I don't believe spamflts is defined anywhere?
However, if I wanted to prompt the user to enter the file without declaring an array, how would I do it? Or can I do it like this: x = raw_input("File name:") and then spamflts [] = x

pos = line.find(':') # search the line, not the file handle fh

print pos # prints help with debugging ;)

float(line[pos+1:]) # the index you got is actually the ':' itself, so you need to move over 1 more, and you slice the line, not fh
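
To make the difference between the file handle and the line concrete, here is a small sketch in Python 3 syntax. io.StringIO stands in for the opened file (an assumption so the example runs without mbox.txt); a real file handle has no find() method either.

```python
import io

# A stand-in for open(fname); the content is a single made-up sample line.
fh = io.StringIO("X-DSPAM-Confidence: 0.8475\n")

try:
    fh.find(":")  # the question's code calls find() on the handle
    error = None
except AttributeError as exc:
    # File-like objects have no find(); only strings do.
    error = str(exc)

for line in fh:
    pos = line.find(":")            # index of the colon within this line
    value = float(line[pos + 1:])   # slice the line starting just after the colon

print(error)
print(value)
```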
