0

I have this example:

    for line in IN.readlines():
        line = line.rstrip('\n')
        mas = line.split('\t')
        row = ( int(mas[0]), int(mas[1]), mas[2], mas[3], mas[4] )
        self.inetnums.append(row)
    IN.close()

When the file size is 120 MB, the script takes 10 seconds. Can I reduce this time?

2
  • 3
    You're reading a 120GB file into memory? How much memory does your machine have? Commented Apr 5, 2012 at 10:36
  • 1
    What hard drive does 12GB/sec? Commented Apr 5, 2012 at 10:41

2 Answers

4

Remove the readlines() call.

Just do:

    for line in IN:

Using readlines() creates a list of all the lines in the file before you access any of them, which you don't need. Without it, the for loop iterates over the file object itself, which returns one line at a time.
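As a minimal sketch of the difference (the helper name `parse_rows` and the sample data are made up for illustration, and `io.StringIO` stands in for the real file), iterating over the handle directly processes one line at a time without first building the full list of lines:

```python
import io

def parse_rows(handle):
    """Parse tab-separated lines from an open file-like object, one line at a time."""
    rows = []
    # Iterating the handle directly reads lines lazily; no readlines() list is built.
    for line in handle:
        mas = line.rstrip('\n').split('\t')
        rows.append((int(mas[0]), int(mas[1]), mas[2], mas[3], mas[4]))
    return rows

# Hypothetical sample data standing in for the real file.
sample = io.StringIO("1\t10\ta\tb\tc\n2\t20\td\te\tf\n")
rows = parse_rows(sample)
print(rows)
```

With a real file, opening it as `with open(path) as IN:` also closes it automatically, so the explicit `IN.close()` is no longer needed.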


2

You may gain some speed if you use a list comprehension:

    inetnums = [[int(x) for x in line.rstrip('\n').split('\t')] for line in fin]
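If only the first two columns are numeric, as in the question, calling `int` on every field would raise a `ValueError`. A possible variant, assuming exactly five tab-separated fields per line (the sample data and the `io.StringIO` stand-in for `fin` are illustrative), unpacks each split line and converts only the numeric columns:

```python
import io

# Hypothetical sample data standing in for the real file object.
fin = io.StringIO("1\t10\ta\tb\tc\n2\t20\td\te\tf\n")

# The inner generator splits each line lazily; the outer comprehension
# unpacks the five fields and converts only the first two to int.
inetnums = [
    (int(start), int(end), c, d, e)
    for start, end, c, d, e in (line.rstrip('\n').split('\t') for line in fin)
]
print(inetnums)
```

A line with more or fewer than five fields would fail at the unpacking step, so this only fits input with a fixed column count.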

Here is the profile information for the two versions:

>>> def foo2():
    fin.seek(0)
    inetnums=[]
    for line in fin:
        line = line.rstrip('\n')
        mas = line.split('\t')
        row = ( int(mas[0]), int(mas[1]), mas[2], mas[3])
        inetnums.append(row)


>>> def foo1():
    fin.seek(0)
    inetnums=[[int(x) for x in line.rstrip('\n').split('\t')] for line in fin]

>>> cProfile.run("foo1()")
         444 function calls in 0.004 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.003    0.003    0.004    0.004 <pyshell#362>:1(foo1)
        1    0.000    0.000    0.004    0.004 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
      220    0.000    0.000    0.000    0.000 {method 'rstrip' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'seek' of 'file' objects}
      220    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}


>>> cProfile.run("foo2()")
         664 function calls in 0.006 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.005    0.005    0.006    0.006 <pyshell#360>:1(foo2)
        1    0.000    0.000    0.006    0.006 <string>:1(<module>)
      220    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
      220    0.001    0.000    0.001    0.000 {method 'rstrip' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'seek' of 'file' objects}
      220    0.001    0.000    0.001    0.000 {method 'split' of 'str' objects}


>>> 

2 Comments

Would you actually gain some speed by using list comps, other than the speed gained by removing readlines? It seems to me like it's just another way of writing the same code.
@jamylak: Consider the fact that you won't be calling append multiple times in a loop. I have updated my answer with information from cProfile.
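
One way to check this on your own data is to time both variants with `timeit` rather than relying on a single profile run. This is a rough sketch: the function names, the synthetic 1000-line input, and the repeat count are all assumptions, and the timings will vary by machine and Python version.

```python
import timeit

# Synthetic input: 1000 tab-separated lines (illustrative only).
lines = ["%d\t%d\ta\tb\n" % (i, i + 1) for i in range(1000)]

def with_append():
    # Explicit loop that calls list.append once per line.
    rows = []
    for line in lines:
        mas = line.rstrip('\n').split('\t')
        rows.append((int(mas[0]), int(mas[1]), mas[2], mas[3]))
    return rows

def with_comprehension():
    # Same parsing expressed as a list comprehension, with no append calls.
    return [
        (int(a), int(b), c, d)
        for a, b, c, d in (line.rstrip('\n').split('\t') for line in lines)
    ]

# Both variants must build the same result before the timings mean anything.
assert with_append() == with_comprehension()

print("append loop:  ", timeit.timeit(with_append, number=200))
print("comprehension:", timeit.timeit(with_comprehension, number=200))
```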
