
I have a log file in plain-text format. It looks like this:

220.227.40.118 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
220.227.40.118 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -
59.95.13.217 - - [06/Mar/2012:00:00:00 -0800] "GET /dbupdates2.xml HTTP/1.1" 404 0 - -

111.92.9.222 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
120.56.236.46 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -
49.138.106.21 - - [06/Mar/2012:00:00:00 -0800] "GET /add.txt HTTP/1.1" 204 214 - -

117.195.185.130 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
122.160.166.220 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /welcome.html HTTP/1.1" 204 212 - -
117.18.231.5 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -

I want to find each unique IP address present in the log file using Python.

  • Why bother with Python when perl -lane 'print $F[0] unless $seen{$F[0]}++' logfile1 logfile2 logfile3 does the job for you already? Commented Mar 9, 2012 at 3:24
  • @tchrist that should be expanded into an answer Commented Mar 9, 2012 at 4:29
  • @tchrist, but my requirement is to use Python. Commented Mar 9, 2012 at 4:33
  • Why use perl when $ sort -uk1,1 does the job already? Commented Mar 9, 2012 at 5:34
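For comparison, here is a rough Python counterpart of the perl one-liner in the first comment. Like `$seen{$F[0]}++`, it streams the input and emits each IP only the first time it appears, so the output keeps the order of first appearance. This is a sketch: the name first_seen_ips is hypothetical, and it assumes the IP is always the first whitespace-separated field.

```python
import fileinput
import sys


def first_seen_ips(lines):
    """Yield each IP once, in order of first appearance (like perl's %seen idiom)."""
    seen = set()
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines instead of raising IndexError
            continue
        ip = fields[0]  # assumed: the client IP is the first field
        if ip not in seen:
            seen.add(ip)
            yield ip


def main(paths):
    # fileinput chains all the given files into one stream of lines,
    # so "seen" spans every file, just as %seen does in the perl version.
    for ip in first_seen_ips(fileinput.input(paths)):
        print(ip)


if __name__ == '__main__' and len(sys.argv) > 1:
    # Hypothetical usage: python first_seen.py logfile1 logfile2 logfile3
    main(sys.argv[1:])
```

Because it only keeps the set of IPs seen so far, memory grows with the number of distinct IPs, not with the size of the log.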

2 Answers


Here is how:

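def unique_ips(path='log_file.txt'):
    ips = set()
    with open(path, 'r') as f:  # the file is closed automatically
        for line in f:
            fields = line.split()
            if fields:  # skip blank lines
                ips.add(fields[0])  # the IP is the first field
    return ips

if __name__ == '__main__':
    print(unique_ips())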

This should work fine with Python 2.6.


5 Comments

ip not in ips will become quite slow if there are a lot of different IP addresses; ips should be a set.
Now you can see that you could just write ips = set(line.split()[0] for line in f). Note, though, that split()[0] raises an IndexError if there are any empty lines.
Well, I know that could also be done, but it's not about saving lines here. I wanted to be clearer.
ips = set(line.split()[0] for line in f if not line.isspace()) would be better
Thanks to you and @gnibbler, it's working fine. I have 243607 IPs in the log file, and the output is printed as one continuous run, so I can't check it. I want each IP printed on a separate line. As I'm new to Python, I can't figure out how. Is there any way to do it?
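On the last comment: printing the set directly shows it as one long set literal. Looping over it and printing each element puts every IP on its own line. The sketch below also sorts numerically by octet, so 59.95.13.217 comes before 111.92.9.222 (plain string sorting would put it after). It reuses the generator-expression idea from the comments, and assumes every first field is a dotted-quad IPv4 address; the helper names ipv4_key and print_one_per_line are hypothetical.

```python
def unique_ips(lines):
    """Return the distinct first fields (client IPs), skipping blank lines."""
    return set(line.split()[0] for line in lines if line.strip())


def ipv4_key(ip):
    # Compare octets as integers rather than strings.
    # Assumes dotted-quad IPv4 addresses such as "117.18.231.5".
    return tuple(int(part) for part in ip.split('.'))


def print_one_per_line(ips):
    # One print() call per IP, so each address lands on its own line.
    for ip in sorted(ips, key=ipv4_key):
        print(ip)
```

Usage would look like `with open('log_file.txt') as f: print_one_per_line(unique_ips(f))`. With a couple of hundred thousand addresses, piping the output into a pager or redirecting it to a file makes it easier to check.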

How about:

def get_ips(logfile):
    with open(logfile, 'r') as f:
        for line in f.readlines():
            fields = line.split()
            if fields:  # skip blank lines
                yield fields[0]  # the IP is the first field


def main():
    for ip in set(get_ips('log.txt')):
        print(ip)  # one IP per line


if __name__ == '__main__':
    main()
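A variant of get_ips that takes an already-open file object instead of a path is easy to try on a few lines without creating a file. The sketch below (the name ips_from is hypothetical) feeds it the first few entries from the question through io.StringIO. 220.227.40.118 appears twice, so four non-blank lines yield three distinct IPs.

```python
import io

# A few entries copied from the question, including one blank line.
SAMPLE = """\
220.227.40.118 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
220.227.40.118 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -
59.95.13.217 - - [06/Mar/2012:00:00:00 -0800] "GET /dbupdates2.xml HTTP/1.1" 404 0 - -

111.92.9.222 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -
"""


def ips_from(f):
    """Yield the first field (client IP) of every non-blank line in f."""
    for line in f:  # works for real files and io.StringIO alike
        fields = line.split()
        if fields:
            yield fields[0]


unique = set(ips_from(io.StringIO(SAMPLE)))
print(len(unique))  # prints 3
for ip in sorted(unique):
    print(ip)
```

Taking a file object rather than a filename also lets the same generator read from sys.stdin or from a file opened elsewhere.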

3 Comments

@Raju.allen, which version of Python are you using?
You can just use for line in f: instead. It's better because it avoids reading the whole file into memory at once.
@Raju.allen, that code should work fine in Python 2.6. Did you copy and paste it or retype it?
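With 243607 distinct IPs, writing the result to a file may be easier to inspect than terminal output. A minimal sketch, assuming hypothetical file names such as log.txt and unique_ips.txt; it iterates over the file directly with for line in f, as suggested in the comments, rather than calling readlines().

```python
def write_unique_ips(log_path, out_path):
    """Collect distinct first fields from log_path and write one per line to out_path.

    Returns the number of distinct IPs written.
    """
    ips = set()
    with open(log_path) as f:
        for line in f:  # reads one line at a time, not the whole file
            fields = line.split()
            if fields:  # skip blank lines
                ips.add(fields[0])
    with open(out_path, 'w') as out:
        # Plain string order; numeric ordering would need a custom sort key.
        for ip in sorted(ips):
            out.write(ip + '\n')
    return len(ips)
```

For example, `write_unique_ips('log.txt', 'unique_ips.txt')` returns the count, and the output file can then be opened in any editor.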
