0

I have a line like this:

20:28:26.684597 24:d5:6e:76:9s:10 (oui Unknown) > 45:83:r4:7u:9s:i2 (oui Unknown), ethertype 802.1Q (0x8100), length 78: vlan 64, p 0, ethertype IPv4, (tos 0x48, ttl 34, id 5643, offset 0, flags [none], proto TCP (6), length 60) 192.168.45.28.56982 > 172.68.54.28.webcache: Flags [S], cksum 0xg654 (correct), seq 576485934, win 65535, options [mss 1460,sackOK,TS val 2544789 ecr 0,wscale 0,eol], length 0

From this line I need to extract the ID value from "id 5643" and the value 56982 from 192.168.45.28.56982. The literal "id" and the address 192.168.45.28 are constant across lines.

I have written the script below. Please suggest a way to shorten it, since it currently involves multiple steps:

file = open('test.txt')
fi = file.readlines()

for line in fi:
    test = (line.split(","))
    for word2 in test:
        if "id" in word2:
            find2 = word2.split(" ")[-1]
            print("************", find2)
    for word in test:
        if "192.168.45.28" in word:
            find = word.split(".")
            print(find)
            for word1 in find:
                if ">" in word1:
                    find1 = word1.split(">")[0]
                    print(find1)
2
  • Just edited my question as per your suggestion. So for such cases, is 'readlines' best suited, or is there a more efficient method available? Commented Mar 13, 2016 at 10:18
  • Sure, I will do that... makes sense. Commented Mar 13, 2016 at 10:44

3 Answers

2

Same approach as the other answers, with a few differences: it doesn't add empty lists to your results, it compiles the regexes for efficiency, it doesn't read the whole file into memory in one go, and it doesn't use id as a variable name (id is a built-in function, so it's best not to shadow it). The output can contain duplicates (I couldn't assume that you wanted unique entries only).

import re

# Raw strings keep the backslashes in \d and \. intact for the regex engine
re_id = re.compile(r"id (\d+)")
re_ip = re.compile(r"192\.168\.45\.28\.(\d+)")

ids = []
ips = []

with open("test.txt", "r") as f:
    for line in f:
        id_res = re_id.findall(line)
        if id_res:  # findall returns an empty list when nothing matches
            ids.append(id_res[0])
        ip_res = re_ip.findall(line)
        if ip_res:
            ips.append(ip_res[0])
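
To illustrate why the emptiness check matters, here is a minimal sketch of what findall returns for a matching and a non-matching string. The two input strings are fragments made up for illustration (the first is cut from the sample line in the question), and the names hit and miss are hypothetical.

```python
import re

re_id = re.compile(r"id (\d+)")

# With one capture group, findall returns a list of the captured strings
hit = re_id.findall("ttl 34, id 5643, offset 0")
# With no match it returns an empty list, which is falsy
miss = re_id.findall("flags [none], proto TCP (6)")

print(hit)   # ['5643']
print(miss)  # []
```

Because the empty list is falsy, a plain truthiness test is enough to skip lines that contain no match.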

2

You could use regular expressions:

import re

# This searches for the literal "id"
# followed by a space and one or more digits
idPattern = re.compile(r"id (\d+)")
# This searches for your IP followed by
# a dot and one or more digits
ipPattern = re.compile(r"192\.168\.45\.28\.(\d+)")

with open("test.txt", 'r') as data:
    for line in data:
        # Named id1/ip1 so the built-in id() is not shadowed
        id1 = idPattern.findall(line)
        ip1 = ipPattern.findall(line)

See the Python regular expression docs
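
If the values are needed per line for further processing, one possible arrangement is a small generator that yields a pair for each line. This is only a sketch: the function name iter_id_and_port, the constant names, and the abridged sample strings are assumptions for illustration, and it uses search (first match only) rather than findall.

```python
import re

ID_PATTERN = re.compile(r"id (\d+)")
PORT_PATTERN = re.compile(r"192\.168\.45\.28\.(\d+)")


def iter_id_and_port(lines):
    """Yield (id1, ip1) for each line; a missing field is None."""
    for line in lines:
        id_match = ID_PATTERN.search(line)
        port_match = PORT_PATTERN.search(line)
        if id_match is None and port_match is None:
            continue  # skip lines that contain neither field
        id1 = id_match.group(1) if id_match else None
        ip1 = port_match.group(1) if port_match else None
        yield id1, ip1


# Abridged stand-ins for lines of test.txt (illustrative only)
sample = [
    "(tos 0x48, ttl 34, id 5643, offset 0) 192.168.45.28.56982 > 172.68.54.28.webcache: Flags [S]",
    "a line without either field",
]
for id1, ip1 in iter_id_and_port(sample):
    print(id1, ip1)  # 5643 56982
```

Since the function accepts any iterable of strings, an open file object can be passed in directly, which keeps the line-by-line reading.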

4 Comments

I got the following error: "AttributeError: 'set' object has no attribute 'extend'". But I want the values stored in variables id1 and ip1 for every line, as I need to perform some more operations on them. Could you please suggest code for that?
@dantiston Sure set() has extend? It's a list attribute. Didn't you mean set.add()?
@jDo you're right, I wrote and tested as a list and forgot to change extend when I switched to set.
@Zoro99 I updated the code to store the results at each line.
0

You can use a regex. Some more info here: https://docs.python.org/2/library/re.html

You could write it like this:

import re

# The with block closes the file once the lines have been read
with open('test.txt') as file:
    fi = file.readlines()

for line in fi:
    match = re.match(r'.*id (\d+).*', line)
    if match:
        print("************ %s" % match.group(1))
    match = re.match(r'.*192\.168\.45\.28\.(\d+).*', line)
    if match:
        print(match.group(1))

**update**

As jDo pointed out, it is better to use findall, compile the regexes up front, and avoid readlines, so you will get something like this:

import re

re_id = re.compile(r"id (\d+)")
re_ip = re.compile(r"192\.168\.45\.28\.(\d+)")
with open("test.txt", "r") as f:
    for line in f:
        # findall returns a list of captured strings, not a match object
        match = re_id.findall(line)
        if match:
            print("************ %s" % match[0])
        match = re_ip.findall(line)
        if match:
            print(match[0])
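
As a further shortening, both values could be captured with a single pattern. The sketch below is not from the answers above: it assumes the "id" field always appears before the address on a line (as it does in the sample line), and the names COMBINED, packet_id and port are made up for illustration.

```python
import re

# One pattern with two named groups; ".*?" lazily skips the text in between.
# Assumption: "id N" always precedes the address on the same line.
COMBINED = re.compile(
    r"\bid (?P<packet_id>\d+)"
    r".*?"
    r"192\.168\.45\.28\.(?P<port>\d+)"
)

# Abridged from the sample line in the question
line = ("ttl 34, id 5643, offset 0, flags [none], proto TCP (6), length 60) "
        "192.168.45.28.56982 > 172.68.54.28.webcache: Flags [S]")

m = COMBINED.search(line)
if m:
    print(m.group("packet_id"), m.group("port"))  # 5643 56982
```

The trade-off is that a line carrying only one of the two fields produces no match at all, so the two-pattern version is the safer choice when fields can be missing.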

7 Comments

It didn't give any output, though the script executed fine.
I think the regex wasn't fully correct. I updated it; I quickly tested it here and it should work.
You're reading the whole file into memory though. As someone pointed out here "The efficient way to use readlines() is to not use it. Ever." Also, compile your regex for extra efficiency and use findall to search within strings rather than from the beginning (then you could do away with the asterisks).
You are right, but he only asked for shorter code, not for memory optimisation.
@BramV Well, I guess it's a matter of definition whether or not avoiding something you should almost never use can be called an "optimisation" :D