
I have a large text file (60 MB) that looks like the following:

:VPN ()
:add_adtr_rule (true)
:additional_products ()
:addr_type_indication (IPv4)
:certificates ()
:color (black)
:comments ()
:connectra (false)
:connectra_settings ()
:cp_products_installed (false)
:data_source (not-installed)
:data_source_settings ()
:edges ()
:enforce_gtp_rate_limit (false)
:firewall (not-installed)
:floodgate (not-installed)
:gtp_rate_limit (2048)
:interfaces ()
:ipaddr (10.19.45.18)

For every instance in which :add_adtr_rule is true (there are also thousands of ':add_adtr_rule (false)' entries), I need the value of :ipaddr; in this instance, that would be 10.19.45.18. How can I use a regex to extract this information?

I have tried the following code, which returns an empty list:

import re

with open("objects_5_0_C-Mod.txt", "r") as f:
    text = f.read()

ip=re.findall(r':add_adtr_rule [\(]true[\)]\s+.*\s+.*\s+.*\s+.*\s+:ipaddr\s+[\(](.*)[\)]', text)
print(ip) 
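Why the attempt above comes back empty: each `.*\s+` group can consume at most one line, so four of them leave room for at most four lines between `:add_adtr_rule (true)` and `:ipaddr`, while the sample has sixteen. A quick check on a synthetic sample (the `:field_N ()` lines are made-up placeholders, not real keys) contrasts that fixed-width pattern with one that lets the gap stretch using `re.DOTALL` and a lazy `.*?`:

```python
import re

# Hypothetical sample mirroring the question: 16 placeholder lines sit
# between ':add_adtr_rule (true)' and ':ipaddr (...)'.
lines = (
    [":add_adtr_rule (true)"]
    + [f":field_{i} ()" for i in range(16)]
    + [":ipaddr (10.19.45.18)"]
)
text = "\n".join(lines)

# The question's pattern: each '.*\s+' group spans at most one line,
# so ':ipaddr' can be no more than four lines away.
fixed = r':add_adtr_rule [\(]true[\)]\s+.*\s+.*\s+.*\s+.*\s+:ipaddr\s+[\(](.*)[\)]'

# Flexible gap: with DOTALL, '.' also matches newlines, and the lazy '.*?'
# stops at the first ':ipaddr' after the 'true' rule.
flexible = r':add_adtr_rule \(true\).*?:ipaddr \(([^)]*)\)'

print(re.findall(fixed, text))                # []
print(re.findall(flexible, text, re.DOTALL))  # ['10.19.45.18']
```

The lazy `.*?` assumes every `true` block actually contains an `:ipaddr` line; if one did not, the match would run on into the next block.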
  • Assuming that the file consists of repeated blocks like the one above, and given that I am not a regex expert, I would have started by writing a generator that yields one block at a time. This would be generically useful for querying the file. I would then have tested each block for 'true' and either extracted the address or ignored the block. But Siam's regex looks good for a one-off job. Commented Mar 6, 2017 at 20:20
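The block-at-a-time approach from the comment above might look roughly like this. It is a minimal sketch under an assumption the question does not confirm: that each block starts with a line beginning `:VPN (` (the hypothetical `BLOCK_START` constant). Adjust it to whatever actually delimits blocks in the real file. `iter_blocks` and `adtr_ip` are illustrative names, not part of any library.

```python
import io
import re
from typing import Iterable, Iterator, List, Optional

# Assumption: every block starts with a line beginning with this marker,
# as the ':VPN ()' line does in the sample. Adjust for the real file.
BLOCK_START = ":VPN ("

# Matches lines such as ':ipaddr (10.19.45.18)' and captures the value.
IPADDR_RE = re.compile(r"^:ipaddr \(([^)]*)\)")


def iter_blocks(lines: Iterable[str], start: str = BLOCK_START) -> Iterator[List[str]]:
    """Yield one block (a list of stripped, non-empty lines) at a time."""
    block: List[str] = []
    for line in lines:
        line = line.strip()
        # A new start marker closes the previous block.
        if line.startswith(start) and block:
            yield block
            block = []
        if line:
            block.append(line)
    if block:
        yield block


def adtr_ip(block: List[str]) -> Optional[str]:
    """Return the :ipaddr value if the block has ':add_adtr_rule (true)'."""
    if ":add_adtr_rule (true)" not in block:
        return None
    for line in block:
        m = IPADDR_RE.match(line)
        if m:
            return m.group(1)
    return None


# Hypothetical two-block sample: one 'true' block, one 'false' block.
sample = """:VPN ()
:add_adtr_rule (true)
:ipaddr (10.19.45.18)
:VPN ()
:add_adtr_rule (false)
:ipaddr (10.0.0.1)
"""

ips = [ip for ip in map(adtr_ip, iter_blocks(io.StringIO(sample))) if ip]
print(ips)  # ['10.19.45.18']
```

Because `iter_blocks` accepts any iterable of lines, the 60 MB file can be streamed instead of read into memory at once, e.g. `with open("objects_5_0_C-Mod.txt") as f: ips = [ip for ip in map(adtr_ip, iter_blocks(f)) if ip]`.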

2 Answers


The following regex should do it:

(?s)(?:add_adtr_rule\s\(true\)).*?:ipaddr\s\((.*?)\)


In Python:

import re

s = """:VPN () :add_adtr_rule (true) :additional_products () :addr_type_indication (IPv4) :certificates () :color (black) :comments () :connectra (false) :connectra_settings () :cp_products_installed (false) :data_source (not-installed) :data_source_settings () :edges () :enforce_gtp_rate_limit (false) :firewall (not-installed) :floodgate (not-installed) :gtp_rate_limit (2048) :interfaces () :ipaddr (10.19.45.18)"""
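# (?s) lets '.' match newlines; the lazy '.*?' stops at the first ':ipaddr' after the 'true' rule
r = r"(?s)(?:add_adtr_rule\s\(true\)).*?:ipaddr\s\((.*?)\)"
ip = re.findall(r, s)
print(ip)  # ['10.19.45.18']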

3 Comments

Nice. By working through your regex with the manual, docs.python.org/3/library/re.html#regular-expression-syntax, I learned some new features.
@Clyde Glad it worked! BTW, an accept would be much appreciated tho :-)
@Siam - What do you mean by accept? Sorry I'm fairly new to this site.

You might want to add anchors to speed things up. Consider the following example with MULTILINE and VERBOSE turned on:

^:add_adtr_rule\ \(true\)   # start of line, followed by :add_ ...
[\s\S]+?                    # everything else afterwards, lazily          
^:ipaddr\ \((?P<ip>[^)]+)\) # start of line, ip and group "ip" between ()

See a demo on regex101.com.


Applied to your code, this becomes:

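import re

# MULTILINE: ^ matches at the start of every line.
# VERBOSE: whitespace in the pattern is ignored, hence the escaped spaces.
rx = re.compile(r'''
        ^:add_adtr_rule\ \(true\)
        [\s\S]+?
        ^:ipaddr\ \((?P<ip>[^)]+)\)
        ''', re.MULTILINE | re.VERBOSE)

with open("objects_5_0_C-Mod.txt", "r") as f:
    text = f.read()

# Collect the named group "ip" from every match
ips = [match.group('ip') for match in rx.finditer(text)]
print(ips)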
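To illustrate why `re.MULTILINE` matters for this pattern, here is a small self-contained check on a made-up two-block sample (one `false` block, one `true` block). Without `MULTILINE`, `^` matches only at the very start of the string, so nothing is found; with it, `^` matches at every line start and only the `true` block's address is returned.

```python
import re

# Same pattern as in the answer's code.
PATTERN = r'''
    ^:add_adtr_rule\ \(true\)
    [\s\S]+?
    ^:ipaddr\ \((?P<ip>[^)]+)\)
'''

# Hypothetical sample: a 'false' block followed by a 'true' block.
text = """:VPN ()
:add_adtr_rule (false)
:ipaddr (10.0.0.1)
:VPN ()
:add_adtr_rule (true)
:color (black)
:ipaddr (10.19.45.18)
"""

with_ml = re.compile(PATTERN, re.MULTILINE | re.VERBOSE)
without_ml = re.compile(PATTERN, re.VERBOSE)

# With MULTILINE, '^' anchors at each line start.
print([m.group('ip') for m in with_ml.finditer(text)])     # ['10.19.45.18']
# Without it, '^' only matches at position 0, where ':VPN ()' sits.
print([m.group('ip') for m in without_ml.finditer(text)])  # []
```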
