
I have a text file that looks like this:

127.0.0.1
  159.187.32.13, 3:00:15, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 151.177.45.0/27 [20/0] via 190.150.1.2
    Outgoing interface list:
      Vlan4054
  159.187.32.20, 2:20:11, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 151.177.45.59/27 [20/0] via 190.150.1.2
    Outgoing interface list:
      Vlan4054
      Vlan4056
  198.140.45.77, 2:36:15, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 151.177.45.88/27 [20/0] via 190.150.1.2
    Outgoing interface list:
      Vlan4054
127.0.0.2
  188.125.45.13, 3:00:15, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 199.150.45.0/27 [20/0] via 195.32.1.2
    Outgoing interface list:
      Vlan4054
      Vlan4056
  221.125.45.77, 2:20:11, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 199.150.45.10/27 [20/0] via 195.32.1.2
    Outgoing interface list:
      Vlan4054
      Vlan4056

I'm trying to build a dictionary from this data so that it is easy to work with, and I'm currently attempting to do so with regex:

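import copy
import re

content = []
content_dict = {}

# Raw strings keep the backslashes in the patterns from being treated as string escapes
group_ip = re.compile(r"^(\d+\.\d+\.\d+\.\d+$)")
ip_subnet = re.compile(r"^(\d+\.\d+\.\d+\.\d+/\d+)")
# Indentation depth: 2, 4 or 6 leading spaces followed by a non-space character
two_space_start = re.compile(r"^( {2})\S")
four_space_start = re.compile(r"^( {4})\S")
six_space_start = re.compile(r"^( {6})\S")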

I had planned to apply the regexes to each line and build a dictionary like this:

if group_ip.match(line):
    content_dict["group"] = line.strip()

elif two_space_start.match(line) and "RP" in line:
    line = line.split(",")

    content_dict["source"] = line[0].strip()
    content_dict["uptime"] = line[1].strip()
    content_dict["rp"] = line[2].split(" ")[-1]
    content_dict["source_flags"] = line[-1].split(":")[-1].strip()

content.append(copy.copy(content_dict))

But I have realised that this won't work at scale, because each group IP (127.0.0.1, 127.0.0.2) has a variable number of subgroups, and I keep overwriting them. What I'm trying to get to is something along the lines of:

"127.0.0.1": {
    "159.187.32.13": {
        "uptime": "3:00:15",
        "flags": "S",
        "rpf_ip": "151.177.45.0/27",
        "via": "190.150.1.2",
        "outgoing_interface": ["Vlan4054"]
    },
    "159.187.32.20": {
        "uptime": "2:20:11",
        "flags": "S",
        "rpf_ip": "151.177.45.59/27",
        "via": "190.150.1.2",
        "outgoing_interface": ["Vlan4054", "Vlan4056"]
    }
}

Is it possible to get this data structure from the text through regex or some other way?

  • This is a JSON format. However, the keys are dynamic (e.g., "127.0.0.1", "127.0.0.2"), which would make this structured data difficult to use. Commented Sep 18, 2018 at 8:35
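
Dynamic keys are workable in Python, since a dict can be iterated without knowing its keys in advance. As a minimal sketch (the `flatten` helper is hypothetical, and the sample values are copied from the question), the nested structure can be turned into flat rows when fixed field names are preferred:

```python
def flatten(nested):
    """Turn {group: {source: fields}} into a list of flat row dicts."""
    rows = []
    for group, sources in nested.items():
        for source, fields in sources.items():
            # Promote the two dynamic keys to ordinary fields
            rows.append({"group": group, "source": source, **fields})
    return rows


nested = {
    "127.0.0.1": {
        "159.187.32.13": {"uptime": "3:00:15", "flags": "S"},
        "159.187.32.20": {"uptime": "2:20:11", "flags": "S"},
    },
}
rows = flatten(nested)
```

Each row then carries the same fixed keys ("group", "source", "uptime", "flags"), which may be easier to filter or export than the nested form.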

1 Answer

Since the input is fairly easy to tokenize, regex may be overkill. You can instead use str.startswith, str.isdigit and str.split for your purpose:

from pprint import pprint

content = {}
with open('file.txt', 'r') as f:
    for line in f:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines so line[0] cannot raise IndexError
        if line[0].isdigit():
            # Unindented IP address: start a new group
            group = line
            content[group] = {}
        elif line.startswith('  ') and line[2].isdigit():
            # Two-space indent followed by a digit: a source entry
            ip, uptime, flags = line.lstrip().split(', ')
            _, flags = flags.split()
            content[group][ip] = {'uptime': uptime, 'flags': flags, 'outgoing_interface': []}
        elif line.startswith('    RPF route:'):
            # Tokens: RPF, route:, [U], <rpf_ip>, [20/0], via, <via>
            _, _, _, rpf_ip, _, _, via = line.split()
            content[group][ip]['rpf_ip'] = rpf_ip
            content[group][ip]['via'] = via
        elif line.startswith('      '):
            # Six-space indent: one outgoing interface per line
            content[group][ip]['outgoing_interface'].append(line.lstrip())
pprint(content)

This outputs (with your sample input):

{'127.0.0.1': {'159.187.32.13': {'flags': 'S',
                                 'outgoing_interface': ['Vlan4054'],
                                 'rpf_ip': '151.177.45.0/27',
                                 'uptime': '3:00:15',
                                 'via': '190.150.1.2'},
               '159.187.32.20': {'flags': 'S',
                                 'outgoing_interface': ['Vlan4054', 'Vlan4056'],
                                 'rpf_ip': '151.177.45.59/27',
                                 'uptime': '2:20:11',
                                 'via': '190.150.1.2'},
               '198.140.45.77': {'flags': 'S',
                                 'outgoing_interface': ['Vlan4054'],
                                 'rpf_ip': '151.177.45.88/27',
                                 'uptime': '2:36:15',
                                 'via': '190.150.1.2'}},
 '127.0.0.2': {'188.125.45.13': {'flags': 'S',
                                 'outgoing_interface': ['Vlan4054', 'Vlan4056'],
                                 'rpf_ip': '199.150.45.0/27',
                                 'uptime': '3:00:15',
                                 'via': '195.32.1.2'},
               '221.125.45.77': {'flags': 'S',
                                 'outgoing_interface': ['Vlan4054', 'Vlan4056'],
                                 'rpf_ip': '199.150.45.10/27',
                                 'uptime': '2:20:11',
                                 'via': '195.32.1.2'}}}
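
The question also asks whether regex can produce the same structure. It can. The following is a minimal sketch rather than a definitive implementation: the function name `parse_mroute` and the pattern names are hypothetical, and the patterns assume the exact indentation and field order shown in the sample input.

```python
import re

# Patterns assume the exact layout of the sample input in the question.
GROUP_RE = re.compile(r"^(\d+\.\d+\.\d+\.\d+)$")
SOURCE_RE = re.compile(r"^ {2}(\d+\.\d+\.\d+\.\d+), ([\d:]+), flags: (\S+)$")
RPF_RE = re.compile(r"^ {4}RPF route: \[\w+\] (\S+) \[\d+/\d+\] via (\S+)$")
OIF_RE = re.compile(r"^ {6}(\S+)$")


def parse_mroute(lines):
    """Build {group: {source: fields}} from an iterable of text lines."""
    content = {}
    group = source = None
    for raw in lines:
        line = raw.rstrip()
        m = GROUP_RE.match(line)
        if m:
            group = content.setdefault(m.group(1), {})
            source = None  # a new group has no current source yet
            continue
        m = SOURCE_RE.match(line)
        if m and group is not None:
            ip, uptime, flags = m.groups()
            source = group[ip] = {
                "uptime": uptime,
                "flags": flags,
                "outgoing_interface": [],
            }
            continue
        m = RPF_RE.match(line)
        if m and source is not None:
            source["rpf_ip"], source["via"] = m.groups()
            continue
        m = OIF_RE.match(line)
        if m and source is not None:
            source["outgoing_interface"].append(m.group(1))
        # Any other line (e.g. "Incoming interface: ...") is ignored.
    return content


sample = """\
127.0.0.1
  159.187.32.20, 2:20:11, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 151.177.45.59/27 [20/0] via 190.150.1.2
    Outgoing interface list:
      Vlan4054
      Vlan4056
"""
result = parse_mroute(sample.splitlines())
```

Compared with the string-method version, the patterns reject lines that do not match the expected shape instead of raising on a failed unpack, at the cost of being harder to read.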