I have one relatively small issue, but I can't seem to wrap my head around it. I have a text file that describes a graph, and its structure is as follows:

  • first line contains the number of nodes
  • a blank line is used for separation
  • information about nodes follows, each chunk is separated from another by the empty line
  • each chunk contains the node ID on one line, the node type on the second, and information about edges after that
  • there are two types of edges, "up" and "down"; the first number after the node type is the number of "up" edges, and their IDs follow on the next line (if that number is 0, there is no ID line and the next number is the count of "down" edges)
  • the same goes for the "down" edges: their count first, then their IDs on the line below

So, sample data with three nodes is:

3

1
1
2
2 3
0

2
1
0
2
1 3

3
2
1
1
1
2

So, node 1 has type 1, two up edges (2 and 3), and no down edges. Node 2 has type 1, zero up edges, and two down edges (1 and 3). Node 3 has type 2, one up edge (1), and one down edge (2).

This info is clearly readable by a human, but I am having trouble writing a parser that takes it and stores it in a usable form.

I have written some sample code:

f = open('C:\\data', 'r')
lines = f.readlines()
num_of_nodes = lines[0]
nodes = {}
counter = 0
skip_next = False
for line in lines[1:]:
    new = False
    left = False
    right = False
    if line == "\n":
        counter += 1
        nodes[counter] = []
        new = True
        continue
    nodes[counter].append(line.replace("\n", ""))

This more or less gets me the lines split up per node. What I would like is something like a dictionary that holds, for each node, its ID and its up and down neighbors (or False if there are none). I suppose I could now go through this list of nodes again and process each one on its own, but I am wondering whether I can modify the loop I have so that it does this nicely in the first place.

  • Your definition of "clearly readable by human" is different from mine, but I'm thinking of a solution for your problem. Commented Dec 15, 2013 at 20:47
  • Haha, well, I have definitely read more readable things in my life, but I was trying to say that the data structure is "defined", meaning that when I look at the series of numbers I can easily picture it in my mind: the node ID, its type, and its neighbors (if it has them) on each side. This clause, "if it has them", seems to be the critical part that I can't describe in code. Commented Dec 15, 2013 at 20:50
  • Could you consider giving your question a less vague title than "parse text file with Python"? Something specific to the data you're trying to read. Commented Dec 15, 2013 at 20:51
  • Could you provide an outline of the dict you're looking for? Commented Dec 15, 2013 at 20:54
  • @puredevotion Something like this: nodes = {node_id: {'ups': [], 'downs': []}}, or something along those lines. Commented Dec 15, 2013 at 21:22

2 Answers

Is this what you want?

{1: {'downs': [], 'ups': [2, 3], 'node_type': 1}, 
 2: {'downs': [1, 3], 'ups': [], 'node_type': 1}, 
 3: {'downs': [2], 'ups': [1], 'node_type': 2}}

Then here's the code:

def parse_chunk(chunk):
    node_id = int(chunk[0])
    node_type = int(chunk[1])

    nb_up = int(chunk[2])
    if nb_up:
        ups = [int(x) for x in chunk[3].split()]  # a real list on both Python 2 and 3
        next_pos = 4
    else:
        ups = []
        next_pos = 3

    nb_down = int(chunk[next_pos])
    if nb_down:
        downs = [int(x) for x in chunk[next_pos+1].split()]  # a real list on both Python 2 and 3
    else:
        downs = []

    return node_id, dict(
        node_type=node_type,
        ups=ups,
        downs=downs
        )

def collect_chunks(lines):
    chunk = []
    for line in lines:
        line = line.strip()
        if line:
            chunk.append(line)
        elif chunk:  # ignore repeated blank lines instead of yielding an empty chunk
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def parse(stream):
    nb_nodes = int(next(stream).strip())  # first line: number of nodes
    if not nb_nodes:
        return {}
    next(stream, None)  # skip the blank separator line, if there is one
    return dict(parse_chunk(chunk) for chunk in collect_chunks(stream))

def main(*args):
    with open(args[0], "r") as f:
        print(parse(f))

if __name__ == "__main__":
    import sys
    main(*sys.argv[1:])
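
To try this without a file on disk, the sample data from the question can be run through the same chunk-based idea via io.StringIO. What follows is a condensed, self-contained Python 3 sketch rather than a drop-in replacement: SAMPLE, read_edges and parse_text are names made up for this illustration, and the check that each declared count matches the number of IDs is an added assumption about how strict the format should be.

```python
import io

# The sample data from the question, kept inline for a quick check.
SAMPLE = """3

1
1
2
2 3
0

2
1
0
2
1 3

3
2
1
1
1
2
"""


def read_edges(chunk, pos):
    """Read an edge count at chunk[pos], plus the ID line after it if count > 0.

    Returns (edge_ids, index of the next unread line in the chunk).
    """
    count = int(chunk[pos])
    if count == 0:
        return [], pos + 1  # no ID line follows a zero count
    ids = [int(x) for x in chunk[pos + 1].split()]
    if len(ids) != count:  # assumption: a mismatch means a malformed file
        raise ValueError("expected %d edge ids, got %d" % (count, len(ids)))
    return ids, pos + 2


def parse_text(stream):
    """Hypothetical helper: split on blank lines, then decode each chunk."""
    lines = [line.strip() for line in stream]
    chunks, current = [], []
    for line in lines[1:]:  # lines[0] is the node count
        if line:
            current.append(line)
        elif current:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)

    nodes = {}
    for chunk in chunks:
        ups, pos = read_edges(chunk, 2)    # "up" edges start at the third line
        downs, _ = read_edges(chunk, pos)  # "down" edges start wherever "up" ended
        nodes[int(chunk[0])] = {
            'node_type': int(chunk[1]),
            'ups': ups,
            'downs': downs,
        }
    return nodes


nodes = parse_text(io.StringIO(SAMPLE))
print(nodes)
```

For the sample data this gives the same dictionary as shown at the top of this answer.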

2 Comments

This is perfect! I only modified this a bit, as I need slightly different output for further processing, so I added this: nodes = {} for node in list_of_nodes: nodes[node['node_id']] = {'type': node['node_type'], 'right': node['right'], 'left': node['left']}
It's perfect now! Thank you for the blazingly fast answer and for your time and help!

I would do it as shown below. I would also wrap the file reading in a try/except, and open the file with a with statement.

nodes = {}
# node_file is the path to your data file
with open(node_file, 'r', encoding='utf-8') as file:
    file.readline()                              # skip first line (node count), not a node
    line = file.readline()                       # blank separator before the first node
    while line:                                  # readline() returns '' at end of file
        if line.strip() == "":
            line = file.readline()               # read next line: the node id
            if not line.strip():
                continue                         # extra blank line, or end of file
            counter = int(line)
            nodes[counter] = {}                  # create a nested dict per node
            line = file.readline()
            nodes[counter]['type'] = int(line)   # add node type
            line = file.readline()               # number of up edges
            if int(line) != 0:
                line = file.readline()           # there are many ways
                up_edges = [int(x) for x in line.split()]    # you can store edges
                nodes[counter]['up'] = up_edges  # here a list
            line = file.readline()               # number of down edges
            if int(line) != 0:
                line = file.readline()
                down_edges = [int(x) for x in line.split()]  # store down-edges as a list
                nodes[counter]['down'] = down_edges
            line = file.readline()               # end of chunk, read the next separator
        else:
            print("this should never happen! line: ", line)
            line = file.readline()

This reads the file line by line. I don't know how big your data files are, but this is easier on memory than loading everything at once. The trade-off, if memory is the concern, is that it can be slower in terms of HDD reads (although an SSD does wonders).

Haven't tested the code, but the concept is clear :)
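
The same line-by-line idea can also be written as a generator that hands back one node at a time, so only the current chunk is ever held in memory. This is a minimal Python 3 sketch, not a tested replacement for the code above: iter_nodes and decode are names invented here, and it assumes every chunk is well formed (ID, type, up count, optional up IDs, down count, optional down IDs).

```python
import io


def decode(chunk):
    """Turn the stripped lines of one chunk into (node_id, info)."""
    node_id, node_type = int(chunk[0]), int(chunk[1])
    pos = 2
    edges = []
    for _ in range(2):  # "up" edges first, then "down" edges
        count = int(chunk[pos])
        if count:
            edges.append([int(x) for x in chunk[pos + 1].split()])
            pos += 2  # skip the count line and the ID line
        else:
            edges.append([])
            pos += 1  # a zero count has no ID line after it
    return node_id, {'type': node_type, 'up': edges[0], 'down': edges[1]}


def iter_nodes(stream):
    """Yield (node_id, info) pairs one at a time from an open text stream."""
    next(stream, None)  # skip the node count on the first line
    chunk = []
    for line in stream:
        line = line.strip()
        if line:
            chunk.append(line)
        elif chunk:  # a blank line closes the current chunk
            yield decode(chunk)
            chunk = []
    if chunk:  # last chunk when the file has no trailing blank line
        yield decode(chunk)


# Small made-up input with two nodes, just to exercise the generator.
demo = io.StringIO("2\n\n1\n1\n1\n2\n0\n\n2\n1\n0\n1\n1\n")
for node_id, info in iter_nodes(demo):
    print(node_id, info)
```

With a real file, iter_nodes can be given the file object from a with open(...) block in the same way, and dict(iter_nodes(f)) collects everything if the whole graph is needed at once.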

5 Comments

Your code won't work - you're reading strings but comparing to ints.
ah, thought it would read the numbers as ints, but that will be a small edit coming up...
Yup, not working. There are some weird things like: 'line = file.readline()' and in the next line 'counter = line[0]', which brings some errors.
ok, then I will test it :) --> or not, since @bruno's answer was correct
Yeah, it seems he delivered what I was asking, so no need to spend your valuable time on this issue anymore. Thanks! :)
