I am working with a language where the modules are defined as

<module_name> <inst_name>(.<port_name> (<net_name>)….);

or

module1 inst1 ( .input a,
.output b;
port b=a;);

I want to find all such modules while ignoring function calls.

I'm having difficulty with the regex. I am looking for this:

 text1 text2 ( .text3; text4 );

Note that all the spaces except the one between text1 and text2 are optional and might be newlines instead of spaces. text3 and text4 can span multiple lines, but all are in the form of

text3->
.blah1 (blah2),
.blah3 (blah4)

text4->
blah1 blah2=xyz;
blah3 blah4=qwe;

I am trying to do

 re.split(r"^[a-zA-Z]*\s[a-zA-Z]*\s?\n?\([a-zA-Z]*\s?\n?;[a-zA-Z]*\);", data)

It doesn't work, though; it just grabs everything. How do I fix it? Thanks! I eventually need to grab everything individually (modules/instances/ports/nets). I think I can split it once the regex is working.

  • Python's regex engine can't match nested structures (and you have nested parentheses in your data). You should probably implement a parser for this anyway. Commented Feb 17, 2015 at 20:48
  • Any pointers on how to parse this? Thanks! Commented Feb 17, 2015 at 20:54

1 Answer

I think you need to write a parser that understands enough of the language to at least canonicalize it before you try extracting information. You could write a simple parser by hand, or you could use a parsing framework such as PLY or others of that ilk.

To give you a more concrete idea about what I'm suggesting, consider the following code, which defines a parse_data function that, given the contents of a file, will yield a series of tokens recognized in that file:

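import re

# Token patterns, written as raw strings so the backslashes reach the regex engine intact.
tokens = {
    'lparen': r'\(',
    'rparen': r'\)',
    'comma': r',',
    'semicolon': r';',
    'whitespace': r'\s+',
    'equals': r'=',
    'identifier': r'[.\d\w]+',
}

tokens = dict((k, re.compile(v)) for k, v in tokens.items())

def parse_data(data):
    while data:
        for tn, tv in tokens.items():
            mo = tv.match(data)
            if mo:
                matched = mo.group()
                data = data[mo.end():]
                yield tn, matched
                break
        else:
            # No pattern matched: fail loudly instead of looping forever.
            raise ValueError('unexpected input: %r' % data[:20])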

Using this, you could write something that would put your sample input into canonical form:

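with open('inputfile') as fd:
    data = fd.read()

last_token = (None, None)
for tn, tv in parse_data(data):
    if tn == 'whitespace' and last_token[0] != 'semicolon':
        # Collapse any run of whitespace into a fixed separator.
        print(' ', end=' ')
    elif tn == 'whitespace':
        # Drop whitespace that directly follows a semicolon.
        pass
    elif tn == 'semicolon' and last_token[0] == 'rparen':
        # ");" closes a statement, so end the output line here.
        print(tv)
    else:
        print(tv, end=' ')

    last_token = (tn, tv)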

Given input like this:

module1 inst1 ( .input a,
.output b;
port b=a;);
module2 inst2 ( .input a, .output b; port b=a;);

module3 inst3 ( .input a, .output b;


port b=a;);

The above code would yield:

module1   inst1   (   .input   a ,   .output   b ; port   b = a ; ) ;
module2   inst2   (   .input   a ,   .output   b ; port   b = a ; ) ;
module3   inst3   (   .input   a ,   .output   b ; port   b = a ; ) ;

Because this output is in a standard form, it is much more amenable to extracting information via simple pattern matching.
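As a rough sketch of that kind of pattern matching, assuming every statement sits on a single canonical line like the output above and that the port list ends at the first semicolon (the STATEMENT pattern and the extract helper are hypothetical names, not part of the code above):

```python
import re

# Hypothetical pattern for one canonical line: "<module> <inst> ( ... ) ;"
STATEMENT = re.compile(r"^(\w+)\s+(\w+)\s*\((.*)\)\s*;$")

def extract(line):
    """Return (module, instance, ports, body) for one canonical line, or None."""
    mo = STATEMENT.match(line.strip())
    if mo is None:
        return None
    module, instance, inner = mo.groups()
    # Assumption: everything before the first ';' is the port list,
    # and everything after it is the body.
    port_text, _, body_text = inner.partition(";")
    ports = [" ".join(p.split()) for p in port_text.split(",") if p.strip()]
    body = [" ".join(s.split()) for s in body_text.split(";") if s.strip()]
    return module, instance, ports, body

line = "module1   inst1   (   .input   a ,   .output   b ; port   b = a ; ) ;"
print(extract(line))
```

Because the pattern requires two identifiers before the opening parenthesis, a plain function call such as "foo ( a , b ) ;" does not match and extract returns None for it.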

Note that while this code relies on reading the entire source file into memory first, you could fairly easily write code that parses a file in fragments if you were concerned about memory utilization.
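One possible way to do that, sketched here under the assumption that a statement ends at a semicolon sitting outside all parentheses (iter_statements, its chunk_size parameter, and the module4 sample are hypothetical, made up for illustration):

```python
import io

def iter_statements(stream, chunk_size=4096):
    """Yield one whitespace-normalized statement at a time from a file-like object."""
    buf = []    # characters of the statement being assembled
    depth = 0   # current parenthesis nesting depth
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for ch in chunk:
            buf.append(ch)
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == ";" and depth == 0:
                # A ';' outside all parentheses closes the statement.
                yield " ".join("".join(buf).split())
                buf = []
    # Anything left over is an unterminated statement; yield it if non-empty.
    tail = " ".join("".join(buf).split())
    if tail:
        yield tail

sample = io.StringIO(
    "module1 inst1 ( .input a,\n.output b;\nport b=a;);\n"
    "module4 inst4( .in1 (net1), .in2 (net2));\n"
)
for statement in iter_statements(sample, chunk_size=8):
    print(statement)
```

Only the current statement is held in memory, and tracking the nesting depth means parenthesized nets inside the port list do not end a statement early. This sketch does not check for unbalanced parentheses.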

3 Comments

The code seems to get stuck and never returns; the file is about 1000 lines.
Well, (a) this was meant as a suggestion of a direction in which you should investigate, not as a complete solution, and (b) it was only tested against the sample input you provided. So it's not too surprising that something isn't working; your input file undoubtedly has content not accounted for in your sample input.
Thanks larsks, the problem comes from having multiple bracketed ports in the function call: module inst4( .input a (output b), .input2 (outputc), .input3 (outputd)); Trying to resolve it now, thanks for your help!
