My main objective is to parse Python loops so that I can insert a few statements for my analysis.

Normal code:
#A.py

[code Starts]
.
.
.
while [condition]:
    [statements]
    [statements]
    [statements]

.
.
.
[code ends]

Instrumented code:
#A.py

[code Starts]
.
.
.
count = 0                                   <---------- inserted code
print "Entry of loop"                       <---------- inserted code
while [condition]:
    print "Iteration Number", count         <---------- inserted code
    count += 1                              <---------- inserted code
    [statements]
    [statements]
    [statements]
print "Exit of loop"                        <---------- inserted code
.
.
.
[code ends]

My objective is to insert the above code at the appropriate locations with proper indentation. The loop can also be a for loop. To produce the instrumented code, I need to parse the loops in A.py and insert those statements.

Is there a good way to parse these loops and get the line number of each loop so that I can instrument it?

Thank you

  • Have you tried to do it using the ast module? This task is very similar to your previous question. Commented Jun 11, 2013 at 9:02
  • @Janne Karila Yes, I need to do exactly what I did in the last question. But since I have never used ast, I don't know which functions to use. It would be kind of you to post a simple scenario. Commented Jun 11, 2013 at 9:07
  • Assuming you parse the file, how will you identify which loop to instrument, or do you instrument them all? Commented Jun 11, 2013 at 10:01
  • @chepner I would instrument all of them. That is my requirement. Commented Jun 11, 2013 at 10:05
  • Do you want only the loops at module level? E.g., in "if some condition: for elem in iterable: do stuff", do you want to add the code for the inner for? Do you mind nested loops? Do you want to handle one-line loops (e.g. for x in a: print x)? In the simplest case it should be pretty easy to read the file line by line and output the extra lines when needed; otherwise you have to do more parsing. Commented Jun 11, 2013 at 10:35

3 Answers

1

Parsing is usually a difficult task. You can use Pygments, a Python syntax-highlighting library. This might seem different from what you intend to do, but it is not: coloring code basically means attaching color information to code blocks.

Using the PythonLexer you can extract the tokens for each line and add any comments you want. This will come in handy if you want to work not just on while loops but also on for loops, if statements, and so on.
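
The token-based idea can be tried without installing anything by swapping Pygments for the standard library's tokenize module. The following is only a sketch of that substitution, not Pygments code: find_loop_lines is a hypothetical helper name, and it is written for Python 3, unlike the Python 2 snippets elsewhere in this thread. It reports the line number of every for or while keyword that starts a statement, so a for inside a list comprehension is ignored.

```python
import io
import tokenize

# Token types that may appear before the first "real" token of a statement.
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
         tokenize.DEDENT, tokenize.COMMENT}


def find_loop_lines(source):
    """Return (line_number, keyword) for each for/while statement in source."""
    loops = []
    at_statement_start = True
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            # Only NEWLINE ends a logical line; NL, INDENT, DEDENT and
            # COMMENT leave the "start of statement" state unchanged.
            if tok.type == tokenize.NEWLINE:
                at_statement_start = True
            continue
        if (at_statement_start and tok.type == tokenize.NAME
                and tok.string in ("for", "while")):
            loops.append((tok.start[0], tok.string))
        at_statement_start = False
    return loops
```

Because it only looks at the first token of each statement, this sketch would miss async for loops; it also needs the source to be tokenizable Python, so placeholder lines like the ones in the question would have to be replaced by real code first.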

1

pyparsing has a sample file containing a full (?) Python grammar parser. In the long run this could be an interesting option, especially if/when your analysis project gains more features.

1 Comment

Pyparsing is no longer hosted on wikispaces.com. Go to github.com/pyparsing/pyparsing

0

The simplest way of doing this is to scan the file line by line and add the statements when you find a line that matches.

The following code does what you want, but it is not robust at all:

def add_info_on_loops(iterable):
    # Yield the lines of `iterable`, adding instrumentation around top-level loops.
    in_loop = False
    for line in iterable:
        if not in_loop:
            if line.startswith('for ') or line.startswith('while '):
                in_loop = True
                yield 'count = 0\n'
                yield 'print "Entry of loop"\n'
                yield line
                # First statements of the loop body (assumes 4-space indentation).
                yield '    print "Iteration Number:", count\n'
                yield '    count += 1\n'
            else:
                yield line
        else:
            # The first line that is not indented ends the loop.
            if not line.startswith('    '):
                in_loop = False
                yield 'print "Exit of loop"\n'
            yield line

Usage:

>>> from StringIO import StringIO
>>> code = StringIO("""[code Starts]
... .
... .
... .
... while [condition]:
...     [statements]
...     [statements]
...     [statements]
... 
... .
... .
... .
... [code ends]""")
>>> print ''.join(add_info_on_loops(code))
[code Starts]
.
.
.
count = 0
print "Entry of loop"
while [condition]:
    print "Iteration Number:", count
    count += 1
    [statements]
    [statements]
    [statements]
print "Exit of loop"

.
.
.
[code ends]

Pitfalls of the code:

  1. The code handles only loops at the top level. Something like if condition: for x in a: ... isn't recognized. This can be solved by stripping the leading whitespace before checking whether the line starts a loop (but you then must take into account the different levels of indentation, etc.).
  2. The code breaks whenever a loop has a line that isn't indented. This will happen, for example, if you "split" the code with a blank line and the IDE strips the whitespace. A solution might be to wait for a non-blank, non-indented line instead of any non-indented line.
  3. The code doesn't handle tabs for indentation (easily fixed).
  4. The code doesn't handle one-line loops (e.g. for x in a: print x). In this case you'll obtain wrong output. This is easily fixed by checking whether there is something after the :.
  5. Using a single count variable is troublesome if you want to add support for nested loops. You should probably keep an integer id somewhere and use variable names such as count_0, count_1, incrementing the id every time you find a new loop.
  6. The code doesn't handle loop headers where a parenthesis follows the keyword with no whitespace in between, e.g. for(a,b) in x: isn't detected as a loop, while for (a,b) in x: is. This can be easily solved: check that the line starts with for or while and that the next character is not a letter, digit, or underscore (in Python 3 identifiers can contain Unicode characters as well, which makes this harder to test, but still possible).
  7. The code doesn't handle source code that ends with an indented loop line, e.g. for x in a: indented_last_line_of_code(); the exit print won't be added. This is easily fixed by checking in_loop after the for loop of the function to see whether we are in this situation.
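
The keyword check described in pitfalls 1, 3 and 6 can be sketched with a regular expression. This is only an illustration under my own assumptions: loop_header and LOOP_RE are hypothetical names, the snippet is written for Python 3, and it covers ASCII identifiers only (the \b word boundary is what rejects names like format or for_each).

```python
import re

# Optional indentation (spaces or tabs), then "for" or "while" as a whole word.
LOOP_RE = re.compile(r"^(?P<indent>[ \t]*)(?P<kw>for|while)\b")


def loop_header(line):
    """Return (indent, keyword) if the line opens a loop, else None."""
    m = LOOP_RE.match(line)
    return (m.group("indent"), m.group("kw")) if m else None
```

The returned indent string could be reused as the prefix for the inserted statements, so that loops inside an if or a function get instrumentation at the right level. A line-based check like this can still be fooled by text inside a multi-line string that happens to start with for or while.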

As you can see, writing a piece of code that does what you asked is not trivial. I believe the best you can do is to use ast to parse the code, visit the tree and add nodes at the correct places, then walk the tree again and generate the Python source code (nodes usually record their line in the source, which lets you copy the original code verbatim).
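
A minimal sketch of that ast approach, under a few assumptions of mine: it targets Python 3.9+ (ast.unparse does not exist in earlier versions, and the inserted statements use the print function rather than the Python 2 print statement), and the names LoopInstrumenter, instrument, loop_lines and the _loop_count_N counters are hypothetical. Each loop gets its own counter, which also sidesteps the nested-loop problem from pitfall 5.

```python
import ast


class LoopInstrumenter(ast.NodeTransformer):
    """Wrap every for/while loop with entry, iteration and exit prints."""

    def __init__(self):
        self.next_id = 0

    def _instrument(self, node):
        counter = "_loop_count_%d" % self.next_id  # hypothetical naming scheme
        self.next_id += 1
        self.generic_visit(node)  # instrument nested loops as well
        before = ast.parse('%s = 0\nprint("Entry of loop")' % counter).body
        per_iter = ast.parse(
            'print("Iteration Number", %s)\n%s += 1' % (counter, counter)).body
        after = ast.parse('print("Exit of loop")').body
        node.body = per_iter + node.body
        # Returning a list replaces the loop statement with several statements.
        return before + [node] + after

    visit_For = _instrument
    visit_While = _instrument


def instrument(source):
    """Return instrumented source code for the given Python source."""
    tree = LoopInstrumenter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


def loop_lines(source):
    """Return the line numbers of all for/while statements in source."""
    return sorted(n.lineno for n in ast.walk(ast.parse(source))
                  if isinstance(n, (ast.For, ast.While)))
```

Calling instrument(open("A.py").read()) would return the rewritten program as a string. Two caveats of this sketch: ast.unparse regenerates the source from the tree, so comments and the original formatting are lost, and the exit print is skipped when the loop is left through return or an exception.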
