0

Possible Duplicate:
Python 3 regular expression to find multiline comment

I need some input on how this can be done and would really appreciate your help. I looked at the posts below, but neither of them matches my requirement.

How to remove line from the file in python
Remove lines from textfile with python

I need to match a multi-line comment in a file based on an input string.

Example:-

Let's say the file "test.txt" contains a multi-line comment. If inputstring="This is a test, script written" appears in that comment, the whole comment needs to be deleted from the file.

import os
import sys

import re
import fnmatch

def find_and_remove(haystack, needle):
    pattern = re.compile(r'/\*.*?'+ needle + '.*?\*/', re.DOTALL)
    return re.sub(pattern, "", haystack)

for path,dirs,files in os.walk(sys.argv[1]):
    for fname in files:
        for pat in ['*.cpp','*.c','*.h','*.txt']:
            if fnmatch.fnmatch(fname,pat):
                fullname = os.path.join(path,fname)
                with open(fullname, "r") as f:
                    find_and_remove(f, r"This is a test, script written")

Error:-

Traceback (most recent call last):
  File "comment.py", line 16, in <module>
    find_and_remove(f, r"This is a test, script written")
  File "comment.py", line 8, in find_and_remove
    return re.sub(pattern, "", haystack)
  File "/usr/lib/python2.6/re.py", line 151, in sub
    return _compile(pattern, 0).sub(repl, string, count)
TypeError: expected string or buffer
2 Comments

Stop reposting this question. Please. Commented Jan 9, 2013 at 6:59
@Tim - I just need ideas to work on. What is wrong with that? Commented Jan 9, 2013 at 7:04

3 Answers

3

The first thing that came to mind when I saw the question was "state machine", and whenever I think "state machine" in Python, the first thing that comes to mind is "generator", a.k.a. yield:

def skip_comments(f):
    """
    Emit all the lines that are not part of a multi-line comment.
    """
    is_comment = False

    for line in f:
        if line.strip().startswith('/*'):
            is_comment = True

        if line.strip().endswith('*/'): 
            is_comment = False
        elif is_comment:
            pass
        else:
            yield line


def print_file(file_name):
    with file(file_name, 'r') as f:
        skipper = skip_comments(f)

        for line in skipper:
            print line,

EDIT: user1927396 upped the ante by specifying that it's just a specific block to exclude, that contains specific text. Since it's inside the comment block, we won't know up front if we need to reject the block or not.

My first thought was buffer. Ack. Poo. My second thought was a haunting refrain I've been carrying in my head for 15 years and never used until now: "stack of state machines" ...

def squelch_comment(f, first_line, exclude_if):
    """
    Comment is a multi-line comment that we may want to suppress
    """
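    # Start buffering, unless the opening line already holds the target text
    comment = None if exclude_if in first_line else [first_line]

    if not first_line.strip().endswith('*/'):
        for line in f:

            if exclude_if in line:
                comment = None

            if comment:
                comment.append(line)

            if line.strip().endswith('*/'):
                break

    if comment:
        # the '...' prefix marks the lines of a comment that was kept
        for comment_line in comment:
            yield '...' + comment_line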


def skip_comments(f):
    """
    Emit all the lines that are not part of a multi-line comment.
    """
    for line in f:
        if line.strip().startswith('/*'):
            # hand off to the nested, comment-handling, state machine
            for comment_line in squelch_comment(f, line, 'This is a test'):
                yield comment_line
        else:
            yield line


def print_file(file_name):
    with file(file_name, 'r') as f:
        for line in skip_comments(f):
            print line,
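
The buffering idea mentioned above can also be written as a single generator. The following is only a sketch under a few assumptions: the name drop_matching_comments is made up for illustration, a comment is assumed to start on a line beginning with /* and to end on a line ending with */ (as in the code above), and it is written for Python 3 rather than the Python 2 used elsewhere on this page.

```python
import io


def drop_matching_comments(lines, needle):
    """Yield every line except those in a /* ... */ block containing needle."""
    buffered = None  # None while outside a comment, a list while inside one

    for line in lines:
        stripped = line.strip()
        if buffered is None and stripped.startswith('/*'):
            buffered = []  # a comment block starts here

        if buffered is None:
            yield line  # ordinary line outside any comment
            continue

        buffered.append(line)
        if stripped.endswith('*/'):
            # End of the block: keep it only if the needle never appeared.
            if not any(needle in kept for kept in buffered):
                for kept in buffered:
                    yield kept
            buffered = None

    # An unterminated comment at end of input is emitted unchanged.
    for kept in buffered or []:
        yield kept


# Hypothetical sample input, standing in for an opened file.
sample = io.StringIO(
    "/*\n * This is a test, script written\n */\n"
    "some code\n"
    "/*\n * This is a comment line\n */\n"
    "more code\n"
)
result = "".join(drop_matching_comments(sample, "This is a test"))
print(result)
```

Holding the whole block in a list trades a little memory for simpler control flow: the decision to keep or drop is made once, when the closing */ is seen.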

8 Comments

Rob - I only want to remove a specific multi-line comment, not all the multi-line comments.
Hi Rob - I understood that part, but how do I tell the script to do both? It needs to know if the line .startswith('/*') and the comment contains "This is a test, script written".
(deleted my earlier comment ;) ) That's a little harder, because you're getting into look-ahead logic. What I'd do in that case is, instead of "pass" for the line, shove all the comment lines into a list. When you get to the end of the comment, iterate through the list. If you see the target line there, pass. If you don't see it, yield each line in the list. ...look-aheads complicate everything ;)
Looks complicated. How do we pass all the lines in the list based on a single input string/line (let's say, "This is a comment line")?
Check out the edit... but it's not for the faint of heart.
1

This one does it as in the request: deletes all multiline comments that contain the desired string:

Put this in a file called program.txt

/*
 * This is a test, script written
 * This is a comment line
 * Multi-line comment
 * Last comment
 *
 */

some code

/*
 * This is a comment line
 * And should 
 *     not be removed
 *
 */

more code

Then search and replace. Just make sure the needle does not contain any unescaped regex special characters.

import re

def find_and_remove(haystack, needle):
    pattern = re.compile(r'/\*.*?'+ needle + '.*?\*/', re.DOTALL)
    return re.sub(pattern, "", haystack)

# assuming your program is in a file called program.txt
program = open("program.txt", "r").read()

print find_and_remove(program, r"This is a test, script written")

The result:

some code

/*
 * This is a comment line
 * And should 
 *     not be removed
 *
 */

more code

It adapts the regex from the related question.

Editing the last section in your code:

for path,dirs,files in os.walk(sys.argv[1]):
    for fname in files:
        for pat in ['*.cpp','*.c','*.h','*.txt']:
            if fnmatch.fnmatch(fname,pat):
                fullname = os.path.join(path,fname)
                # put all the text into f and read and replace...
                f = open(fullname).read()
                result = find_and_remove(f, r"This is a test, script written")

                new_name = fullname + ".new"
                # After testing, replace new_name with fullname in the
                # next line in order to overwrite the original file.
                handle = open(new_name, 'w')
                handle.write(result)
                handle.close()

Make sure that you escape all regex special characters in the needle, e.g. (). If your text contains parentheses, e.g. (any text), they should appear in the needle as \(any text\). Alternatively, pass the needle through re.escape() before building the pattern.
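
One caveat with the pattern above: because the leading .*? may run past a closing */, a match can start at an earlier comment that does not contain the needle and swallow everything up to the one that does, including the code in between. A possible variant, sketched here under assumptions (the names COMMENT_RE and remove_comments_containing are made up for illustration, and the needle is treated as plain text rather than as a regex), matches one comment at a time and decides per comment:

```python
import re

# Matches a single /* ... */ block; non-greedy, so it stops at the first "*/".
COMMENT_RE = re.compile(r'/\*.*?\*/', re.DOTALL)


def remove_comments_containing(haystack, needle):
    """Delete each /* ... */ block whose text contains needle literally."""
    def replace(match):
        comment = match.group(0)
        # Plain substring test, so the needle needs no regex escaping.
        return "" if needle in comment else comment

    return COMMENT_RE.sub(replace, haystack)


# Hypothetical sample input.
source = (
    "/* keep me */\n"
    "int x = 1;\n"
    "/*\n * This is a test, script written (v1)\n */\n"
    "int y = 2;\n"
)
cleaned = remove_comments_containing(source, "This is a test, script written (v1)")
print(cleaned)
```

This sketch does not understand C string literals, so a "/*" inside a quoted string would still be treated as the start of a comment.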

10 Comments

How easy is it to get this to work on files?
I made some changes to your code to loop over a directory and am running into a compilation error. I updated my original question with it. Can you please see what is wrong?
You need to pass a string into the function I gave you; you were passing a file handle instead. This should clear it.
The script seems to work fine, but I am not seeing the result. The exact code is at pastie.org/5653294 and the sample input file I am trying is at pastie.org/5653293. Any idea what is wrong here?
Thanks, waiting for your edit.
1

This should work in principle:

def skip(file, lines):
    result = ""
    # Iterate over the file line by line; file.read() would yield characters
    for cline, fileLine in enumerate(file):
        if cline not in lines:
            result += fileLine
    return result

lines must be a list of zero-based line numbers to skip, and file must be an opened file.
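
A small usage sketch of the same idea, assuming the line numbers of the unwanted comment are already known. The name skip_lines and the sample text are made up for illustration, io.StringIO stands in for an opened file, and the code is written for Python 3.

```python
import io


def skip_lines(handle, line_numbers):
    """Return the text of handle without the zero-based lines listed."""
    unwanted = set(line_numbers)  # set membership is cheap to test
    return "".join(
        line for index, line in enumerate(handle) if index not in unwanted
    )


# Hypothetical input: a three-line comment followed by one line of code.
sample = io.StringIO("/*\n * This is a test\n */\nint x = 1;\n")
remaining = skip_lines(sample, [0, 1, 2])
print(remaining)
```

On its own this does not locate the comment; something else still has to work out which line numbers belong to the block that contains the input string.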

3 Comments

This might not work for me since my file contains not only numbers. It's regular C code.
You obviously don't understand the code.
You might be 50% right. Can you please explain the code?
