Removing lines from a file using python [duplicate]

Question

Possible Duplicate:
Python 3 regular expression to find multiline comment

I need some input on how this can be done and would really appreciate your help. I looked at other posts, but none of them matches my requirement.

How to remove line from the file in python
Remove lines from textfile with python

I need to match a multi-line comment in a file based on an input string that is provided.

Example:-

Let's say the file "test.txt" has the following comment. If inputstring="This is a test, script written", this comment needs to be deleted from the file.

import os
import sys

import re
import fnmatch

def find_and_remove(haystack, needle):
    pattern = re.compile(r'/\*.*?'+ needle + '.*?\*/', re.DOTALL)
    return re.sub(pattern, "", haystack)

for path,dirs,files in os.walk(sys.argv[1]):
    for fname in files:
        for pat in ['*.cpp','*.c','*.h','*.txt']:
            if fnmatch.fnmatch(fname,pat):
                fullname = os.path.join(path,fname)
                with open(fullname, "r") as f:
                    find_and_remove(f, r"This is a test, script written")

Error:-

Traceback (most recent call last):
  File "comment.py", line 16, in <module>
    find_and_remove(f, r"This is a test, script written")
  File "comment.py", line 8, in find_and_remove
    return re.sub(pattern, "", haystack)
  File "/usr/lib/python2.6/re.py", line 151, in sub
    return _compile(pattern, 0).sub(repl, string, count)
TypeError: expected string or buffer

Stop reposting this question. Please.

– Tim, Commented Jan 9, 2013 at 6:59
@Tim - I just need ideas to work on..what is wrong in that?

– user1927396, Commented Jan 9, 2013 at 7:04

sea-rob · Accepted Answer · 2013-01-09 07:49:56Z

3

The first thing that came to mind when I saw the question was "state machine", and whenever I think "state machine" in python, the first thing that comes to mind is "generator" a.k.a. yield:

def skip_comments(f):
    """
    Emit all the lines that are not part of a multi-line comment.
    """
    is_comment = False

    for line in f:
        if line.strip().startswith('/*'):
            is_comment = True

        if line.strip().endswith('*/'): 
            is_comment = False
        elif is_comment:
            pass
        else:
            yield line


def print_file(file_name):
    with file(file_name, 'r') as f:
        skipper = skip_comments(f)

        for line in skipper:
            print line,
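For readers on Python 3, the same state-machine idea can be tried on an in-memory list of lines. This is only a minimal sketch: the `sample` list is made up for illustration, and, like the generator above, it assumes that `/*` starts a line and `*/` ends one. One small difference is that a line ending in `*/` is dropped only while inside a comment.

```python
def skip_comments(lines):
    """Yield the lines that are not part of a /* ... */ block comment."""
    in_comment = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('/*'):
            in_comment = True
        if in_comment:
            # Drop the line; leave comment mode once the closer is seen.
            if stripped.endswith('*/'):
                in_comment = False
            continue
        yield line


# Hypothetical sample input, not taken from the question.
sample = [
    "int a;\n",
    "/*\n",
    " * a comment\n",
    " */\n",
    "int b;\n",
]

kept = list(skip_comments(sample))
print("".join(kept), end="")
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the list.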

EDIT: user1927396 upped the ante by specifying that it's just a specific block to exclude, that contains specific text. Since it's inside the comment block, we won't know up front if we need to reject the block or not.

My first thought was buffer. Ack. Poo. My second thought was a haunting refrain I've been carrying in my head for 15 years and never used until now: "stack of state machines" ...

def squelch_comment(f, first_line, exclude_if):
    """
    Comment is a multi-line comment that we may want to suppress
    """
    comment = [first_line]

    if not first_line.strip().endswith('*/'):
        for line in f:

            if exclude_if in line:
                comment = None

            if comment and len(comment):
                comment.append(line)

            if line.strip().endswith('*/'):
                break

    if comment:
        for comment_line in comment:
            yield '...' + comment_line


def skip_comments(f):
    """
    Emit all the lines that are not part of a multi-line comment.
    """
    for line in f:
        if line.strip().startswith('/*'):
            # hand off to the nested, comment-handling, state machine
            for comment_line in squelch_comment(f, line, 'This is a test'):
                yield comment_line
        else:
            yield line


def print_file(file_name):
    with file(file_name, 'r') as f:
        for line in skip_comments(f):
            print line,
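The look-ahead can also be handled by buffering each comment and deciding at its closing line, which is the list-based idea mentioned in the comments on this answer. The following is a rough Python 3 sketch under the same assumption that `/*` starts a line and `*/` ends one. The name `drop_matching_comments` and the sample lines are made up for illustration. Unlike the code above, it also checks the first line of the comment and emits kept comments unchanged.

```python
def drop_matching_comments(lines, needle):
    """Yield every line except block comments that contain `needle`."""
    buffered = None  # list of lines while inside a comment, else None
    for line in lines:
        stripped = line.strip()
        if buffered is None:
            if not stripped.startswith('/*'):
                yield line
                continue
            buffered = []
        buffered.append(line)
        if stripped.endswith('*/'):
            # End of comment: emit it only if the needle is absent.
            if not any(needle in item for item in buffered):
                for item in buffered:
                    yield item
            buffered = None
    # Unterminated comment at end of input: emit what was buffered.
    if buffered:
        for item in buffered:
            yield item


# Hypothetical sample input.
sample = [
    "/*\n",
    " * This is a test, script written\n",
    " */\n",
    "some code\n",
    "/*\n",
    " * keep me\n",
    " */\n",
    "more code\n",
]

result = list(drop_matching_comments(sample, "This is a test"))
print("".join(result), end="")
```

The trade-off is memory: a whole comment is held in the list before anything is emitted, which is rarely a problem for source files.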

edited Jan 9, 2013 at 7:49

answered Jan 9, 2013 at 6:59

sea-rob


8 Comments

user1927396 Over a year ago

Rob - I only want to remove a specific multi-line comment ,not all the multi-line comments

user1927396 Over a year ago

Hi Rob - I understood that part but how do i tell the script to do both..it needs to know if .startswith('/*') and it contains "This is a test, script written"

sea-rob Over a year ago

(deleted my earlier comment ;) ) that's a little harder, because you're getting into look-ahead logic. What I'd do in that case is instead of "pass" for the line, shove all the comment lines into a list. When you get to the end of the comment, iterate through the list. If you see the target line there, pass. If you don't see it, yield each line in the list. ...look-aheads complicate everything ;)

user1927396 Over a year ago

looks complicated..how do we pass all the lines in the list based on a single input string/line(lets say,This is a comment line) ?

sea-rob Over a year ago

check out the edit... but it's not for the faint of heart


Community · Accepted Answer · 2017-05-23 11:48:59Z

1

This one does it as in the request: deletes all multiline comments that contain the desired string:

Put this in a file called program.txt

/*
 * This is a test, script written
 * This is a comment line
 * Multi-line comment
 * Last comment
 *
 */

some code

/*
 * This is a comment line
 * And should 
 *     not be removed
 *
 */

more code

Then search and replace. Just make sure the needle does not contain any unescaped regex special characters.

import re

def find_and_remove(haystack, needle):
    # match from a comment opener, through the needle, to the next closer
    pattern = re.compile(r'/\*.*?' + needle + r'.*?\*/', re.DOTALL)
    return re.sub(pattern, "", haystack)

# assuming your program is in a file called program.txt
with open("program.txt", "r") as f:
    program = f.read()

print(find_and_remove(program, r"This is a test, script written"))

The result:

some code

/*
 * This is a comment line
 * And should 
 *     not be removed
 *
 */

more code

It adapts the regex from the related question.
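One caveat with this pattern: with re.DOTALL, the leading `.*?` is free to run past the end of a comment that does not contain the needle and continue to a later comment that does, so everything in between is removed as well. A possible way around this, sketched here for Python 3, is to match each comment on its own and decide in a replacement function. The name `remove_comments_containing` and the sample text are made up for illustration.

```python
import re

# Matches one /* ... */ comment at a time (non-greedy, spans newlines).
COMMENT = re.compile(r'/\*.*?\*/', re.DOTALL)


def remove_comments_containing(text, needle):
    """Delete each /* ... */ comment that contains `needle` as plain text."""
    def replace(match):
        comment = match.group(0)
        # Keep the comment untouched unless the needle occurs inside it.
        return "" if needle in comment else comment
    return COMMENT.sub(replace, text)


# Hypothetical sample: the first comment must survive.
source = (
    "/* keep this */\n"
    "code();\n"
    "/* This is a test, script written */\n"
    "more();\n"
)
cleaned = remove_comments_containing(source, "This is a test, script written")
print(cleaned, end="")
```

Since the needle is compared with `in` rather than spliced into the pattern, it needs no regex escaping in this variant.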

Editing the last section in your code:

for path,dirs,files in os.walk(sys.argv[1]):
    for fname in files:
        for pat in ['*.cpp','*.c','*.h','*.txt']:
            if fnmatch.fnmatch(fname,pat):
                fullname = os.path.join(path,fname)
                # read the whole file into a string, then search and replace
                with open(fullname) as src:
                    text = src.read()
                result = find_and_remove(text, r"This is a test, script written")

                new_name = fullname + ".new"
                # After testing, replace new_name with fullname in the
                # next line in order to overwrite the original file.
                with open(new_name, 'w') as handle:
                    handle.write(result)

Make sure that you escape all regex special characters in the needle, e.g. ( and ). If your text contains brackets, e.g. (any text), they should appear in the needle as \(any text\).
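If hand-escaping is inconvenient, the standard library's `re.escape` can do it when the pattern is built. The following Python 3 sketch is a variation on `find_and_remove`, not the code as originally posted, and the sample text is made up.

```python
import re


def find_and_remove(haystack, needle):
    # re.escape makes the needle match literally, so characters such as
    # "(" or "*" inside it are not treated as regex syntax.
    pattern = re.compile(
        r'/\*.*?' + re.escape(needle) + r'.*?\*/', re.DOTALL
    )
    return pattern.sub("", haystack)


# Hypothetical sample text with brackets inside the comment.
text = "/* note (any text) here */\nint x;\n"
stripped = find_and_remove(text, "(any text)")
print(stripped, end="")
```

With this change the caller passes the plain search text and does not need to write `\(any text\)`.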

edited May 23, 2017 at 11:48

CommunityBot


answered Jan 9, 2013 at 7:51

daedalus


10 Comments

user1927396 Over a year ago

how easy it is to get it to work on files?

user1927396 Over a year ago

i made some changes to your code to loop over a directory and running into a compilation error..I updated my original question with it..can you please see what is wrong?

daedalus Over a year ago

You need to pass a string into the function I gave you, you were instead passing a file handle. This should clear it.

user1927396 Over a year ago

Script seems to work fine but not seeing the result...exact code is pastie.org/5653294..sample input file am trying is pastie.org/5653293 ..any idea what is wrong here?

user1927396 Over a year ago

Thanks,waiting for your edit


Arnaud Aliès · Accepted Answer · 2013-01-09 07:42:02Z

1

This should work in principle:

def skip(file, lines):
    cline = 0
    result = ""
    # iterate over the file line by line; file.read() would yield
    # single characters instead of lines
    for fileLine in file:
        if cline not in lines:
            result += fileLine
        cline += 1
    return result

lines must be a list of zero-based line numbers to skip, and file must be an opened file.
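To illustrate how `lines` is interpreted, here is a small Python 3 sketch that walks the file line by line and treats `lines` as zero-based indexes of the lines to drop. The name `skip_lines` and the sample text are made up for illustration, and `io.StringIO` stands in for an opened file.

```python
import io


def skip_lines(file, lines):
    """Return the file's text without the lines at the given zero-based indexes."""
    to_skip = set(lines)
    return "".join(
        text for index, text in enumerate(file) if index not in to_skip
    )


# Hypothetical sample: drop the second and third lines.
sample = io.StringIO("first\nsecond\nthird\nfourth\n")
remaining = skip_lines(sample, [1, 2])
print(remaining, end="")
```

The content of the file does not matter here; only the positions of the lines do, so it works equally on C source.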

answered Jan 9, 2013 at 7:42

Arnaud Aliès


3 Comments

user1927396 Over a year ago

This might not work for me since my file contains not only numbers.its a regular c code

Arnaud Aliès Over a year ago

you obviously don't understand the code

user1927396 Over a year ago

you might be right 50% .can you please explain the code?
