I am new to Python and I am trying to delete lines in a text file if I find the word "Lett." in the line. Here is a sample of the text file I am trying to parse:

<A>Lamb</A> <W>Let. Moxon</W>
<A>Lamb</A> <W>Danger Confound. Mor. w. Personal Deformity</W>
<A>Lamb</A> <W>Gentle Giantess</W>
<A>Lamb</A> <W>Lett., to Wordsw.</W>
<A>Lamb</A> <W>Lett., to Procter</W>
<A>Lamb</A> <W>Let. to Old Gentleman</W>
<A>Lamb</A> <W>Elia Ser.</W>
<A>Lamb</A> <W>Let. to T. Manning</W>

I know how to open the file but I am just uncertain of how to find the matching text and then how to delete that line. Any help would be greatly appreciated.

5 Answers

4
with open("myfile.txt", "r") as f:
    for line in f:
        # each line already ends with a newline, so suppress print's own
        if "Lett." not in line:
            print(line, end="")

or if you want to write the result to a file:

with open("myfile.txt", "r") as f:
    lines = f.readlines()
# reopen the same file for writing and keep only the lines without "Lett."
with open("myfile.txt", "w") as f:
    for line in lines:
        if "Lett." not in line:
            f.write(line)

3 Comments

Don't forget to add a newline when writing each line back into the file.
Nope, readlines will provide newlines on each line already.
You're right. I must have been confusing it with splitlines().
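A quick way to see the difference the comments above describe. This is only a small sketch; the sample text is made up, and io.StringIO stands in for a real file object:

```python
import io

text = "first\nsecond\n"

# readlines() keeps the trailing newline on every line it returns
from_readlines = io.StringIO(text).readlines()

# splitlines() drops the line endings unless keepends=True is passed
from_splitlines = text.splitlines()
with_ends = text.splitlines(keepends=True)

print(from_readlines)   # ['first\n', 'second\n']
print(from_splitlines)  # ['first', 'second']
print(with_ends)        # ['first\n', 'second\n']
```

So lines that come from readlines() can go straight to f.write(), while lines from splitlines() need a "\n" added back.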
1
# Open input text
text = open('in.txt', 'r')
# Open a file to output results
out = open('out.txt', 'w')

# Go through file line by line
for line in text.readlines():
    if 'Lett.' not in line: ### This is the crucial line.
        # add line to file if 'Lett.' is not in the line
        out.write(line)
# Close both files; closing the output flushes the changes to disk
text.close()
out.close()


1

I have a general streaming editor framework for this kind of stuff. I load the file into memory, apply changes to the in-memory list of lines, and write out the file if changes were made.

I have boilerplate that looks like this:

from sed_util import delete_range, insert_range, append_range, replace_range

def sed(filename):
    modified = 0

    # Load file into memory
    with open(filename) as f:
        lines = [line.rstrip() for line in f]

    # magic here...

    if modified:
        with open(filename, "w") as f:
            for line in lines:
                f.write(line + "\n")

And in the # magic here section, I have either:

  1. modifications to individual lines, like:

    lines[i] = change_line(lines[i])

  2. calls to my sed utilities for inserting, appending, and replacing lines, like:

    lines = delete_range(lines, some_range)

The latter uses primitives like these:

def delete_range(lines, r):
    """
    >>> a = list(range(10))
    >>> b = delete_range(a, (1, 3))
    >>> b
    [0, 4, 5, 6, 7, 8, 9]
    """
    start, end = r
    assert start <= end
    return [line for i, line in enumerate(lines) if not (start <= i <= end)]

def insert_range(lines, line_no, new_lines):
    """
    >>> a = list(range(10))
    >>> b = list(range(11, 13))
    >>> c = insert_range(a, 3, b)
    >>> c
    [0, 1, 2, 11, 12, 3, 4, 5, 6, 7, 8, 9]
    >>> c = insert_range(a, 0, b)
    >>> c
    [11, 12, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> c = insert_range(a, 9, b)
    >>> c
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 9]
    """
    assert 0 <= line_no < len(lines)
    return lines[0:line_no] + new_lines + lines[line_no:]

def append_range(lines, line_no, new_lines):
    """
    >>> a = list(range(10))
    >>> b = list(range(11, 13))
    >>> c = append_range(a, 3, b)
    >>> c
    [0, 1, 2, 3, 11, 12, 4, 5, 6, 7, 8, 9]
    >>> c = append_range(a, 0, b)
    >>> c
    [0, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> c = append_range(a, 9, b)
    >>> c
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
    """
    assert 0 <= line_no < len(lines)
    return lines[0:line_no+1] + new_lines + lines[line_no+1:]

def replace_range(lines, line_nos, new_lines):
    """
    >>> a = list(range(10))
    >>> b = list(range(11, 13))
    >>> c = replace_range(a, (0, 2), b)
    >>> c
    [11, 12, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> c = replace_range(a, (8, 10), b)
    >>> c
    [0, 1, 2, 3, 4, 5, 6, 7, 11, 12]
    >>> c = replace_range(a, (0, 10), b)
    >>> c
    [11, 12]
    >>> c = replace_range(a, (0, 10), [])
    >>> c
    []
    >>> c = replace_range(a, (0, 9), [])
    >>> c
    [9]
    """
    start, end = line_nos
    return lines[:start] + new_lines + lines[end:]

def find_line(lines, regex):
    for i, line in enumerate(lines):
        if regex.match(line):
            return i

if __name__ == '__main__':
    import doctest
    doctest.testmod()

The tests work on arrays of integers, for clarity, but the transformations work for arrays of strings, too.
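For instance, here is delete_range (copied from above so the snippet runs on its own) applied to a few lines from the question's sample. The indices (1, 2) are picked by hand for illustration; in practice they would come from a scan of the file:

```python
def delete_range(lines, r):
    # Same primitive as above: drop indices start..end, both inclusive.
    start, end = r
    assert start <= end
    return [line for i, line in enumerate(lines) if not (start <= i <= end)]

lines = [
    "<A>Lamb</A> <W>Gentle Giantess</W>",
    "<A>Lamb</A> <W>Lett., to Wordsw.</W>",
    "<A>Lamb</A> <W>Lett., to Procter</W>",
    "<A>Lamb</A> <W>Elia Ser.</W>",
]

# Remove the two adjacent "Lett." lines at indices 1 and 2
trimmed = delete_range(lines, (1, 2))
for line in trimmed:
    print(line)
```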

Generally, I scan the list of lines to identify changes I want to apply, usually with regular expressions, and then I apply the changes on matching data. Today, for example, I ended up making about 2000 line changes across 150 files.

This works better than sed when you need to apply multiline patterns or additional logic to identify whether a change is applicable.
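Applied to the question, the "magic here" step could look roughly like the sketch below. The names LETT and drop_matching are made up for this example and are not part of sed_util, and it strips only the newline (rstrip("\n")) rather than all trailing whitespace:

```python
import re

# Literal "Lett." with the dot escaped; hypothetical name for this example
LETT = re.compile(r"Lett\.")

def drop_matching(lines, regex):
    """Return the lines that contain no match for regex."""
    return [line for line in lines if not regex.search(line)]

def sed(filename):
    # Load file into memory, without the trailing newlines
    with open(filename) as f:
        lines = [line.rstrip("\n") for line in f]

    # "magic here": filter out every line that mentions "Lett."
    kept = drop_matching(lines, LETT)
    modified = len(kept) != len(lines)

    # Rewrite the file only if something was actually removed
    if modified:
        with open(filename, "w") as f:
            for line in kept:
                f.write(line + "\n")
    return modified
```

Calling sed("myfile.txt") rewrites the file in place and returns True if any line was dropped. Note that it uses regex.search, not regex.match as in find_line, because "Lett." sits in the middle of the line.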


0

def lines_without_lett(fname):
    return [l for l in open(fname) if 'Lett.' not in l]


0
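result = ''
with open('in.txt') as src:
    for line in src:
        if 'Lett.' not in line:  # the test is case-sensitive, so match the capital L
            result += line
with open('out.txt', 'w') as f:  # 'w' overwrites; 'a' would append again on every run
    f.write(result)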

