Replacing a specific string in a file using regex in Python

Question

I'm tagging a file using Stanford NER and I want to replace every "O" tag with "NONE". I've tried the code below, but the output is wrong: it replaces every "O" in the string, not just the tags. I'm not familiar with regex and don't know the right pattern for this. Thanks in advance.

Here's my code:

    import re
    tagged_text = st.tag(per_word(input_file))
    string_type = "\n".join(" ".join(line) for line in tagged_text)

    for line in string_type:
        output_file.write (re.sub('O$', 'NONE', line))

Sample Input:

    Tropical O
    Storm O
    Jolina O
    affects O
    2,000 O
    people O
    MANILA LOCATION
    , O
    Philippines LOCATION
    – O
    Initial O
    reports O
    from O
    the O

OUTPUT:

    Tropical NONE
    Storm NONE
    Jolina NONE
    affects NONE
    2,000 NONE
    people NONE
    MANILA LNONECATINONEN
    , NONE
    Philippines LNONECATINONEN
    – NONE
    Initial NONE
    reports NONE
    from NONE
    the NONE

What is string_type? It seems you are looping through a string, which iterates character by character.
– akuiper, Commented Oct 14, 2017 at 3:11
@Psidom I converted tagged_text (tuples) into a string (string_type) and then read it line by line.
– Jack-Jack, Commented Oct 14, 2017 at 3:18
In which case does it fail? For example, I tried line = 'TrOpical O' and re.sub('O$', 'NONE', line) gave 'TrOpical NONE'.
– chakradhar kasturi, Commented Oct 14, 2017 at 3:20

akuiper · Accepted Answer · 2017-10-14 03:21:38Z

You don't need to loop through string_type; calling re.sub directly on the whole string should work:

    import re

    s = """Tropical O
    Storm O
    Jolina O
    affects O
    2,000 O
    people O
    MANILA LOCATION
    , O
    Philippines LOCATION
    – O
    Initial O
    reports O
    from O
    the O"""

    print(re.sub(r"\bO(?=\n|$)", "NONE", s))

gives:

    Tropical NONE
    Storm NONE
    Jolina NONE
    affects NONE
    2,000 NONE
    people NONE
    MANILA LOCATION
    , NONE
    Philippines LOCATION
    – NONE
    Initial NONE
    reports NONE
    from NONE
    the NONE

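Here \bO(?=\n|$) matches a standalone letter O. The \b requires a word boundary before the O, so the O's inside LOCATION are skipped, and the lookahead (?=\n|$) requires that the O be followed by either a newline character \n or the end of the string $.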
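To see why the original loop produced LNONECATINONEN, and two other ways to get the same result, here is a small sketch. It assumes tagged_text is a list of (word, tag) tuples, which is what the join in the question implies; the sample data and the variable names per_char, multiline and no_regex are made up for illustration.

```python
import re

# Assumed shape of the tagger output: a list of (word, tag) tuples.
# The sample values are illustrative, not real tagger output.
tagged_text = [("MANILA", "LOCATION"), (",", "O"), ("Oslo", "O")]
string_type = "\n".join(" ".join(pair) for pair in tagged_text)

# Iterating over a str yields one character at a time, so 'O$' matches
# every single "O" character. This reproduces the bug from the question.
per_char = "".join(re.sub("O$", "NONE", ch) for ch in string_type)
print(per_char)

# Alternative 1: with re.MULTILINE, $ matches at the end of every line,
# so the lookahead for \n is not needed.
multiline = re.sub(r"\bO$", "NONE", string_type, flags=re.MULTILINE)
print(multiline)

# Alternative 2: no regex at all; swap the tag before joining.
no_regex = "\n".join(
    f"{word} {'NONE' if tag == 'O' else tag}" for word, tag in tagged_text
)
print(no_regex)
```

If you still have the tuples, the second alternative is probably the safest, because it compares the tag itself and never looks at the word.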

