Replace words in a line from a file in Python

Question

I am trying to replace certain words in a file that has multiple lines. Below is the code I wrote. Please note that I am still learning Python.

ParsedUnFormattedFile = io.open("test.txt", "r", encoding="utf-8", closefd=True).read()

remArticles = {' a ':'', ' the ':'', ' and ':'', ' an ':''}

for line in ParsedUnFormattedFile:
    for i in remArticles.keys():
           words = line.split()
           ParsedReplacementFile = ParsedUnFormattedFile.replace(words,remArticles[i])

    FormattedFileForIndexing =  io.open("FormattedFileForIndexing.txt", "w", encoding="utf-8", closefd=True)
    FormattedFileForIndexing.write(ParsedReplacementFile)

If I run the replacement directly on a line as it is read, it replaces only one of the words. It's usually 'the' on my system.

So I wanted to split the line, look at every word, and then replace it. However, I get the error below:

line 14, in <module>
    ParsedReplacementFile = ParsedUnFormattedFile.replace(words,remArticles[i])
TypeError: coercing to Unicode: need string or buffer, list found

How can I get this rectified?

Thanks

Aran-Fey · Accepted Answer · 2015-01-21 19:02:20Z

1

There are a number of problems.

1. ParsedUnFormattedFile is a string, not a file, because you've called .read(). That means your for line in ParsedUnFormattedFile loop does not iterate over the lines in the file, but over the individual characters.
2. Each time the for i in remArticles.keys(): loop runs, a new value is assigned to ParsedReplacementFile, always computed from the unmodified ParsedUnFormattedFile. Only the last assignment is retained.
3. You're overwriting the file FormattedFileForIndexing.txt in each iteration of your for line in ParsedUnFormattedFile: loop.
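
The first point can be checked in isolation. This is a minimal sketch, assuming io.StringIO as a stand-in for the opened file; the sample text is made up for illustration and is not the asker's test.txt.

```python
import io

# Hypothetical sample text standing in for the contents of test.txt.
sample = u"the first line\nand a second line\n"

# .read() returns one big string, so iterating over it yields characters.
text = io.StringIO(sample).read()
chars = [c for c in text]

# Iterating over the file object itself yields whole lines instead.
lines = [line for line in io.StringIO(sample)]

print(chars[:3])
print(lines)
```

The first print shows single characters, the second shows complete lines including their trailing newline.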

It's probably best to redo everything from scratch.

import io

remArticles = {' a ': '', ' the ': '', ' and ': '', ' an ': ''}

# Open both files once; the with blocks close them automatically.
with io.open("test.txt", "r", encoding="utf-8") as ParsedUnFormattedFile:
    with io.open("FormattedFileForIndexing.txt", "w", encoding="utf-8") as FormattedFileForIndexing:
        for line in ParsedUnFormattedFile:
            # Apply every replacement to the same line so the results accumulate.
            for i in remArticles:
                line = line.replace(i, remArticles[i])
            FormattedFileForIndexing.write(line)
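
One caveat with space-delimited keys such as ' a ': replacing them with '' removes both surrounding spaces, so "is a test" becomes "istest", and a word at the start of a line or next to punctuation is not matched at all. A possible alternative, not part of the original answer, is a regular expression with word boundaries. The sketch below assumes a case-sensitive match; the names ARTICLES, ARTICLE_RE and strip_articles are hypothetical.

```python
import re

# Hypothetical word list, mirroring the keys of remArticles without the spaces.
ARTICLES = ("a", "an", "the", "and")

# \b matches at word boundaries, so "a" does not match inside "an" or "that".
# [ \t]* also consumes spaces or tabs after the removed word, but not the newline.
ARTICLE_RE = re.compile(r"\b(?:" + "|".join(ARTICLES) + r")\b[ \t]*")


def strip_articles(line):
    """Return line with the standalone words in ARTICLES removed."""
    return ARTICLE_RE.sub("", line)


print(strip_articles("this is a test file and the only one"))
```

Inside the loop above, line = strip_articles(line) could take the place of the inner for loop.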

answered Jan 21, 2015 at 19:02

Aran-Fey


7 Comments

Padraic Cunningham Over a year ago

For the love of god, use some PEP 8 styling.

Aran-Fey Over a year ago

@PadraicCunningham: Never even read pep8. I hate treating code like text. Anyhow, I've pretty much copied OP's code, so don't hate me too much.

Padraic Cunningham Over a year ago

Give me properly formatted, readable code any day. Some questions I literally cannot look at, this being one of those. It actually helps to push beginners in the right direction, it makes code more readable for people trying to answer, and it is likely to get more people willing to answer, so everyone wins.

Padraic Cunningham Over a year ago

@Rawing, how does "don't be a tool and write Python code as it should be written" work for you?

Padraic Cunningham Over a year ago

@Rawing, relax, it was a joke.


Adam Hughes · Accepted Answer · 2015-01-21 18:53:39Z

1

When you call split(), you get back a list, but replace() expects a string as its first argument. That is why you see the TypeError.

>>> 'a b c asd sas'.split()
['a', 'b', 'c', 'asd', 'sas']

Instead, replace before you split, or join the list back into a string and then replace. To join a list into a string:

words = ''.join(words)

For example:

>>> ''.join(['a','b','c'])
'abc'
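
Putting split() and join() together, a word-by-word filter might look like the sketch below. It assumes a plain set of words named articles (hypothetical, mirroring the keys of remArticles without the spaces) and joins with ' ' rather than '' so the remaining words stay separated.

```python
# Hypothetical set of words to drop, mirroring the keys of remArticles.
articles = {"a", "an", "the", "and"}

line = "the file that is being edited is a test file"

# split() breaks the line into a list of words ...
words = line.split()

# ... the list is filtered word by word ...
kept = [w for w in words if w not in articles]

# ... and ' '.join() turns the list back into a single string.
result = " ".join(kept)
print(result)
```

Note that split() followed by join() collapses runs of whitespace and drops the trailing newline, so the newline has to be added back when writing lines to a file.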

answered Jan 21, 2015 at 18:53

Adam Hughes


2 Comments

Dinakar Over a year ago

Replacing before splitting will not help me, because I have to check every word and then replace it. If I don't split, the entire line is looked at only once, and only one word out of all the remArticles keys gets replaced. For example, my test.txt contains the line below: " the file that is being edited is a test file and for testing purpose only. this is to remove articles like a , the , and , an , etc,."

Dinakar Over a year ago

Now if I write the code as: for line in ParsedUnFormattedFile: for i in remArticles.keys(): ParsedReplacementFile = ParsedUnFormattedFile.replace(line,remArticles[i]), then my output is: file that is being edited is a test file and for testing purpose only. this is to remove articles like a ,, and , an , etc,. Only "the" got replaced.
