
I am trying to replace certain words in a file that has multiple lines. Below is the code I wrote. Please note I am still learning Python.

import io

ParsedUnFormattedFile = io.open("test.txt", "r", encoding="utf-8", closefd=True).read()

remArticles = {' a ':'', ' the ':'', ' and ':'', ' an ':''}

for line in ParsedUnFormattedFile:
    for i in remArticles.keys():
        words = line.split()
        ParsedReplacementFile = ParsedUnFormattedFile.replace(words,remArticles[i])

    FormattedFileForIndexing =  io.open("FormattedFileForIndexing.txt", "w", encoding="utf-8", closefd=True)
    FormattedFileForIndexing.write(ParsedReplacementFile)

If I do the replacement directly on each line, only one of the words gets replaced. In my case it is usually 'the'.

So I wanted to split the line, look at every word and then replace it. However, I get the following error:

line 14, in <module>
    ParsedReplacementFile = ParsedUnFormattedFile.replace(words,remArticles[i])
TypeError: coercing to Unicode: need string or buffer, list found

How can I get this rectified?

Thanks

2 Answers


There are a number of problems.

  1. ParsedUnFormattedFile is a string, not a file, because you've called .read(). That means your for line in ParsedUnFormattedFile loop does not iterate through the lines in the file, but the individual characters.
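  2. Each time the for i in remArticles.keys(): loop runs, a new value is assigned to ParsedReplacementFile, and that value is computed from the original, untouched ParsedUnFormattedFile rather than from the previous result. Only the last replacement survives, which is why you only ever see one word removed.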
  3. You're overwriting the file FormattedFileForIndexing.txt in each iteration of your for line in ParsedUnFormattedFile: loop.
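To make problems 1 and 2 concrete, here is a small Python 3 sketch that uses a hypothetical in-memory string instead of test.txt. It assumes Python 3.7+, where dicts keep insertion order, so ' an ' is the last key visited (in other versions a different replacement may be the one that survives). The replacement values are single spaces here, an assumption made so the neighbouring words don't run together.

```python
text = "the cat and a dog\nan owl\n"

# Problem 1: looping over a str visits single characters, not lines.
print(list(text)[:3])                  # ['t', 'h', 'e']
# Splitting on line boundaries (or looping over the open file object) gives lines.
print(text.splitlines(keepends=True))  # ['the cat and a dog\n', 'an owl\n']

# Assumption: replace with ' ' instead of '' so words stay separated.
remArticles = {' a ': ' ', ' the ': ' ', ' and ': ' ', ' an ': ' '}
sample = "x the y and z a w"

# Problem 2: every replace() starts from the untouched original string,
# so only the last key's replacement survives.
for key in remArticles:
    buggy = sample.replace(key, remArticles[key])

# Fix: feed each replacement the result of the previous one.
fixed = sample
for key in remArticles:
    fixed = fixed.replace(key, remArticles[key])

print(buggy)  # x the y and z a w  (last key ' an ' does not occur in sample)
print(fixed)  # x y z w
```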

It's probably best to redo everything from scratch.

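import io

# The surrounding spaces make sure only whole words are matched;
# replacing with a single space keeps the neighbouring words apart.
remArticles = {' a ': ' ', ' the ': ' ', ' and ': ' ', ' an ': ' '}

with io.open("test.txt", "r", encoding="utf-8") as ParsedUnFormattedFile:
    with io.open("FormattedFileForIndexing.txt", "w", encoding="utf-8") as FormattedFileForIndexing:
        # Iterating over the file object yields one line at a time
        for line in ParsedUnFormattedFile:
            # Apply every replacement to the same, progressively updated line
            for i in remArticles:
                line = line.replace(i, remArticles[i])
            FormattedFileForIndexing.write(line)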
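Keys such as ' the ' miss articles at the very start of a line, in a different case ("The"), or directly before punctuation ("a,"). A different technique, not part of the answer above, is Python's re module with \b word boundaries. The helper name remove_articles and the ARTICLES tuple are hypothetical; this is a sketch, not a drop-in guarantee for every input.

```python
import re

# Hypothetical list of words to drop; adjust as needed.
ARTICLES = ("a", "an", "the", "and")

# \b matches a word boundary, so "a" will not match inside "and" or "cat".
# re.IGNORECASE also catches "The" at the start of a sentence.
ARTICLE_RE = re.compile(r"\b(?:%s)\b" % "|".join(ARTICLES), re.IGNORECASE)


def remove_articles(line):
    """Drop whole-word articles, then tidy up the spaces left behind."""
    without = ARTICLE_RE.sub("", line)
    # Collapse runs of spaces/tabs into one space; the newline is untouched.
    collapsed = re.sub(r"[ \t]{2,}", " ", without)
    # Remove the space left at the start when the line began with an article.
    return collapsed.lstrip(" \t")


print(remove_articles("The file is a test file and an example.\n"))
# file is test file example.
```

Inside the answer's loop, FormattedFileForIndexing.write(remove_articles(line)) would take the place of the inner replace loop.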

7 Comments

for the love of god use some pep8 styling
@PadraicCunningham: Never even read pep8. I hate treating code like text. Anyhow, I've pretty much copied OP's code, so don't hate me too much.
Give me properly formatted, readable code any day; some questions I literally cannot look at, and this is one of those. It actually helps push beginners in the right direction, makes the code more readable for people trying to answer, and is likely to get more people willing to answer, so everyone wins.
@Rawing, how does don't be a tool and write python code as it should be written work for you?
@Rawing, relax it was a joke.

When you call split(), it returns a list, and str.replace() only accepts strings.

>>> 'a b c asd sas'.split()
['a', 'b', 'c', 'asd', 'sas']

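Instead, replace before you split, or join the list back into a string and then replace. To join a list back into a string, use str.join with a space as the separator, so that space-delimited keys such as ' the ' can still match:

words = ' '.join(words)

E.g.:

>>> ' '.join(['a', 'b', 'c'])
'a b c'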
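If you do want to check every word individually, one option (a sketch, not taken from this answer) is to split the line, keep only the words that are not in a set of articles, and join the rest back with spaces. The ARTICLES set and the drop_articles name are hypothetical. Because it compares whole tokens, a word with punctuation attached (such as "a,") will not match, and the original spacing and trailing newline are normalized away.

```python
# Hypothetical set of words to drop; compared in lower case.
ARTICLES = {"a", "an", "the", "and"}


def drop_articles(line):
    """Split a line into words, keep the non-articles, and rejoin with spaces."""
    words = line.split()  # a list of str
    kept = [word for word in words if word.lower() not in ARTICLES]
    return " ".join(kept)  # join the list back into a single str


# str.replace() only accepts strings; passing the list itself is what
# raised the TypeError in the question.
try:
    "the file".replace("the file".split(), "")
except TypeError as exc:
    print("TypeError:", exc)

print(drop_articles(" the file that is being edited is a test file and for testing"))
# file that is being edited is test file for testing
```

Since split() discards the trailing newline, write drop_articles(line) + "\n" when writing each line back to a file.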

2 Comments

Replacing before splitting will not help me, because I have to check every word and then replace it. If I don't split, the entire line is examined only once and only one word out of all the keys in remArticles gets replaced. For example, my test.txt contains the line " the file that is being edited is a test file and for testing purpose only. this is to remove articles like a , the , and , an , etc,."
Now, if I write the code as: for line in ParsedUnFormattedFile: for i in remArticles.keys(): ParsedReplacementFile = ParsedUnFormattedFile.replace(line,remArticles[i]), then my output is: "file that is being edited is a test file and for testing purpose only. this is to remove articles like a ,, and , an , etc,." Only "the" got replaced.
