2

I wrote the following code to convert my docx file to text file. The output that I get printed in my text file is the last paragraph/part of the whole file and not the complete content. The code is as follows:

from docx import Document
import io
import shutil

def convertDocxToText(path):
    for d in os.listdir(path):
        fileExtension=d.split(".")[-1]
        if fileExtension =="docx":
            docxFilename = path + d
            print(docxFilename)
            document = Document(docxFilename)


# for printing the complete document
            print('\nThe whole content of the document:->>>\n')
            for para in document.paragraphs:
                textFilename = path + d.split(".")[0] + ".txt"
                with io.open(textFilename,"w", encoding="utf-8") as textFile:
                    #textFile.write(unicode(para.text))
                    x=unicode(para.text)
                    print(x) //the complete content gets printed by this line
                    textFile.write((x)) #after writing the content to text file only last paragraph is copied.
                #textFile.write(para.text)

path= "/home/python/resumes/"
convertDocxToText(path)
3
  • 3
    with io.open(textFilename,"w", encoding="utf-8") as textFile: is inside your for para in document.paragraphs: loop. This means you keep opening the file on each iteration in write-mode, wiping any existing content. You need to open the file once before running your loop i.e. put the for loop inside the with block, not the other way round. Commented Oct 9, 2018 at 10:51
  • thanks i made the changes and it worked.. Commented Oct 9, 2018 at 12:31
  • @sharayusalunkhe is this code currently working for you ?, mine is giving errors even with the corrections,,, Commented Jun 12, 2019 at 2:04

2 Answers 2

3

the following is the solution for the above problem:

from docx import Document
import io
import shutil
import os

def convertDocxToText(path):
    for d in os.listdir(path):
        fileExtension=d.split(".")[-1]
        if fileExtension =="docx":
            docxFilename = path + d
            print(docxFilename)
            document = Document(docxFilename)
            textFilename = path + d.split(".")[0] + ".txt"
            with io.open(textFilename,"w", encoding="utf-8") as textFile:
                for para in document.paragraphs: 
                    textFile.write(unicode(para.text))

path= "/home/python/resumes/"
convertDocxToText(path)
Sign up to request clarification or add additional context in comments.

Comments

1

Problem

as your code says in the last for loop:

        for para in document.paragraphs:
            textFilename = path + d.split(".")[0] + ".txt"
            with io.open(textFilename,"w", encoding="utf-8") as textFile:
                x=unicode(para.text)
                textFile.write((x))

for each paragraph in whole document, you try to open a file named textFilename so let's say you have a file named MyFile.docx in /home/python/resumes/ so the textFilename value that contains the path will be /home/python/resumes/MyFile.txt always in whole of for loop, so the problem is that you open the same file in w mode which is a Write mode, and will overwrite the whole file content.

Solution:

you must open the file once out of that for loop then try add paragraphs one by one to it.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.