I am trying to read a docx file and store its text in a list, with one list element per line of the document.

example:

docx file:

"Hello, my name is blabla,
I am 30 years old.
I have two kids."

result:

['Hello, my name is blabla', 'I am 30 years old', 'I have two kids']

I can't get it to work.

I am using the docx2txt module from here: github link

It has only one function, process, and it returns all the text from the docx file as a single string.

I would also like to keep special characters such as ":\-\.\,"

1 Answer

The docx2txt module reads a docx file and converts it to plain text.

Split that text with splitlines() and store the resulting lines in a list.

Code (comments inline):

import docx2txt

# Extract all text from the document as a single string
text = docx2txt.process("a.docx")

# Print the text after conversion
print("After converting text is ", text)

content = []
for line in text.splitlines():
    # Ignore empty or whitespace-only lines
    if line.strip():
        # Append the line to the list
        content.append(line)

print("List is ", content)

Output:

C:\Users\dinesh_pundkar\Desktop>python c.py
After converting text is
 Hello, my name is blabla.

I am 30 years old.

I have two kids.

 List is  ['Hello, my name is blabla.', 'I am 30 years old. ', 'I have two kids.']

C:\Users\dinesh_pundkar\Desktop>
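
The same split can also be written as a list comprehension. The sketch below is only an illustration: it uses a hard-coded sample string in place of the docx2txt output so it runs without a docx file, and the helper name text_to_lines is hypothetical. It also shows that splitlines() leaves punctuation such as ":", "-", "." and "," untouched and copes with both \n and \r\n line endings.

```python
def text_to_lines(text):
    """Return the non-blank lines of text, leaving each line's characters unchanged."""
    # splitlines() splits on \n, \r\n and other line boundaries and drops the line endings
    return [line for line in text.splitlines() if line.strip()]


# Sample text standing in for the string returned by docx2txt.process() (an assumption)
sample = "Hello, my name is blabla,\n\nI am 30 years old.\r\n\nRatio: 1-2.\n"

lines = text_to_lines(sample)
print(lines)
```

With a real document, you would pass the string returned by docx2txt.process("a.docx") to the helper instead of the sample.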
5 Comments

Wow, I spent so much time trying to figure it out. I didn't know there was a splitlines() method. Thanks a lot!
@Kiper - Thanks to you as well; because of this question I learned about the docx2txt module.
I'm trying to parse PDFs now. After receiving the text of the PDF I am using the same splitlines() method, but for some reason I get whitespace at the end of each line. What would be the best way to get rid of it? My list looks like ['word1 ', 'word2 ', 'word3 '] instead of ['word1', 'word2', 'word3'].
Use line = line.strip() to get rid of the whitespace.
I swear I tried it, but it only worked after you said it. Thanks again.
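
For the trailing-whitespace case discussed in the comments, a minimal sketch might look like the following. The sample string and the helper name clean_lines are assumptions made for illustration; the real input would be whatever text your PDF or docx extraction returns.

```python
def clean_lines(text):
    """Split text into lines, strip surrounding whitespace, and drop blank lines."""
    stripped = (line.strip() for line in text.splitlines())
    # After stripping, whitespace-only lines become empty strings and are filtered out
    return [line for line in stripped if line]


# Sample text standing in for extracted PDF text (an assumption)
pdf_text = "word1 \nword2 \n   \nword3 \n"

words = clean_lines(pdf_text)
print(words)  # ['word1', 'word2', 'word3']
```

If leading indentation matters, line.rstrip() removes only the trailing whitespace.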
