Removing python pandas data parse error while reading .docx file

Question

I have the following sample data in a Word (.docx) file:

YYYYMM q1 q2 q3 q4 q5 q6 q7 q8 q9 q0 d1 d2 d3 d4 d5
197501  2 11 12 26 25 10 29 21 30 22  8  7 14  4 13
197502 27 22  8 20  6 26 21  4 19  9 10  1 11 12 23
197503  8  7 21 22 25  9  4 30  2 19 10 11 28 12 27
197504 29 28 27 17 19  2 30 16 18  3  9 10 11  8 13
197505 11 15 12 31 28 24  1 30 13 18  5  6 16  7 20
197506 24 10 27  8 23 28 25 26  9 22  2 12 29 30  1

I try to read it with pandas:

df1 = pd.read_csv("Qdays_Ddays.docx", low_memory=False)  # error_bad_lines=False

but I get this error:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 2

How can I fix this?

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 1: invalid start byte
– Prater, Commented Mar 11, 2022 at 5:53
Microsoft Word files are not plain text files. Save your data as a plain text file.
– tdy, Commented Mar 11, 2022 at 5:57
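
Following tdy's suggestion, here is a minimal sketch of reading the table once it has been saved as plain text. The first rows of the sample are held in an in-memory string so the snippet is self-contained; with a real file, its path (for example a hypothetical Qdays_Ddays.txt) would be passed to read_csv instead. It assumes the columns are separated by runs of whitespace, as in the sample above.

```python
import io

import pandas as pd

# First rows of the sample table, standing in for a plain-text file on disk.
sample = """\
YYYYMM q1 q2 q3 q4 q5 q6 q7 q8 q9 q0 d1 d2 d3 d4 d5
197501  2 11 12 26 25 10 29 21 30 22  8  7 14  4 13
197502 27 22  8 20  6 26 21  4 19  9 10  1 11 12 23
197503  8  7 21 22 25  9  4 30  2 19 10 11 28 12 27
"""

# sep=r"\s+" treats any run of whitespace as one delimiter, so the
# padded, right-aligned numbers still land in the correct columns.
df1 = pd.read_csv(io.StringIO(sample), sep=r"\s+")
print(df1.head())
```

With the default comma separator each line would be read as a single field, which is why the separator has to be given explicitly for this layout.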

Gaston Alex · Accepted Answer · 2022-03-11 13:32:10Z

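pandas cannot read a .docx file directly: a .docx file is a zipped XML package rather than plain text, so read_csv fails on it. You can extract the text with python-docx instead: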

import docx  # provided by the python-docx package
import pandas as pd

# Open the Word document
doc = docx.Document("test.docx")

# Collect the text of every paragraph in the file
result = [p.text for p in doc.paragraphs]
print(result)

# Convert the list to a DataFrame (one column, one row per paragraph)
df = pd.DataFrame(result)

# If you need a dictionary, to_dict lets you choose the orientation
df.to_dict('series')
# or
df.to_dict('split')
# or
df.to_dict('records')
# or
df.to_dict('index')
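
The DataFrame built above holds each paragraph as one string in a single column. As a rough sketch of going one step further, the lines can be split on whitespace so that each value gets its own column. This assumes every row of the table is its own paragraph and that the first non-empty paragraph is the header; if the data is stored as a Word table, it would be found under doc.tables rather than doc.paragraphs. The helper name paragraphs_to_frame is hypothetical, and the short list of strings stands in for [p.text for p in doc.paragraphs].

```python
import pandas as pd


def paragraphs_to_frame(lines):
    """Build a DataFrame from whitespace-separated text lines.

    Hypothetical helper: the first non-empty line is used as the header.
    """
    # Skip empty paragraphs and split each remaining line into fields.
    rows = [line.split() for line in lines if line.strip()]
    header, *data = rows
    # Fields arrive as strings; convert each column to a numeric dtype.
    return pd.DataFrame(data, columns=header).apply(pd.to_numeric)


# Stand-in for the paragraph texts (truncated to four columns for brevity).
lines = [
    "YYYYMM q1 q2 q3",
    "",
    "197501  2 11 12",
    "197502 27 22  8",
]

df = paragraphs_to_frame(lines)
print(df)
```

Keeping the parsing in a function that takes plain strings means it can be checked without a Word file on disk.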
