Rows are lost when reading this tab-separated file with pandas read_csv

Question

I have a .text file in the following format, where the fields (index number, name and message) are tab-separated (\t):

712 ben     Battle of the Books
713 james   i used to be in TOM
714 tomy    i was in BOB once
715 ben Tournaments of Minds
716 tommy    Also the Lion in the upcoming school play
717 tommy   Can you guess
718 tommy    P
...

which I read with read_csv into a data frame:

 chat = pd.read_csv("f.text", sep = "\t", header = None, usecols = [2])

But the data frame has only 9812 rows, while the original file has more than 12428 lines (only 21 of them empty). It is quite weird. Do you have any idea? Thanks.

Can you post a download link to your data? It is difficult to answer here without posting guesses, which is counter-productive.
– EdChum, Commented Feb 24, 2016 at 9:34
Very weird. Maybe the lineterminator parameter of read_csv is necessary. Or you can try adding index_col=None. How do you check the length of df? With print len(df)?
– jezrael, Commented Feb 24, 2016 at 9:43
@jezrael just print df; it shows the row count under the table. Same result with len(df).
– user4462740, Commented Feb 24, 2016 at 10:02
Hmmm, interesting. If you omit usecols, is the length still wrong?
– jezrael, Commented Feb 24, 2016 at 10:11
Hmmm, try skipping rows, like chat = pd.read_csv("f.text", skiprows=9810, sep = "\t", header = None, usecols = [2]), then maybe check the columns with print df.columns and the index with print df.index.
– jezrael, Commented Feb 24, 2016 at 11:35

jezrael · Accepted Answer · 2016-02-25 09:18:33Z

20

I think you need to add the quoting parameter:

import csv
import pandas as pd

# QUOTE_NONE tells the parser to treat quote characters as ordinary text
chat = pd.read_csv("f.text", sep="\t", header=None, usecols=[2], quoting=csv.QUOTE_NONE)
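A likely reason this helps: by default read_csv treats " as a quote character, so a message field that begins with a double quote opens a quoted field, and every tab and newline up to the next closing quote is absorbed into that one field. Several physical lines then collapse into a single row. The sketch below reproduces the effect on a small in-memory sample. The original file is not available, so the stray quotes on rows 713 and 715 are an assumption made purely for illustration.

```python
import csv
import io

import pandas as pd

# Hypothetical sample modelled on the question's data. The quote that opens
# the message on row 713 and the one that ends row 715 are assumptions.
data = (
    "712\tben\tBattle of the Books\n"
    '713\tjames\t"i used to be in TOM\n'
    "714\ttomy\ti was in BOB once\n"
    '715\tben\tTournaments of Minds"\n'
    "716\ttommy\tCan you guess\n"
)

# Default quoting: rows 713 to 715 are merged into one quoted field.
default_df = pd.read_csv(io.StringIO(data), sep="\t", header=None)

# QUOTE_NONE: quote characters are plain text, one row per line.
raw_df = pd.read_csv(
    io.StringIO(data), sep="\t", header=None, quoting=csv.QUOTE_NONE
)

print(len(default_df), "rows with default quoting")
print(len(raw_df), "rows with quoting=csv.QUOTE_NONE")
```

With the default settings, the message on row 713 ends up containing the text of the following two lines, which is why the row count drops without any error being raised.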


2 Comments

smci Over a year ago

jezrael can you actually explain why this works, i.e. why the unquoted read dropped lines? Otherwise it's not a reusable resource to other users.

axme100 Over a year ago

OMG, this saved me! It looks like the default behavior for read_csv() expects everything to be wrapped in quotes. But if it is a tab separated file with no quotes, then you need to specify such, otherwise the data parsing goes awry
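To check whether stray quotes are plausibly the cause in a particular file, one rough heuristic is to list the lines that contain an odd number of double quotes. This is only a sketch: the helper name find_unbalanced_quote_lines is hypothetical, and an odd count is a hint rather than proof, since the parser only opens a quoted field when the quote is the first character of a field.

```python
def find_unbalanced_quote_lines(lines, quotechar='"'):
    """Return (line_number, text) pairs for lines with an odd quote count."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        if line.count(quotechar) % 2 == 1:
            suspects.append((number, line.rstrip("\n")))
    return suspects


# Hypothetical sample lines; with a real file, pass the open file object instead.
sample = [
    "712\tben\tBattle of the Books\n",
    '713\tjames\t"i used to be in TOM\n',
    "714\ttomy\ti was in BOB once\n",
]

suspects = find_unbalanced_quote_lines(sample)
for number, text in suspects:
    print(number, repr(text))
```

For a file on disk, the same function can be called as find_unbalanced_quote_lines(open("f.text", encoding="utf-8")), assuming the file is UTF-8 encoded.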
