
I want to read CSV data from an HDFS server, but it throws an exception like the one below:

    hdfsSeek(desiredPos=64000000): FSDataInputStream#seek error:
    java.io.EOFException: Cannot seek after EOF
    at 
    org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1602)
    at 
    org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:65)

My Python code:

    from dask import dataframe as dd
    df = dd.read_csv('hdfs://SER/htmpa/a.csv').head(n=3)

csv file:

    user_id,item_id,play_count
    0,0,500
    0,1,3
    0,3,1
    1,0,4
    1,3,1
    2,0,1
    2,1,1
    2,3,5
    3,0,1
    3,3,4
    4,1,1
    4,2,8
    4,3,4

  • What HDFS driver library are you using? I recommend using pyarrow instead of hdfs3; you can do this by specifying driver='pyarrow' in the read_csv call. Commented Jun 25, 2019 at 15:37
  • Specifying driver='pyarrow' does not work; it still throws the seek error. Commented Jun 26, 2019 at 8:07

1 Answer


Are you running within an IDE or a Jupyter notebook?
We are running on a Cloudera distribution and get a similar error. From what we understand, it is not related to Dask but rather to our Hadoop configuration.
In any case, we successfully use the pyarrow library when accessing HDFS. Be aware that if you need to access Parquet files, run with pyarrow version 0.12 rather than 0.13; see the discussion on GitHub.
Update
pyarrow version 0.14 is out and should solve the problem.
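For reference, once HDFS access works, the original call should return the first three rows of the file. Here is a minimal local sketch of that result using only the standard library, with an in-memory buffer standing in for the HDFS file (the cluster path `hdfs://SER/htmpa/a.csv` is obviously not reachable outside that environment):

```python
import csv
import io

# First few rows of a.csv, copied from the question.
data = """user_id,item_id,play_count
0,0,500
0,1,3
0,3,1
1,0,4
1,3,1
"""

reader = csv.DictReader(io.StringIO(data))
# Equivalent of head(n=3): take the first three parsed rows.
head3 = [row for _, row in zip(range(3), reader)]
for row in head3:
    print(row["user_id"], row["item_id"], row["play_count"])
```

This only illustrates the expected shape of the output; it does not exercise the HDFS seek path where the EOFException occurs.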


1 Comment

Thanks, pyarrow works. But a lot of my code relies on Dask. :(
