I want to read CSV data from an HDFS server, but it throws an exception like the one below:
hdfsSeek(desiredPos=64000000): FSDataInputStream#seek error:
java.io.EOFException: Cannot seek after EOF
    at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1602)
    at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:65)
My Python code:
from dask import dataframe as dd
df = dd.read_csv('hdfs://SER/htmpa/a.csv').head(n=3)
The CSV file:
user_id,item_id,play_count
0,0,500
0,1,3
0,3,1
1,0,4
1,3,1
2,0,1
2,1,1
2,3,5
3,0,1
3,3,4
4,1,1
4,2,8
4,3,4
Try adding driver='pyarrow' to the read_csv call.
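
For illustration, here is a rough sketch of how the driver suggestion might be applied to the code in the question. It rests on an assumption that is not confirmed by the post: that the installed dask version accepts the driver choice through storage_options and forwards it to its HDFS backend. Depending on the dask version, the option may need to be passed differently or may not be supported at all. The helper name read_head is hypothetical and exists only to keep the example self-contained.

```python
def read_head(path, n=3, driver="pyarrow"):
    """Return the first n rows of a CSV file stored on HDFS (hypothetical helper)."""
    # Imported lazily so that defining this helper does not itself require dask.
    from dask import dataframe as dd

    # ASSUMPTION: this dask version reads a "driver" entry from storage_options
    # and uses it to pick the HDFS client library. Check the documentation of
    # the installed version before relying on this.
    df = dd.read_csv(path, storage_options={"driver": driver})

    # head() computes only as many partitions as it needs for n rows.
    return df.head(n=n)
```

With the path from the question, the call would be read_head('hdfs://SER/htmpa/a.csv'). If the exception persists, the driver option is probably not reaching the HDFS backend in that dask version.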