
I have a dask dataframe created from a parquet file on HDFS. When I set the index using set_index, it fails with the error below.

  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/dataframe/shuffle.py", line 64, in set_index
    divisions, sizes, mins, maxes = base.compute(divisions, sizes, mins, maxes)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/base.py", line 206, in compute
    results = get(dsk, keys, **kwargs)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1949, in get
    results = self.gather(packed, asynchronous=asynchronous)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1391, in gather
    asynchronous=asynchronous)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 561, in sync
    return sync(self.loop, func, *args, **kwargs)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/utils.py", line 241, in sync
    six.reraise(*error[0])
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/utils.py", line 229, in f
    result[0] = yield make_coro()
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "", line 4, in raise_exc_info
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1269, in _gather
    traceback)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/dataframe/io/parquet.py", line 144, in _read_parquet_row_group
    open=open, assign=views, scheme=scheme)
TypeError: read_row_group_file() got an unexpected keyword argument 'scheme'

Can someone point me to the cause of this error and how to fix it?

  • The issue was resolved by downgrading dask from 0.15.4 to 0.15.3 and distributed from 1.19.2 to 1.19.1. Commented Oct 8, 2017 at 1:38

1 Answer

Solution

Upgrade fastparquet to version 0.1.3.

Details

Dask 0.15.4, the version used in your example, includes a commit that adds the argument scheme to the call to read_row_group_file(). fastparquet versions before 0.1.3 do not accept that argument, so the call raises the TypeError shown in your traceback.
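
To confirm whether an environment is affected, a quick version check along these lines may help. This is only a sketch: the helper names (parse_version, supports_scheme, installed_version) are made up for illustration, the 0.1.3 threshold is the one stated above, and it assumes the package exposes a __version__ attribute. On a distributed cluster it would need to be run in the workers' environment as well, since the traceback shows the parquet read failing inside a task.

```python
import importlib


def parse_version(text):
    """Turn a string such as '0.1.3' into a tuple like (0, 1, 3).

    Only the first three dot-separated parts are used, and anything after
    the leading digits of a part (e.g. 'rc1') is ignored.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_scheme(fastparquet_version, minimum="0.1.3"):
    """Return True if the given fastparquet version is at least `minimum`.

    The 0.1.3 default comes from the answer above; it is not queried from
    the library itself.
    """
    return parse_version(fastparquet_version) >= parse_version(minimum)


def installed_version(name):
    """Return the package's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    version = installed_version("fastparquet")
    if version is None:
        print("fastparquet is not importable in this environment")
    elif supports_scheme(version):
        print("fastparquet", version, "should accept the 'scheme' argument")
    else:
        print("fastparquet", version, "is older than 0.1.3; upgrade it")
```

If the check reports a version older than 0.1.3, upgrading fastparquet (or, as noted in the comment on the question, downgrading dask and distributed) should remove the mismatch.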
