I want to read multiple CSV files into a single Dask dataframe. For some reason, part of my original data gets lost (no clue why). What is the best way to read them all into Dask? I used a for loop, though I'm not sure it is correct.

for file in os.listdir(dds_glob):
    if file.endswith('issued_processed.txt'):
        ddf = dd.read_fwf(os.path.join(dds_glob,file),
                          colspecs=cols,
                          header=None,
                          dtype=object,
                          names=names)

Or should I use something like this:

dfs = delayed(pd.read_fwf)('/data/input/*issued_processed.txt',
                           colspecs=cols,
                           header=None,
                           dtype=object,
                           names=names)  
ddf = dd.from_delayed(dfs)

1 Answer

There are at least two approaches:

  1. Pass dask.dataframe a list of files. Using your first snippet, it would look like:
file_list = [
    os.path.join(dds_glob, file)
    for file in os.listdir(dds_glob) if file.endswith('issued_processed.txt')
]

# other options (colspecs, names, ...) are omitted for brevity
ddf = dd.read_fwf(file_list)
  2. Construct the dataframe from delayed objects. Using your second snippet, it would look like:
# other options are omitted, but can be passed after `file`
dfs = [delayed(pd.read_fwf)(file) for file in file_list]
ddf = dd.from_delayed(dfs)
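
Both variants above build one dataframe from the whole file list, whereas the loop in the question assigns to `ddf` on every iteration, so each pass replaces the previous result. That is a likely reason for the missing data. Here is a minimal sketch of the effect using only the standard library; the file names, the file contents, and the `read_rows` helper are made up for illustration, with `read_rows` standing in for `dd.read_fwf`:

```python
import os
import tempfile

# Hypothetical sample data: three small files standing in for the real inputs.
tmp_dir = tempfile.mkdtemp()
for i in range(3):
    path = os.path.join(tmp_dir, f"part{i}_issued_processed.txt")
    with open(path, "w") as fh:
        fh.write(f"row-a-{i}\nrow-b-{i}\n")


def read_rows(path):
    """Stand-in for dd.read_fwf: return the lines of one file."""
    with open(path) as fh:
        return fh.read().splitlines()


file_list = sorted(
    os.path.join(tmp_dir, name)
    for name in os.listdir(tmp_dir)
    if name.endswith("issued_processed.txt")
)

# Pattern from the question: `rows` is rebound on every pass,
# so only the last file's rows remain after the loop.
for path in file_list:
    rows = read_rows(path)
overwritten_count = len(rows)

# Collecting over the whole file list keeps every row.
all_rows = [row for path in file_list for row in read_rows(path)]
collected_count = len(all_rows)

print(overwritten_count, collected_count)
```

With two rows per file, the loop ends up holding 2 rows while the collected version holds all 6.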

The first approach should cover the large majority of use cases, but for the other use cases you might need to try the second approach or something more involved.

2 Comments

I like "other use cases" :)
Thanks, both work smoothly, though I have some other issues with the rest of the computation.
