I want to read multiple csv files into one single dask dataframe. Due to some reasons some portion of my original data get lost (no clue why?!). I am wondering whats the best method to read them all into dask? I used a for loop though not sure if its correct.
for file in os.listdir(dds_glob): if file.endswith('issued_processed.txt'): ddf = dd.read_fwf(os.path.join(dds_glob,file), colspecs=cols, header=None, dtype=object, names=names)
or should I use something like this:
dfs = delayed(pd.read_fwf)('/data/input/*issued_processed.txt', colspecs=cols, header=None, dtype=object, names=names) ddf = dd.from_delayed(dfs)