I have a file larger than 7 GB. I am trying to load it into a DataFrame using pandas, like this:
df = pd.read_csv('data.csv')
But it takes too long. Is there a better way to speed up creating the DataFrame? I was considering setting the parameter engine='c', since the documentation says:
"engine{‘c’, ‘python’}, optional
Parser engine to use. The C engine is faster while the python engine is currently more feature-complete."
But I don't see much gain in speed.
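This is roughly the call I tried. As far as I know, the C engine is already the default parser in pandas, which may be why passing it explicitly makes no difference:

import pandas as pd

# engine='c' is the default parser, so specifying it explicitly
# should not change performance
df = pd.read_csv('data.csv', engine='c')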
Reading .csv files is a fairly slow process. If this is a file you expect to read and write frequently, then you should pay the upfront cost of reading the csv once, and save it in a format that pandas can read much more quickly: pandas.pydata.org/pandas-docs/stable/user_guide/…. Based on their timings, .pkl files can be read nearly 50x faster than .csv files.

You could also try Dask, which is very similar to pandas but supports multicore execution and handles large datasets: docs.dask.org/en/latest/dataframe.html. Kr.
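A minimal sketch of the pickle round-trip suggested above (the filenames are placeholders; to_pickle and read_pickle are standard pandas functions):

import pandas as pd

# pay the CSV parsing cost once
df = pd.read_csv('data.csv')
df.to_pickle('data.pkl')

# on later runs, reload from the binary pickle, which is much faster
df = pd.read_pickle('data.pkl')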
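And a sketch of the Dask approach, assuming dask is installed. dd.read_csv builds a lazy, partitioned dataframe instead of reading everything into memory at once:

import dask.dataframe as dd

# lazy, partitioned dataframe; nothing is read yet
ddf = dd.read_csv('data.csv')

# operations run in parallel across partitions; compute() materializes
# the result as a regular pandas DataFrame (needs enough RAM to hold it)
df = ddf.compute()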