Dataframe index from string to date

Question

I have a large dataframe (df) where the start looks like:

date,number
2015-12-28,161
2015-12-29,225
2015-12-30,197
2016-06-06,217
2016-06-07,301
2016-06-08,317
2016-06-09,338
2016-06-10,308
2016-10-24,108
2016-10-25,142
2016-10-26,162
2016-10-27,165
2016-10-28,141
2016-01-04,193
2016-01-05,249
2016-01-06,263
2016-01-07,266
2016-01-08,248
2017-01-23,121

This is produced by walking through a number of directories, opening a specific file in each, and grouping the data in it. Each directory contributes part of the final df_final dataframe. The code used to generate it is below:

import os

import pandas as pd


def main():
    folder = 'path'
    frames = []
    df_final = pd.DataFrame()

    for dirname, dirs, files in os.walk(folder):
        for filename in files:
            filename_without_extension, extension = os.path.splitext(filename)
            if filename_without_extension == 'portfolio-trade-pos-info':
                df = pd.read_csv(dirname + '/' + filename, index_col='date')

                trades = df.groupby('date')[['trade']].count()
                frames.append(trades)

                df_final = df_final.append(df)
                df_final.index_col = 'date'
                df_final.sort_index()

    final = pd.concat(frames)
    final.sort_values('date')
    final.to_csv('trades-per-day.csv', index=True)


main()

I am getting the error:

Traceback (most recent call last):
  File "./trades_per_day.py", line 54, in <module>
    main()
  File "./trades_per_day.py", line 33, in main
    trades = df.groupby('date')[['trade']].count()
  File "/usr/local/lib64/python2.7/site-packages/pandas/core/generic.py", line 3991, in groupby
    **kwargs)
  File "/usr/local/lib64/python2.7/site-packages/pandas/core/groupby.py", line 1511, in groupby
    return klass(obj, by, **kwds)
  File "/usr/local/lib64/python2.7/site-packages/pandas/core/groupby.py", line 370, in __init__
    mutated=self.mutated)
  File "/usr/local/lib64/python2.7/site-packages/pandas/core/groupby.py", line 2462, in _get_grouper
    in_axis, name, gpr = True, gpr, obj[gpr]
  File "/usr/local/lib64/python2.7/site-packages/pandas/core/frame.py", line 2059, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/lib64/python2.7/site-packages/pandas/core/frame.py", line 2066, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/lib64/python2.7/site-packages/pandas/core/generic.py", line 1386, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib64/python2.7/site-packages/pandas/core/internals.py", line 3543, in get
    loc = self.items.get_loc(item)
  File "/usr/local/lib64/python2.7/site-packages/pandas/indexes/base.py", line 2136, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'date'

Is there a way to change the data type of the dataframe index in df_final to date so I can order the dataframe in date order?

So the above output would be ordered:

date    number
28/12/2015  161
29/12/2015  225
30/12/2015  197
04/01/2016  193
05/01/2016  249
06/01/2016  263
07/01/2016  266
08/01/2016  248
06/06/2016  217
07/06/2016  301
08/06/2016  317
09/06/2016  338
10/06/2016  308
24/10/2016  108
25/10/2016  142
26/10/2016  162
27/10/2016  165
28/10/2016  141
23/01/2017  121

Use df = pd.read_csv(dirname + '/' + filename, parse_dates=['date']) to parse the date column when reading the file in.
– Scott Boston, commented Jul 23, 2018 at 21:13

Scott Boston · Accepted Answer · 2018-07-23 21:14:57Z


Use the parse_dates parameter of pd.read_csv.

MCVE:

from io import StringIO

import pandas as pd

csvfile = StringIO("""date,number
2015-12-28,161
2015-12-29,225
2015-12-30,197
2016-06-06,217
2016-06-07,301
2016-06-08,317
2016-06-09,338
2016-06-10,308
2016-10-24,108
2016-10-25,142
2016-10-26,162
2016-10-27,165
2016-10-28,141
2016-01-04,193
2016-01-05,249
2016-01-06,263
2016-01-07,266
2016-01-08,248
2017-01-23,121""")

df = pd.read_csv(csvfile, parse_dates=['date'])

df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 2 columns):
date      19 non-null datetime64[ns]
number    19 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 384.0 bytes
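
If the dataframe already exists with the dates as strings in the index, the index can also be converted after the fact. The following is a minimal sketch of that approach, not part of the original answer; the small frame is a hypothetical stand-in built from a few rows of the question's data.

```python
import pandas as pd

# Hypothetical frame whose index holds dates as plain strings
# (a few sample rows taken from the question).
df = pd.DataFrame(
    {"number": [217, 161, 121, 193]},
    index=pd.Index(
        ["2016-06-06", "2015-12-28", "2017-01-23", "2016-01-04"], name="date"
    ),
)

# Convert the string labels to timestamps.
df.index = pd.to_datetime(df.index)

# sort_index returns a new DataFrame, so assign the result back.
df = df.sort_index()

print(df)
```

After the conversion, df.index is a DatetimeIndex and the rows are in chronological order.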

2 Comments

Stacey Over a year ago

Thanks, I changed the read_csv line to: df = pd.read_csv(dirname + '/' +filename, parse_dates=['date']). However the resulting final dataframe is still not in date order.

Scott Boston Over a year ago

If you want the dates in the index, add index_col='date' to that read_csv call and then call .sort_index(). Alternatively, keep date as a column and use .sort_values('date').
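
Putting the two comments together, the loop from the question might look roughly like the sketch below. It is an illustration under assumptions rather than the poster's final code: the two StringIO objects are made-up stand-ins for the per-directory CSV files, and only the date and trade columns are assumed to exist. Two details matter: once date is the index, the grouping has to use the index level instead of a column name, and sort_index returns a new DataFrame, so its result must be assigned.

```python
from io import StringIO

import pandas as pd

# Two in-memory stand-ins for the per-directory CSV files
# (contents are made up for illustration).
file_a = StringIO("date,trade\n2016-06-06,a\n2016-06-06,b\n2015-12-28,c\n")
file_b = StringIO("date,trade\n2016-01-04,d\n2015-12-28,e\n")

frames = []
for csvfile in (file_a, file_b):
    # Parse the dates while reading and use them as the index.
    df = pd.read_csv(csvfile, parse_dates=["date"], index_col="date")
    # 'date' is now the index, so group on the index level, not a column.
    frames.append(df.groupby(level="date")[["trade"]].count())

# sort_index returns a new DataFrame; the result has to be kept.
final = pd.concat(frames).sort_index()

print(final)
```

A date that occurs in more than one file still appears once per file here, as in the question's code; a further final.groupby(level='date').sum() would combine those rows if one row per date is wanted.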
