Problem is resolved.
I'm building a DataFrame from several pipe-separated (|) files. I read in my data, format my date column, and then set the date column as a datetime index. My desired output is a DataFrame with a timestamp index so that I can group by TimeGrouper. When I run the code to timestamp the index I get an error. The error is included below, along with my code and the output I get without the timestamp step:
import numpy as np
import pandas as pd
import glob
df = pd.concat((
    pd.read_csv(f, sep='|', header=None, low_memory=False,
                names=['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11',
                       '12', '13', 'date', '15', '16', '17', '18', '19', '20',
                       '21', '22'],
                index_col=None, dtype={'date': str})
    for f in glob.glob('/home/jayaramdas/anaconda3/Thesis/FEC_data/itpas2_data/itpas2**.txt')))
df['date'].dropna()
df['date'] = pd.to_datetime(df['date'], format='%m%d%Y')
df1 = df.set_index('date')
print (df1)
            cmte_id    trans_typ  entity_typ  state  amount  fec_id     cand_id
date
2007-08-15  C00112250  24K        ORG         DC     2000    C00431569  P00003392
2007-09-26  C00119040  24K        CCM         FL     1000    C00367680  H2FL05127
2007-09-26  C00119040  24K        CCM         MD     1000    C00140715  H2MD05155
My error:
KeyError: 'date'
18 df2 = df1.set_index(pd.to_datetime(df1['date']), inplace=True)
My raw data:
C00112250|N|Q3|G|27931381854|24K|ORG|HILLARY CLINTON FOR PRESIDENT EXP. COMM.|WASHINGTON|DC|20013|||08152007|2000|C00431569|P00003392|71006.E7975|307490|||4101720071081637544
C00119040|N|Q3|G|27990795873|24K|CCM|FRIENDS OF GINNY BROWN-WAITE|BROOKSVILLE|FL|34605|||09262007|1000|C00367680|H2FL05127|SB21.4307|307491|||4101720071081637552
C00119040|N|Q3|G|27990795873|24K|CCM|HOYER FOR CONGRESS|CLINTON|MD|20735|||09262007|1000|C00140715|H2MD05155|SB21.4303|307491|||4101720071081637553
Why not just df['date'] = pd.to_datetime(df['date'], format='%m%d%Y') and then df1 = df.set_index('date'), and skip all lines after this?
If you run df2 = df1.set_index(pd.to_datetime(df1['date']), inplace=True) then yeah, you've set 'date' as the index, so it's no longer a column. df1 already is what you seem to be looking for in df2.
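To make the point in the comments concrete, here is a minimal sketch, not the asker's exact pipeline. It assumes a three-column cut of the raw rows above (the column names cmte_id, date and amount are illustrative) and reads them from an in-memory string instead of the files. It shows that after set_index('date') the dates live in df1.index rather than in a column, that set_index(..., inplace=True) returns None, and that grouping by month can then be done with pd.Grouper, which is what current pandas offers in place of the older TimeGrouper.

```python
import io
import pandas as pd

# Three rows taken from the raw data in the question, trimmed to three fields.
# The column names here are illustrative assumptions, not the asker's.
raw = io.StringIO(
    "C00112250|08152007|2000\n"
    "C00119040|09262007|1000\n"
    "C00119040|09262007|1000\n"
)
df = pd.read_csv(raw, sep="|", header=None,
                 names=["cmte_id", "date", "amount"], dtype={"date": str})

# Parse MMDDYYYY strings, then move the column into the index.
df["date"] = pd.to_datetime(df["date"], format="%m%d%Y")
df1 = df.set_index("date")

# 'date' is now the index, so df1['date'] would raise KeyError: 'date'.
date_is_column = "date" in df1.columns
index_is_datetime = isinstance(df1.index, pd.DatetimeIndex)

# With inplace=True, set_index modifies the frame and returns None,
# so assigning its result to df2 would not give a DataFrame.
tmp = df.copy()
result = tmp.set_index("date", inplace=True)

# Group on the datetime index by calendar month ("MS" = month start).
monthly = df1.groupby(pd.Grouper(freq="MS"))["amount"].sum()
print(monthly)
```

If the dates are ever needed as a column again, df1.reset_index() brings them back.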