I have this .log file and I don't know how to read it into a pandas DataFrame:
id | create_date
-----+----------------------------
318 | 2017-05-05 07:03:27.556697
456 | 2017-07-03 01:50:07.966652
249 | 2017-05-03 13:57:32.567373
pd.read_table("data.csv", sep="|", skiprows=[1], header=0, parse_dates=[1]).rename(columns=lambda x: x.strip())
id create_date
0 318 2017-05-05 07:03:27.556697
1 456 2017-07-03 01:50:07.966652
2 249 2017-05-03 13:57:32.567373
sep="|"
Use | as column separator
skiprows=[1]
Skip the second row, which is only a decorative separator and would be the hardest part to parse
header=0
Read column names from the first row
parse_dates=[1]
Convert the create_date column to pandas datetime64 (optional)
rename(columns=lambda x: x.strip())
Remove extra whitespace from column names
Add index_col=0 if you want the id column as your index instead of a sequential one.
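For anyone who wants to try this without the file at hand, here is a minimal sketch of the same idea. It assumes the log looks exactly like the sample in the question. `io.StringIO` and the `raw` string are stand-ins for the real file, and the dates are converted in a separate `pd.to_datetime` step rather than with `parse_dates`, purely to keep each step visible.

```python
import io

import pandas as pd

# Sample text based on the question; io.StringIO stands in for the real .log file.
raw = """ id  |        create_date
-----+----------------------------
 318 | 2017-05-05 07:03:27.556697
 456 | 2017-07-03 01:50:07.966652
 249 | 2017-05-03 13:57:32.567373
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep="|",        # columns are separated by a pipe
    skiprows=[1],   # skip the -----+----- decoration line
    header=0,       # column names come from the first line
    index_col=0,    # use the id column as the index
)

# Header cells are padded with spaces, so strip both the column names
# and the index name (rename(columns=...) does not touch the index name).
df = df.rename(columns=lambda x: x.strip())
df.index.name = df.index.name.strip()

# Convert the padded date strings to datetime64 explicitly.
df["create_date"] = pd.to_datetime(df["create_date"].str.strip())

print(df)
```

If you prefer a sequential index, drop `index_col=0` and the `df.index.name` line.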
Try this:
df=pd.read_csv('file_.csv',sep='|')
Then you can remove the -----+---------------------------- row in several ways:
df[df[' id ']!='-----+----------------------------']
df[~df[' id '].str.startswith('-')]
df.drop(0) # won't work if the file contains -----+---------------------------- anywhere else, for example in a footer
df[df[' create_date '].notnull()] # won't work if the create_date column contains NaN values of its own
Output:
id create_date
1 318 2017-05-05 07:03:27.556697
2 456 2017-07-03 01:50:07.966652
3 249 2017-05-03 13:57:32.567373
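A self-contained sketch of this second approach follows, again assuming the file matches the sample in the question. The `raw` string and `io.StringIO` are stand-ins for the real file. Reading with `dtype=str`, stripping the column names first, and resetting the index are my own additions to make the filtering step easier to follow, not part of the answer above.

```python
import io

import pandas as pd

# Sample text based on the question; io.StringIO stands in for the real .log file.
raw = """ id  |        create_date
-----+----------------------------
 318 | 2017-05-05 07:03:27.556697
 456 | 2017-07-03 01:50:07.966652
 249 | 2017-05-03 13:57:32.567373
"""

# Read everything as strings, including the decoration line.
df = pd.read_csv(io.StringIO(raw), sep="|", dtype=str)

# Strip the padded header names so columns can be addressed as "id" and "create_date".
df.columns = df.columns.str.strip()

# The decoration line contains no "|", so it lands entirely in the first column;
# drop every row whose id cell starts with "-".
df = df[~df["id"].str.startswith("-")].reset_index(drop=True)

# Clean up the remaining cells and give them proper types.
df["id"] = df["id"].str.strip().astype(int)
df["create_date"] = pd.to_datetime(df["create_date"].str.strip())

print(df)
```

Filtering on the first column avoids the caveat noted for `notnull()`, since genuine missing dates are left untouched.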