0

I created a script to extract dates and prices from a CSV file and print it in another CSV file, but the issue is the dates in the raw CSV file are arranged randomly. How do I extract the date to return in ascending order in my csv file instead of randomly? I'd like to do this in Python, you download the entire CSV file here: http://www.eia.gov/dnav/ng/hist/rngwhhdm.htm.

Sample CSV data below. The dates continue till 1997 and they are sorted in reverse already.

Henry Hub Natural Gas Spot Price
https://www.eia.gov/dnav/ng/hist/rngwhhdm.htm
14:35:57 GMT-0800 (Pacific Standard Time)
Data Source: Thomson Reuters
Month,Henry Hub Natural Gas Spot Price Dollars per Million Btu
Jan 2021,2.71
Dec 2020,2.59
Nov 2020,2.61
Oct 2020,2.39
Sep 2020,1.92
Aug 2020,2.3
Jul 2020,1.77
Jun 2020,1.63
May 2020,1.75
Apr 2020,1.74
Mar 2020,1.79
Feb 2020,1.91
Jan 2020,2.02
Dec 2019,2.22
Nov 2019,2.65
Oct 2019,2.33
Sep 2019,2.56
Aug 2019,2.22
Jul 2019,2.37
Jun 2019,2.4
May 2019,2.64
Apr 2019,2.65
Mar 2019,2.95
Feb 2019,2.69
Jan 2019,3.11
Dec 2018,4.04
Nov 2018,4.09
Oct 2018,3.28
Sep 2018,3

The code already written is shown below:

import pandas as pd 

#Basic day granularity of .csv file (day)

dfg = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv', skiprows=4)
dfg.index = pd.to_datetime(dfg["Day"],format='%m/%d/%Y')
dfg.to_csv('gas-details_day.csv', index=False) 

#Other granularities and sections of the .csv file (month)

dfg_month = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('M').sum()
df = pd.DataFrame(dfg_month, index=dfg_month.index.strftime("%m"))
df.to_csv('gas-details_month.csv', index=True)

#Other granularities and sections of the .csv file (year)
dfg_year = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('Y').sum()
df = pd.DataFrame(dfg_year, index=dfg_year.index.strftime("%Y"))
df.to_csv('gas-details_year.csv', index=True)

print("Extracted Successfully")
4
  • You should include some sample data in the question itself in case the link dies. It looks like the CSV is not random, but sorted in reverse chronological order to start with. Commented Feb 21, 2021 at 21:17
  • Also, if you want to sort by date, it's better to use yyyy-mm-dd instead of mm/dd/yyyy, since simple string comparison will order the dates chronologically. But since you're already parsing the date into a pandas value, that shouldn't matter; you can use the dataframe.sort_values() method as in @Syed's answer. Commented Feb 21, 2021 at 21:21
  • @MarkReed The raw .csv file has dates arranged in that format of mm/dd/yyyy. Also I have tried the dataframe.sort_values() method but I still get errors Commented Feb 21, 2021 at 21:49
  • ? The raw csv has "Mon yyyy", with no day at all. Commented Feb 22, 2021 at 4:33

2 Answers 2

2

You can indicate to read_csv to parse the dates and then sort by index:

import pandas as pd 
df = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv',
                 index_col='Month',parse_dates=True,infer_datetime_format=True,skiprows=4)
df.sort_index(inplace=True)
print(df)
            Henry Hub Natural Gas Spot Price Dollars per Million Btu
Month                                                               
1997-01-01                                               3.45       
1997-02-01                                               2.15       
1997-03-01                                               1.89       
1997-04-01                                               2.03       
1997-05-01                                               2.25       
...                                                       ...       
2020-09-01                                               1.92       
2020-10-01                                               2.39       
2020-11-01                                               2.61       
2020-12-01                                               2.59       
2021-01-01                                               2.71       

[289 rows x 1 columns]
Sign up to request clarification or add additional context in comments.

2 Comments

But this is not printing to a csv file?
Okay this worked, I had to replace print(df) with df.to_csv('gas-details_day.csv', index=True) . The problem now is if I have to extract details for just months or just days, the syntax does not go through with reference to my original question code
0

You can use the dataframe.sort_index() method as shown below:

import pandas as pd 

#Basic day granularity of .csv file (day)

dfg = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv', skiprows=4)
dfg.index = pd.to_datetime(dfg["Day"],format='%m/%d/%Y')
dfg.sort_index()
dfg.to_csv('gas-details_day.csv', index=False) 

#Other granularities and sections of the .csv file (month)

dfg_month = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('M').sum()
df = pd.DataFrame(dfg_month, index=dfg_month.index.strftime("%m"))
df.sort_index()
df.to_csv('gas-details_month.csv', index=True)

#Other granularities and sections of the .csv file (year)
dfg_year = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('Y').sum()
df = pd.DataFrame(dfg_year, index=dfg_year.index.strftime("%Y"))
df.sort_index()
df.to_csv('gas-details_year.csv', index=True)

print("Exctracted Succesfully")

7 Comments

Please describe the error or post a screenshot
I just added it as an answer
try using sort_index() method instead as shown in my edited answer
It extracted but the dates are still in descending order as the raw data
try sort_index(ascending=True). By default it should sort in ascending order. I am not sure if this will work but give it a shot
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.