0

Ive been having trouble reading and updating a csv from yfinance, due to the data in the first few rows of the downloaded csv:

  • 1st row contains the column headers I want (also header - 'Price' - I dont want)

  • 2nd row is junk

  • 3rd row has what I want to be the index header

The downloaded csv (formatted) looks like this:

Price Adj Close Close High Low Open Volume
Ticker ^BVSP ^BVSP ^BVSP ^BVSP ^BVSP ^BVSP
Date
2014-01-02 50341.0 50341.0 51656.0 50246.0 51522.0 3476300
2014-01-03 50981.0 50981.0 50981.0 50269.0 50348.0 7360400
2014-01-06 50974.0 50974.0 51002.0 50451.0 50980.0 3727800
2014-01-07 50430.0 50430.0 51478.0 50429.0 50982.0 3339500

The raw .csv file looks like this:

Price,Adj Close,Close,High,Low,Open,Volume
Ticker,^BVSP,^BVSP,^BVSP,^BVSP,^BVSP,^BVSP
Date,,,,,,,
2014-01-02,50341.0,50341.0,51656.0,50246.0,51522.0,3476300
2014-01-03,50981.0,50981.0,50981.0,50269.0,50348.0,7360400
2014-01-06,50974.0,50974.0,51002.0,50451.0,50980.0,3727800
2014-01-07,50430.0,50430.0,51478.0,50429.0,50982.0,3339500

Once read, I want the df to look like this, where 'Date' is datetime index:

Date Adj Close Close High Low Open Volume
2014-01-02 50341.0 50341.0 51656.0 50246.0 51522.0 3476300
2014-01-03 50981.0 50981.0 50981.0 50269.0 50348.0 7360400
2014-01-06 50974.0 50974.0 51002.0 50451.0 50980.0 3727800
2014-01-07 50430.0 50430.0 51478.0 50429.0 50982.0 3339500

I'm using this code, which works, but it seems clumsy.

idx_df = pd.read_csv(
            f'{data_folder}/INDEX_{idx_code}.csv',
            header=None,
            skiprows=3,  #  data starts on row 4
            names=['Date', 'Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume'],
            index_col='Date'
        )
        idx_df.index = pd.to_datetime(idx_df.index, errors='coerce')  

My questions:

  1. Is there a simpler/more elegant way, perhaps using one line of code, and using the "header" parameter, even though "Date" is at position (2, 0) and the others are at (0, 1:6)?
  2. Is there a way to set the index as datetime within the "read_csv" instruction, avoiding the "idx_df.index =" line?

Thanks

2
  • Didn't you ask the same question yesterday? Commented Dec 10, 2024 at 20:12
  • Yes it was removed for some reason. Wrong format I guess. Commented Dec 10, 2024 at 20:57

2 Answers 2

1

Example

import pandas as pd
import io

csv1 = '''Price,Adj Close,Close,High,Low,Open,Volume
Ticker,^BVSP,^BVSP,^BVSP,^BVSP,^BVSP,^BVSP
Date,,,,,,,
2014-01-02,50341.0,50341.0,51656.0,50246.0,51522.0,3476300
2014-01-03,50981.0,50981.0,50981.0,50269.0,50348.0,7360400
2014-01-06,50974.0,50974.0,51002.0,50451.0,50980.0,3727800
2014-01-07,50430.0,50430.0,51478.0,50429.0,50982.0,3339500
'''

Code

use skiprows, parse_dates, index_col parameter

df = pd.read_csv(
    io.StringIO(csv1), # file path
    skiprows=[1, 2], # skip junk rows
    parse_dates=['Price'], # convert Price column to datetime 
    index_col=0 # set Price column as index
).rename_axis('Date') # rename Price -> Date

df

            Adj Close    Close     High      Low     Open   Volume
Date                                                              
2014-01-02    50341.0  50341.0  51656.0  50246.0  51522.0  3476300
2014-01-03    50981.0  50981.0  50981.0  50269.0  50348.0  7360400
2014-01-06    50974.0  50974.0  51002.0  50451.0  50980.0  3727800
2014-01-07    50430.0  50430.0  51478.0  50429.0  50982.0  3339500
Sign up to request clarification or add additional context in comments.

Comments

1

A possible solution, whose steps are:

  • read_csv to load a CSV file named file.csv into a dataframe, skipping the rows at indices 2 and 3 using the skiprows parameter.

  • rename to change the name of the column Price to Date.

pd.read_csv(file.csv, skiprows=[2, 3]).rename({'Price': 'Date'}, axis=1)

Output:

         Date  Adj Close    Close     High      Low     Open   Volume
0  2014-01-02    50341.0  50341.0  51656.0  50246.0  51522.0  3476300
1  2014-01-03    50981.0  50981.0  50981.0  50269.0  50348.0  7360400
2  2014-01-06    50974.0  50974.0  51002.0  50451.0  50980.0  3727800
3  2014-01-07    50430.0  50430.0  51478.0  50429.0  50982.0  3339500

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.