Pandas - EmptyDataError: No columns to parse from file when reading stock .csv file

Question

Let me first start by saying I have gone through and done my due diligence trying to find a solution based on questions previously asked on the web.

I've run into an odd bug in my code that I really cannot explain... So far my code executes the following:

take stock symbols and write OHLC data to a CSV file
loop through the directory that contains the CSV files and use that data to calculate technical indicators
add the technical indicator data to the same CSV file

So the bug is that it executes everything perfectly (99 stocks) EXCEPT for ZM.csv (Zoom). The error that it prints is:

pandas.errors.EmptyDataError: No columns to parse from file.

So to troubleshoot I copied and pasted the data from ZM.csv into a CSV that I know ran fine (I used AAPL) and it actually executed fine. Next, I took the working data from AAPL.csv, pasted it into ZM.csv and ran it again. It throws the same error. I also tried renaming the file to ZMI (randomly) and it worked.

This led me to believe that, for some unknown reason, the FILENAME is the root issue. In the part where I first create the CSV files, I changed the name of the file to {symbol}1.csv, {symbol}_.csv, and {symbol}I.csv, to no avail. Lastly, I combined the two files together and did not mess with anything else. It worked. Does anyone know why?

The flow is to first run bars.py, check the data/ohlc/ directory CSV files (should only have the OHLC data), run technical_analysis.py, and then check the CSV files again (now with technical indicators).

[bar.py]

    from config import *
    from datetime import datetime
    import requests, json

    holdings = open('data/qqq.csv').readlines()

    symbols_list = [holding.split(',')[2].strip() for holding in holdings][1:]
    symbols = ','.join(symbols_list)

    minute_bars_url = '{}/1Min?symbols={}&limit=100'.format(BARS_URL, symbols)
    r = requests.get(minute_bars_url, headers=HEADERS)

    ohlc_data = r.json()

    for symbol in ohlc_data:
        filename = 'data/ohlc/{}.csv'.format(symbol)
        f = open(filename, 'w+')
        f.write('Timestamp,Open,High,Low,Close,Volume\n')
        for bar in ohlc_data[symbol]:
            t = datetime.fromtimestamp(bar['t'])
            timestamp = t.strftime('%I:%M:%S%p-%Z%Y-%m-%d')
            line = '{},{},{},{},{},{}\n'.format(timestamp, bar['o'], bar['h'],
                                                 bar['l'], bar['c'], bar['v'])
            f.write(line)

The variables symbols_list and symbols print as follows:

symbols_list = ['AAPL', 'MSFT', 'AMZN', 'FB', 'GOOGL', 'GOOG', 'TSLA', 'NVDA', 'PYPL', 'ADBE', 'INTC', 'NFLX', 'CMCSA', 'PEP', 'COST', 'CSCO', 'AVGO', 'QCOM', 'TMUS', 'AMGN', 'TXN', 'CHTR', 'SBUX', 'ZM', 'AMD', 'INTU', 'ISRG', 'MDLZ', 'JD', 'GILD', 'BKNG', 'FISV', 'MELI', 'ATVI', 'ADP', 'CSX', 'REGN', 'MU', 'AMAT', 'ADSK', 'VRTX', 'LRCX', 'ILMN', 'ADI', 'BIIB', 'MNST', 'EXC', 'KDP', 'LULU', 'DOCU', 'WDAY', 'CTSH', 'KHC', 'NXPI', 'BIDU', 'XEL', 'DXCM', 'EBAY', 'EA', 'IDXX', 'CTAS', 'SNPS', 'ORLY', 'SGEN', 'SPLK', 'ROST', 'WBA', 'KLAC', 'NTES', 'PCAR', 'CDNS', 'MAR', 'VRSK', 'PAYX', 'ASML', 'ANSS', 'MCHP', 'XLNX', 'MRNA', 'CPRT', 'ALGN', 'PDD', 'ALXN', 'SIRI', 'FAST', 'SWKS', 'VRSN', 'DLTR', 'CERN', 'MXIM', 'INCY', 'TTWO', 'CDW', 'CHKP', 'CTXS', 'TCOM', 'BMRN', 'ULTA', 'EXPE', 'FOXA', 'LBTYK', 'FOX', 'LBTYA']
symbols = AAPL,MSFT,AMZN,FB,GOOGL,GOOG,TSLA,NVDA,PYPL,ADBE,INTC,NFLX,CMCSA,PEP,COST,CSCO,AVGO,QCOM,TMUS,AMGN,TXN,CHTR,SBUX,ZM,AMD,INTU,ISRG,MDLZ,JD,GILD,BKNG,FISV,MELI,ATVI,ADP,CSX,REGN,MU,AMAT,ADSK,VRTX,LRCX,ILMN,ADI,BIIB,MNST,EXC,KDP,LULU,DOCU,WDAY,CTSH,KHC,NXPI,BIDU,XEL,DXCM,EBAY,EA,IDXX,CTAS,SNPS,ORLY,SGEN,SPLK,ROST,WBA,KLAC,NTES,PCAR,CDNS,MAR,VRSK,PAYX,ASML,ANSS,MCHP,XLNX,MRNA,CPRT,ALGN,PDD,ALXN,SIRI,FAST,SWKS,VRSN,DLTR,CERN,MXIM,INCY,TTWO,CDW,CHKP,CTXS,TCOM,BMRN,ULTA,EXPE,FOXA,LBTYK,FOX,LBTYA

So ZM is not listed last.

[technical_analysis.py]

    import btalib
    import pandas as pd
    from datetime import datetime
    from bars import ohlc_data
    from bars import symbols_list as symbols

    for symbol in symbols:
        try:
            file_path = f'data/ohlc/{symbol}.csv'
            dataframe = pd.read_csv(file_path,
                                parse_dates=True,
                                index_col='Timestamp')

            sma6 = btalib.sma(dataframe, period=6)
            sma10 = btalib.sma(dataframe, period=10)
            rsi = btalib.rsi(dataframe)
            macd = btalib.macd(dataframe)

            dataframe['SMA-6'] = sma6.df
            dataframe['SMA-10'] = sma10.df
            dataframe['RSI'] = rsi.df
            dataframe['MACD'] = macd.df['macd']
            dataframe['Signal'] = macd.df['signal']
            dataframe['Histogram'] = macd.df['histogram']

            f = open(file_path, 'w+')
            dataframe.to_csv(file_path, sep=',', index=True)
        except:
            print(f'{symbol} is not writing the technical data.')

Is ZM the last item in data/qqq.csv? If you add a bogus symbol at the end, does ZM read successfully?
– Will, Commented Oct 23, 2020 at 20:19
Please make sure to tag pandas issues with the pandas tag so they get a quick response from people watching that tag. Also, this one is about pd.read_csv(); please remove the lines of code after the pd.read_csv call and reduce your code to a minimal reproducible example. Examples on SO are required to be minimal.
– smci, Commented Oct 23, 2020 at 20:51
If pd.read_csv() fails on 'ZM.csv', then chop your example down to just that and show us the file's first few lines; perhaps the header or data are malformed. Use the absolute minimum lines of code needed to reproduce it. Also, a debugging tip: add a Python assert after read_csv that the dataframe has the expected number of rows/columns; that will raise an immediate exception if it doesn't.
– smci, Commented Oct 23, 2020 at 20:54
"Next, I renamed the file to ZMI (randomly) and it works... This leads me to believe that for some unknown reason that the FILENAME is the root issue." Not exactly. It does prove that your code is misreading the (alphabetically) last file. The actual filename itself probably doesn't matter (you could shuffle them), only the order.
– smci, Commented Oct 23, 2020 at 21:03
For some reason combining the bars.py and technical_analysis.py files together solves this issue... If anyone knows why, I would be very curious what the cause of the issue was.
– Tyler Kaihara, Commented Oct 23, 2020 at 21:28

smci · Accepted Answer · 2020-10-23 20:57:40Z

1

I think the error may be that 'ZM' is the last symbol in holdings and so contains some whitespace, because in [bar.py] you created holdings the following way (instead of just using the normal pd.read_csv):

holdings = open('data/qqq.csv').readlines()

symbols_list = [holding.split(',')[2].strip() for holding in holdings][1:]
symbols = ','.join(symbols_list)
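
For comparison, here is a minimal sketch of the pd.read_csv route. It assumes, as in the question's code, that data/qqq.csv has a header row and that the ticker sits in the third column; the helper name load_symbols is hypothetical and not part of the original scripts.

```python
import pandas as pd


def load_symbols(path):
    """Return the ticker symbols from the third column of a holdings CSV.

    Hypothetical helper: assumes a header row and tickers in column index 2,
    mirroring holding.split(',')[2] in the question.
    """
    # usecols=[2] keeps only the third column; dtype=str avoids type guessing.
    column = pd.read_csv(path, usecols=[2], dtype=str).iloc[:, 0]
    # Drop missing cells and strip stray whitespace around each symbol.
    return column.dropna().str.strip().tolist()
```

With this, symbols_list = load_symbols('data/qqq.csv') would stand in for the readlines/split lines, and blank trailing lines in the file are skipped by pandas by default.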


Will · Accepted Answer · 2020-10-23 20:45:07Z

0

You can probably reduce the code more to get a minimally viable example. I suspect there is something funny in the qqq.csv file and the split/strip code that makes the last entry not quite what you want.

Hopefully, that'll be clear printing the variable values as below.

with data/qqq.csv like

xname,yname,symbol
xxx,yyy,ZM

and py example

import pandas as pd


def write_OHLC(fname):
    "write example data to a file"
    # the with-block closes (and flushes) the file before it is read back
    with open(fname, 'w') as f:
        f.write('Timestamp,Open,High,Low,Close,Volume\n')
        # IRL, would parse json and spit out meaningful values
        # dummy row with one value per header column
        f.write('2020-10-13 16:30,1,10,5,100,1000\n')


def all_symbols():
    "get list of all symbols from qqq.csv"
    with open('data/qqq.csv') as f:
        holdings = f.readlines()
    # third column of every line, skipping the header row
    symbols_list = [holding.split(',')[2].strip() for holding in holdings][1:]
    return symbols_list

# issue saving/reading last(?) symbol
symbols = all_symbols()
print(symbols)
# check just zoom
zm_sym = symbols[-1]
fname = f'data/ohlc/{zm_sym}.csv'
# inspect
print(zm_sym)
print(fname)
# write and read back
write_OHLC(fname)
ZM = pd.read_csv(fname,
                 parse_dates=True,
                 index_col='Timestamp')
print(ZM)
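
Since pandas raises EmptyDataError when the file it is handed has no content, it may also help to look at the file on disk right before reading it. The following is only a rough debugging sketch, not a diagnosis: inspect_csv is a hypothetical helper name, and the data/ohlc directory layout is taken from the question.

```python
import os


def inspect_csv(symbol, directory='data/ohlc'):
    """Return (path, size_in_bytes, first_line) for a symbol's CSV file.

    Hypothetical debugging helper; the directory layout follows the question.
    """
    path = os.path.join(directory, f'{symbol}.csv')
    size = os.path.getsize(path)
    with open(path) as handle:
        first_line = handle.readline().rstrip('\n')
    # repr() makes stray whitespace in the symbol or path visible.
    print(repr(symbol), repr(path), size, repr(first_line))
    return path, size, first_line
```

Calling inspect_csv(symbol) just before pd.read_csv in the loop shows whether the file is empty at that moment. A size of 0 there would point at the file's contents at read time rather than at its name.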

edited Oct 23, 2020 at 20:45

answered Oct 23, 2020 at 20:37


7 Comments

Tyler Kaihara Over a year ago

For some reason combining the bars.py and technical_analysis.py files together solves this issue... Also, ZM was printed kind of in the middle of the list of symbols. That qqq.csv is not alphabetically organized. Thank you for your response! Would you happen to know why simply combining the files made a difference?

Will Over a year ago

How did you previously import technical_analysis.py? It doesn't look like there is a function or class. Variable scoping might have been weird.

Tyler Kaihara Over a year ago

So I imported the symbols_list variable from the bar.py file to use in the loop in technical_analysis.py.

Will Over a year ago

Oh wow, I totally missed the for symbol in symbols: line in the second file! The indenting isn't correct there. That must be a formatting issue? Otherwise you'd get an obvious "IndentationError"?

Tyler Kaihara Over a year ago

Yeah sorry this is my first post on stack overflow and it took me a while to figure the formatting out. I'll edit. But as far as the error goes, do you have any clue why executing the code in one file would solve the problem?
