
I have a question about using pd.read_csv. I am currently building a DataFrame from multiple CSV files in a folder, and the files are named as follows: "C2__1979H" or "C2_1999Z"

I would like to set the index of my DataFrame to the name of the CSV file each row was read from. I have yet to find a way to do that. Here is my current code

My DataFrame looks like this:

           Date    Open    High     Low   Close  Vol  OI  Roll
    0  19780106  236.00  237.50  234.50  235.50    0   0     0
    1  19780113  235.50  239.00  235.00  238.25    0   0     0
    2  19780120  238.00  239.00  234.50  237.00    0   0     0
    3  19780127  237.00  238.50  235.50  236.00    0   0     0

I want it to look like this:

                   Date    Open    High     Low   Close  Vol  OI  Roll
    C2__1979N  19780106  236.00  237.50  234.50  235.50    0   0     0
    C2__1979N  19780113  235.50  239.00  235.00  238.25    0   0     0
    C2__1979N  19780120  238.00  239.00  234.50  237.00    0   0     0
    C2__1979Z  19780127  237.00  238.50  235.50  236.00    0   0     0  ## (assuming this is where the next CSV file began)
  • Please note: I know my index_col = None, but I wouldn't know what to set it to anyway. Thanks. Commented Sep 10, 2015 at 20:37
  • I just answered your question; tell me if it fulfills your needs. Commented Sep 10, 2015 at 21:01
  • Is there a reason you need to do this? You're abusing the point of the index here. You can either build a dict of DataFrames where the key is the CSV name, or add a field called 'csv_name'. By doing what you describe, you completely ruin the usefulness of the index. Commented Sep 11, 2015 at 8:11
  • I am always open to other solutions. Feel free to post an answer below. Although Romain has already answered, I am still open to other ways of doing things. That's how you learn! Thanks. Commented Sep 11, 2015 at 14:58
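For reference, the two alternatives suggested in the comment above could look roughly like the sketch below. This is only an illustration, not code from the thread: the in-memory CSV strings stand in for real files, and the variable names `csv_sources`, `frames`, and `combined` are made up for the example.

```python
import io

import pandas as pd

names = ['Date', 'Open', 'High', 'Low', 'Close', 'Vol', 'OI', 'Roll']

# Hypothetical in-memory stand-ins for two CSV files (not real data files).
csv_sources = {
    'C2__1979N': '19780106,236.00,237.50,234.50,235.50,0,0,0\n'
                 '19780113,235.50,239.00,235.00,238.25,0,0,0\n',
    'C2__1979Z': '19780120,238.00,239.00,234.50,237.00,0,0,0\n',
}

# Option 1: a dict of DataFrames keyed by the CSV name.
frames = {
    name: pd.read_csv(io.StringIO(text), names=names)
    for name, text in csv_sources.items()
}

# Option 2: a single DataFrame with the CSV name in an ordinary column,
# leaving the default integer index untouched.
combined = pd.concat(
    [frame.assign(csv_name=name) for name, frame in frames.items()],
    ignore_index=True,
)
print(combined)
```

With the name stored in a column, selecting one file's rows is a plain filter such as `combined[combined['csv_name'] == 'C2__1979N']`, and the index stays unique.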

2 Answers


Add a column holding the file name to each DataFrame, then set that column as the index. This does the trick:

import os

import pandas as pd

df_temp = pd.DataFrame({'Close': [235.5, 238.25, 237.0, 236.0],
 'Date': [19780106, 19780113, 19780120, 19780127],
 'High': [237.5, 239.0, 239.0, 238.5],
 'Low': [234.5, 235.0, 234.5, 235.5],
 'OI': [0, 0, 0, 0],
 'Open': [236.0, 235.5, 238.0, 237.0],
 'Roll': [0, 0, 0, 0],
 'Vol': [0, 0, 0, 0]})

# To simulate several df: take two rows of df_temp per "file"
frames = []
x = 0
for file_ in ['the_path/C2__1979N.csv', 'other_path/C2__1979H.csv']:
    # Strip the extension, then the directory, to keep only the file name
    filename, file_extension = os.path.splitext(file_)
    chunk = df_temp.loc[x:x+1, :].copy()
    chunk['name'] = os.path.basename(filename)
    frames.append(chunk)
    x += 2

# DataFrame.append was removed in pandas 2.0, so collect and concatenate
df = pd.concat(frames)
df.set_index('name', inplace=True)
df.index.name = None
print(df)

# Result
            Close      Date   High    Low  OI   Open  Roll  Vol
C2__1979N  235.50  19780106  237.5  234.5   0  236.0     0    0
C2__1979N  238.25  19780113  239.0  235.0   0  235.5     0    0
C2__1979H  237.00  19780120  239.0  234.5   0  238.0     0    0
C2__1979H  236.00  19780127  238.5  235.5   0  237.0     0    0

Applied to the original code:

names = ['Date', 'Open', 'High', 'Low', 'Close', 'Vol', 'OI', 'Roll']
frames = []
for file_ in allFiles:
    df_temp = pd.read_csv(file_, index_col=None, names=names)
    df_temp['Roll'] = 0
    df_temp.iloc[-2, -1] = 1  # set Roll to 1 on the second-to-last row
    filename, file_extension = os.path.splitext(file_)
    df_temp['name'] = os.path.basename(filename)
    frames.append(df_temp)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df = pd.concat(frames, ignore_index=True)
df.set_index('name', inplace=True)
df.index.name = None
df = df[names]

df = df.drop_duplicates('Date')  # remove duplicate rows with the same date

3 Comments

What if you don't know the file name? If I ran this code on different folders that contain different numbers of CSVs with different names, how would the code handle that? Right now it looks like I would need to type in the name of each file. Am I correct?
When you loop through your files (allFiles), you already have the file name (file_), right? In that case you simply use it. I typed the file names manually only to simulate the loop.
I have modified my answer to extract only the file name from the path.
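To illustrate the point made in these comments, here is a rough sketch of discovering the files instead of typing their names. It is not the asker's code: the helper `load_folder`, the `*.csv` pattern, and the temporary demo folder are assumptions made for the example, and it assumes headerless CSVs with the eight columns shown in the question.

```python
import glob
import os
import tempfile

import pandas as pd

NAMES = ['Date', 'Open', 'High', 'Low', 'Close', 'Vol', 'OI', 'Roll']


def load_folder(folder):
    """Read every .csv in `folder`, labelling rows with the file name (hypothetical helper)."""
    frames = []
    # sorted() makes the order deterministic; glob finds whatever files exist.
    for path in sorted(glob.glob(os.path.join(folder, '*.csv'))):
        frame = pd.read_csv(path, names=NAMES)
        stem = os.path.splitext(os.path.basename(path))[0]
        frame.index = [stem] * len(frame)
        frames.append(frame)
    # Note: pd.concat raises ValueError if the folder contains no CSV files.
    return pd.concat(frames)


# Demo with throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    demo_rows = {
        'C2__1979N': '19780106,236.00,237.50,234.50,235.50,0,0,0\n',
        'C2__1979H': '19780113,235.50,239.00,235.00,238.25,0,0,0\n',
    }
    for stem, row in demo_rows.items():
        with open(os.path.join(folder, stem + '.csv'), 'w') as handle:
            handle.write(row)
    result = load_folder(folder)

print(result)
```

The same function can then be pointed at any folder, whatever the number or names of the CSV files inside it.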

Have you tried the obvious one?

df_temp.index = [file_]*len(df_temp)
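One thing to watch with the line above: if file_ is a full path, the path and extension end up in the index. A small sketch of the difference, using the example path from the other answer and a made-up two-row frame:

```python
import os

import pandas as pd

file_ = 'the_path/C2__1979N.csv'  # example path borrowed from the other answer
df_temp = pd.DataFrame({'Date': [19780106, 19780113],
                        'Close': [235.50, 238.25]})  # illustrative data only

# As written: every row is labelled with the full path.
df_temp.index = [file_] * len(df_temp)
full_path_labels = df_temp.index.tolist()

# Variant: strip the directory and extension first.
stem = os.path.splitext(os.path.basename(file_))[0]
df_temp.index = [stem] * len(df_temp)
stem_labels = df_temp.index.tolist()

print(full_path_labels)
print(stem_labels)
```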
