
I have several *.csv files in a path on my disk, which I merge into one big dataframe. I want to add a new index column to this big dataframe; its value should be a counter indicating which source file each row came from.

Here is an example of what I am aiming for:

import pandas as pd

data = [[1, 300, 25], [1, 300, 12], [1, 300, 7], [1, 301, 18],
        ['...', '...', '...'],
        [3, 337, 2], [3, 537, 9], [3, 537, 12], [3, 538, 19]]
df = pd.DataFrame(data=data, columns=['index', 'value1', 'value2'])
df

which displays as:

  index value1 value2
0     1    300     25
1     1    300     12
2     1    300      7
3     1    301     18
4   ...    ...    ...
5     3    337      2
6     3    537      9
7     3    537     12
8     3    538     19

where the index column is the one to be added according to the file counter; the columns 'value1' and 'value2' come from the *.csv contents.

My approach to creating the index is to count the files in the given subfolder and pass that count to a loop. Inside this loop a counter, "kal_index", keeps a running count of the files while I read in the *.csv files. From the counter and from the *.csv files, dataframes are created.

The merging fails with a TypeError:

import glob
import pandas as pd

# load the *.csv-files
path = (path_to_file + 'slices/')
filenames = glob.glob(path + "/*.csv")

# count files in path
#counter = len(glob.glob(path +"/*.csv"))
#print(counter)

# zero index counter 
kal_index = 0

# empty lists to collect the index values and the dataframes
kal_id = []
dfs = []

# set content of index dataframe according to file counter and read the *csv files
for i in range(len(glob.glob(path +"/*.csv"))):
    kal_index = int(kal_index) + 1
    for filename in filenames:
        dfs.append(pd.read_csv(filename))
        kal_id.append(kal_index)
    
# make one big dataset and set index column as kal_id        
big_frame = pd.concat([dfs, kal_id], ignore_index=False) 
big_frame.set_index(kal_id)
print(big_frame)

The TypeError says:

---> 26 big_frame = pd.concat([dfs, kal_id], ignore_index=False)
TypeError: cannot concatenate object of type '<class 'list'>'; only Series and DataFrame objs are valid

Even when explicitly converting kal_id to a dataframe, the error remains the same:

kal_id_df = pd.DataFrame(kal_id)

So I think my mistake is somewhere else. Probably I cannot see the forest for the trees at the moment... Any hint for me (or a pointer to a similar, already solved problem I have overlooked)?

1 Answer


OK, I see my mistake: I only generate an index column as long as the number of files, instead of reading each file's length and using that as the iterator.
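(For completeness, the TypeError itself came from the concat call: pd.concat expects an iterable whose elements are all Series or DataFrames, but [dfs, kal_id] is a two-element list whose first element, dfs, is itself a list. A minimal reproduction, with illustrative names:)

import pandas as pd

dfs = [pd.DataFrame({'a': [1, 2]}), pd.DataFrame({'a': [3]})]
kal_id = [1, 1, 2]

# pd.concat([dfs, kal_id])           # TypeError: dfs is a list, not a DataFrame
pd.concat(dfs, ignore_index=True)    # works: every element is a DataFrame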

A working solution is this:

# merge all sliced *.csv files into one dataframe
import glob
import pandas as pd

# path to files
path = (path_to_file + 'CAPS_dataframe_slices')
filenames = glob.glob(path + '/*.csv')

# create one dataframe out of several *.csv files (the slices)
# and build an index number list with one entry per row of each file
kal_index = 0    # index number counter, increased by 1 per file
kal_id = []      # collects one index entry per row
li = []          # collects the per-file dataframes

for f in filenames:
    single_file = pd.read_csv(f)       # read the next *.csv file
    kal_index = kal_index + 1          # increase the counter by 1 with each file
    for j in range(len(single_file)):  # get the number of rows of this file...
        kal_id.append(kal_index)       # ...and append the counter once per row
    li.append(single_file)             # keep the file's dataframe

# append all *.csv dataframes to one big dataframe
df_all = pd.concat(li, ignore_index=True)

# use DataFrame.insert() to add the index point column at position 0;
# the kal_id list is array-like and matches df_all row for row
df_all.insert(0, "cal_point_index", kal_id)
df_all
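A possibly simpler variant of the same idea (a sketch, not tested against the original data): tag each file's rows with the file number before concatenating, so no separate kal_id list is needed. DataFrame.insert broadcasts a scalar value to every row:

import glob
import pandas as pd

filenames = glob.glob(path_to_file + 'CAPS_dataframe_slices/*.csv')

frames = []
for i, f in enumerate(filenames, start=1):
    frame = pd.read_csv(f)
    frame.insert(0, 'cal_point_index', i)  # scalar i is broadcast to all rows of this file
    frames.append(frame)

df_all = pd.concat(frames, ignore_index=True)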