
I have several *.csv files in a path on my disk, which I merge into one big dataframe. I want to add a new index column to this big dataframe; its value should be a counter indicating which source file each row came from.

Here is an example of what I am aiming for:

import pandas as pd

data = [[1, 300, 25], [1, 300, 12], [1, 300, 7], [1, 301, 18],
        ['...', '...', '...'],
        [3, 337, 2], [3, 537, 9], [3, 537, 12], [3, 538, 19]]
df = pd.DataFrame(data=data, columns=['index', 'value1', 'value2'])
df

which displays as:

  index value1 value2
0     1    300     25
1     1    300     12
2     1    300      7
3     1    301     18
4   ...    ...    ...
5     3    337      2
6     3    537      9
7     3    537     12
8     3    538     19

where the index column is the one to be added according to the file counter; the columns 'value1' and 'value2' come from the *.csv contents.

My approach to creating the index is to count the files in the given subfolder and pass that count to a loop. Inside this loop a counter, "kal_index", keeps a running count of the files while I read in the *.csv files. From the counter and from the *.csv files, dataframes are created.

The merging fails with a TypeError:

import glob
import pandas as pd

# load the *.csv-files
path = (path_to_file + 'slices/')
filenames = glob.glob(path + "/*.csv")

# count files in path
#counter = len(glob.glob(path +"/*.csv"))
#print(counter)

# zero index counter 
kal_index = 0

# empty lists to collect the index values and the dataframes
kal_id = []
dfs = []

# set content of index dataframe according to file counter and read the *csv files
for i in range(len(glob.glob(path +"/*.csv"))):
    kal_index = int(kal_index) + 1
    for filename in filenames:
        dfs.append(pd.read_csv(filename))
        kal_id.append(kal_index)
    
# make one big dataset and set index column as kal_id        
big_frame = pd.concat([dfs, kal_id], ignore_index=False) 
big_frame.set_index(kal_id)
print(big_frame)

The TypeError says:

---> 26 big_frame = pd.concat([dfs, kal_id], ignore_index=False)
TypeError: cannot concatenate object of type '<class 'list'>'; only Series and DataFrame objs are valid

Even when explicitly converting kal_id to a dataframe, the error remains the same:

kal_id_df = pd.DataFrame(kal_id)

So I think my mistake is somewhere else. Probably I cannot see the forest for the trees at the moment... Any hint for me (or a pointer to a similar, already solved problem I have overlooked)?

1 Answer


OK, I see my mistake: I only generate an index column as long as the number of files, instead of reading each file's length and using that as the iterator.
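(For completeness, the TypeError itself came from the concat call: pd.concat expects an iterable whose elements are all Series or DataFrames, but [dfs, kal_id] is a two-element list whose first element, dfs, is itself a list. A minimal reproduction, with illustrative names:)

import pandas as pd

dfs = [pd.DataFrame({'a': [1, 2]}), pd.DataFrame({'a': [3]})]
kal_id = [1, 1, 2]

# pd.concat([dfs, kal_id])           # TypeError: dfs is a list, not a DataFrame
pd.concat(dfs, ignore_index=True)    # works: every element is a DataFrame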

A working solution is this:

# merge all sliced *.csv files into one dataframe
import glob
import pandas as pd

# path to files
path = (path_to_file + 'CAPS_dataframe_slices')
filenames = glob.glob(path + '/*.csv')

# create one dataframe out of several *.csv files (the slices)
# and build an index number list with one entry per row of each file
kal_index = 0    # index number counter, increased by 1 per file
kal_id = []      # collects one index entry per row
li = []          # collects the per-file dataframes

for f in filenames:
    single_file = pd.read_csv(f)       # read the next *.csv file
    kal_index = kal_index + 1          # increase the counter by 1 with each file
    for j in range(len(single_file)):  # get the number of rows of this file...
        kal_id.append(kal_index)       # ...and append the counter once per row
    li.append(single_file)             # keep the file's dataframe

# append all *.csv dataframes to one big dataframe
df_all = pd.concat(li, ignore_index=True)

# use DataFrame.insert() to add the index point column at position 0;
# the kal_id list is array-like and matches df_all row for row
df_all.insert(0, "cal_point_index", kal_id)
df_all
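A possibly simpler variant of the same idea (a sketch, not tested against the original data): tag each file's rows with the file number before concatenating, so no separate kal_id list is needed. DataFrame.insert broadcasts a scalar value to every row:

import glob
import pandas as pd

filenames = glob.glob(path_to_file + 'CAPS_dataframe_slices/*.csv')

frames = []
for i, f in enumerate(filenames, start=1):
    frame = pd.read_csv(f)
    frame.insert(0, 'cal_point_index', i)  # scalar i is broadcast to all rows of this file
    frames.append(frame)

df_all = pd.concat(frames, ignore_index=True)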