
I have three dataframes of different lengths. I am combining them into one dataframe in order to save it. Now I want to retrieve each individual dataframe's data from the combined dataframe using the index. A sample of my problem is given below:

df1 = 
   data
0   10
1   20

df2 = 
   data
0   100
1   200
2   300

df3 = 
   data
0   1000
1   2000
2   3000
3   4000

combdf = pd.concat([df1, df2, df3], ignore_index=True)

combdf = 
    data
0   10
1   20
2   100
3   200
4   300
5   1000
6   2000
7   3000
8   4000

I want to retrieve data of individual data frames from combdf. My code:

data_len = [len(df1),len(df2),len(df3)]
for k in range(0,len(data_len),1):
    if k==0:
        st_id = 0
    else:
        st_id = sum(data_len[:k])
    ed_id = st_id+data_len[k]
    print(combdf.iloc[st_id:ed_id])

The above code works fine. Is there a better approach that does not use a for loop?

2 Answers


Instead of calculating the indices while looping, you can generate them first and then use them in the loop.

import numpy as np

data_len = [0, len(df1), len(df2), len(df3)]
data_index = np.cumsum(data_len)  # contains [0, 2, 5, 9]
for i in range(len(data_index) - 1):
    print(combdf.iloc[data_index[i]:data_index[i + 1]])
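The loop above still needs the `data_index[i]:data_index[i+1]` bookkeeping. Below is a minimal sketch of the same cumulative-sum idea, assuming the example frames from the question. Here `itertools.accumulate` stands in for `np.cumsum`, zipping consecutive boundaries yields each `(start, stop)` pair, and `reset_index(drop=True)` gives each piece its original 0-based index back. The names `lengths`, `bounds` and `parts` are illustrative, not from the answer.

```python
from itertools import accumulate

import pandas as pd

# Example frames from the question
df1 = pd.DataFrame({"data": [10, 20]})
df2 = pd.DataFrame({"data": [100, 200, 300]})
df3 = pd.DataFrame({"data": [1000, 2000, 3000, 4000]})
combdf = pd.concat([df1, df2, df3], ignore_index=True)

lengths = [len(df1), len(df2), len(df3)]
# Boundaries [0, 2, 5, 9]: frame i occupies rows bounds[i]:bounds[i + 1]
bounds = [0, *accumulate(lengths)]

# Pair each start with the next boundary and slice positionally;
# reset_index(drop=True) restores the original 0-based row labels
parts = [
    combdf.iloc[start:stop].reset_index(drop=True)
    for start, stop in zip(bounds, bounds[1:])
]
```

This still iterates internally, but only once, to build a list of DataFrames that can then be accessed as `parts[0]`, `parts[1]`, and so on.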

You could add a second index level with pd.MultiIndex that holds the name of the original DataFrame. Below is a sample of how you could do this:

import pandas as pd 

df_dict = {}

df_dict['df1'] = pd.DataFrame([10, 20])
df_dict['df2'] = pd.DataFrame([100, 200, 300])
df_dict['df3'] = pd.DataFrame([1000, 2000, 3000, 4000])

for df_name, df in df_dict.items():

    # Generate second level of index
    df_index_to_array = df.index.tolist()
    df_index_second_level = [df_name for i in range(0, df.shape[0])]

    df_idx_multi_index = pd.MultiIndex.from_arrays([
        df_index_to_array,
        df_index_second_level
    ])

    df_dict[df_name] = df.set_index(df_idx_multi_index)

df_list = [df for _, df in df_dict.items()]

comb_df = pd.concat(df_list)

This would result in:

          0
0 df1    10
1 df1    20
0 df2   100
1 df2   200
2 df2   300
0 df3  1000
1 df3  2000
2 df3  3000
3 df3  4000

To access each item, you can use .loc from pandas, for example:

>>> comb_df.loc[0, 'df2']
0 100
Name: (0, df2), dtype: int64
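To get a whole original frame back rather than a single row, the same MultiIndex can be sliced on its second level with `DataFrame.xs`. Below is a minimal sketch, assuming the three frames from this answer. Its second half shows `pd.concat(..., keys=...)`, a swapped-in alternative in which pandas builds the name level for you. The names `df2_from_xs`, `keyed_df` and `df2_from_keys` are illustrative.

```python
import pandas as pd

df_dict = {
    "df1": pd.DataFrame([10, 20]),
    "df2": pd.DataFrame([100, 200, 300]),
    "df3": pd.DataFrame([1000, 2000, 3000, 4000]),
}

# Same layout as above: (original row number, frame name)
comb_df = pd.concat(
    [
        df.set_index(pd.MultiIndex.from_arrays([df.index, [name] * len(df)]))
        for name, df in df_dict.items()
    ]
)

# xs() keeps the rows whose second index level equals "df2"
# and drops that level, leaving the original row numbers
df2_from_xs = comb_df.xs("df2", level=1)

# Alternative: keys= makes concat add the frame name as the OUTER level,
# so a plain .loc on the name selects the whole original frame
keyed_df = pd.concat(list(df_dict.values()), keys=list(df_dict.keys()))
df2_from_keys = keyed_df.loc["df2"]
```

With `keys=`, no manual MultiIndex construction is needed, and the frame name comes first, which matches how the data is usually looked up (by source, then by row).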
