Appending each dataframe from a list of dataframes with another list of dataframes

Question

I have 2 sets of split data frames from a big data frame. Say for example,

import numpy as np
import pandas as pd

np.random.seed([3,1415])
ind1 = ['A_p','B_p','C_p','D_p','E_p','F_p','N_p','M_p','O_p','Q_p']
col1 = ['sap1','luf','tur','sul','sul2','bmw','aud']
df1  = pd.DataFrame(np.random.randint(10, size=(10, 7)), columns=col1,index=ind1)
ind2 = ['G_l','I_l','J_l','K_l','L_l','M_l','R_l','N_l']
col2 = ['sap1','luf','tur','sul','sul2','bmw','aud']
df2  = pd.DataFrame(np.random.randint(20, size=(8, 7)), columns=col2,index=ind2)

# Split the dataframes into two parts 
pc_1,pc_2   = np.array_split(df1, 2)
lnc_1,lnc_2 = np.array_split(df2, 2)

Now I need to concatenate each split data frame from df1 (pc_1, pc_2) with each split data frame from df2 (lnc_1, lnc_2). Currently, I am doing it as follows:

# Concatenate each split of df1 with each split of df2

pc1_lnc_1 = pd.concat([pc_1, lnc_1])
pc1_lnc_2 = pd.concat([pc_1, lnc_2])
pc2_lnc1  = pd.concat([pc_2, lnc_1])
pc2_lnc2  = pd.concat([pc_2, lnc_2])

On every concatenated data frame I need to run a correlation analysis function, for example,

correlation(pc1_lnc_1)

And I wanted to save the results separately, for example,

pc1_lnc1 = correlation(pc1_lnc_1)
pc1_lnc2 = correlation(pc1_lnc_2)
# ......

pc1_lnc1.to_csv(output, sep='\t')

Is there a way to automate the concatenation above with some sort of loop, rather than coding each combination on its own line? Currently I run the correlation function separately for every concatenated data frame, and I have a pretty long list of split data frames.

blutab · Accepted Answer · 2020-07-27 13:23:41Z

3

You can loop over the split dataframes:

for pc in np.array_split(df1, 2):
    for lnc in np.array_split(df2, 2):
        print(correlation(pd.concat([pc, lnc])))

answered Jul 27, 2020 at 13:23


1 Comment

ARJ Over a year ago

Thanks for the answer. I want to save the output of each concatenated data frame separately. Within the for loop it waits for all the split data frames, so it takes forever to print the output. That is why I split the data frames into smaller ones. I have updated the question.

sushanth · Accepted Answer · 2020-07-27 14:45:30Z

1

Here is another thought,

def correlation(data):
    # do some complex operation..
    return data

# {"pc_1" : split_1, "pc_2" : split_2}
pc = {f"pc_{i + 1}": v for i, v in enumerate(np.array_split(df1, 2))}
lc = {f"lc_{i + 1}": v for i, v in enumerate(np.array_split(df2, 2))}

for pc_k, pc_v in pc.items():
    for lc_k, lc_v in lc.items():
        # (pc_1, lc_1), (pc_1, lc_2) ..
        result = correlation(pd.concat([pc_v, lc_v]))
        result.to_csv(f"{pc_k}_{lc_k}.csv", sep="\t", index=False)

# will create csv like pc_1_lc_1.csv, pc_1_lc_2.csv.. in the current working dir

edited Jul 27, 2020 at 14:45

answered Jul 27, 2020 at 14:10


3 Comments

ARJ Over a year ago

The output is only printing the headings.

ARJ Over a year ago

Thanks, it's printing the output now!

ARJ Over a year ago

I have a question: is there a way to run each of the split data frames in parallel, rather than one after another? Currently, on the big data frame, it is taking quite a lot of time to print the output.
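
On running the pairs in parallel: one possible approach is to submit each (pc, lnc) pair to a worker pool from the standard library's concurrent.futures. What follows is only a sketch under assumptions. The correlation function is a stand-in (row-wise Pearson correlation via DataFrame.T.corr()), and run_pair, split_rows, the stand-in data, and the output directory are hypothetical names introduced for illustration. Threads are used for simplicity; whether they actually help depends on how much of the real correlation runs in NumPy code that releases the GIL. ProcessPoolExecutor has the same submit interface and may suit CPU-bound Python code better, provided the script is guarded by if __name__ == "__main__".

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from pathlib import Path

import numpy as np
import pandas as pd


def correlation(data):
    # Stand-in for the real analysis (assumption): row-wise Pearson correlation.
    return data.T.corr()


def split_rows(df, n):
    # Split a dataframe into n row blocks by position (hypothetical helper).
    return [df.iloc[idx] for idx in np.array_split(np.arange(len(df)), n)]


def run_pair(pc_item, lnc_item):
    # Each item is a (name, dataframe) tuple; returns (pair name, result).
    (pc_name, pc_df), (lnc_name, lnc_df) = pc_item, lnc_item
    return f"{pc_name}_{lnc_name}", correlation(pd.concat([pc_df, lnc_df]))


# Small random stand-in data with the same shapes as in the question.
rng = np.random.default_rng(0)
cols = ['sap1', 'luf', 'tur', 'sul', 'sul2', 'bmw', 'aud']
df1 = pd.DataFrame(rng.integers(10, size=(10, 7)), columns=cols,
                   index=[f"p{i}" for i in range(10)])
df2 = pd.DataFrame(rng.integers(20, size=(8, 7)), columns=cols,
                   index=[f"l{i}" for i in range(8)])

pcs = [(f"pc_{i + 1}", part) for i, part in enumerate(split_rows(df1, 2))]
lncs = [(f"lnc_{i + 1}", part) for i, part in enumerate(split_rows(df2, 2))]

# Submit every (pc, lnc) combination; the pool runs them concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(run_pair, pc, lnc) for pc, lnc in product(pcs, lncs)]
    results = dict(f.result() for f in futures)

# Save each result to its own tab-separated file (temporary directory here).
out_dir = Path(tempfile.mkdtemp())
for name, result in results.items():
    result.to_csv(out_dir / f"{name}.csv", sep="\t")
```

Each result is kept under its own key in results, so it can be saved or inspected separately once its future has finished.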

zglin · Accepted Answer · 2020-07-27 13:47:41Z

0

If you don't have your individual dataframes in an array (and assuming you have a nontrivial number of dataframes), the easiest way (with minimal code modification) would be to throw an eval in with a loop.

Something like

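# n is the number of splits per dataframe (variables pc_1..pc_n and lnc_1..lnc_n)
for counter in range(1, n + 1):
    for counter2 in range(1, n + 1):
        exec("pc{0}_lnc{1} = correlation(pd.concat([pc_{0}, lnc_{1}]))".format(counter, counter2))
        # filename is a placeholder; use a different path for each pair
        eval("pc{0}_lnc{1}.to_csv(filename, sep='\\t')".format(counter, counter2))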

The standard disclaimer around eval does still apply (don't do it because it's lazy programming practice and unsafe inputs could cause all kinds of problems in your code).

See here for more details about why eval is bad
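
For comparison, the dynamically named variables can usually be avoided by keeping the splits in lists and the results in a dictionary keyed by the name that would otherwise have been a variable. This is a minimal sketch, not the poster's code: the correlation function is a placeholder, and the small random frames and the pc_parts / lnc_parts names are assumptions made for illustration.

```python
from itertools import product

import numpy as np
import pandas as pd


def correlation(data):
    # Placeholder (assumption): the real function is not shown in the question.
    return data.T.corr()


# Small random stand-in frames with unique row labels.
rng = np.random.default_rng(1)
cols = ['sap1', 'luf', 'tur']
df1 = pd.DataFrame(rng.integers(10, size=(6, 3)), columns=cols,
                   index=[f"p{i}" for i in range(6)])
df2 = pd.DataFrame(rng.integers(20, size=(4, 3)), columns=cols,
                   index=[f"l{i}" for i in range(4)])

n = 2  # number of splits per dataframe

# Keep the splits in lists instead of separate variables pc_1, pc_2, ...
pc_parts = [df1.iloc[idx] for idx in np.array_split(np.arange(len(df1)), n)]
lnc_parts = [df2.iloc[idx] for idx in np.array_split(np.arange(len(df2)), n)]

# The dictionary key plays the role of the generated variable name.
results = {}
for (i, pc), (j, lnc) in product(enumerate(pc_parts, start=1),
                                 enumerate(lnc_parts, start=1)):
    results[f"pc{i}_lnc{j}"] = correlation(pd.concat([pc, lnc]))

print(sorted(results))
# ['pc1_lnc1', 'pc1_lnc2', 'pc2_lnc1', 'pc2_lnc2']
```

A result is then looked up as results["pc1_lnc1"] rather than through exec or eval, and the same key can be reused to build a distinct output file name for each pair.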

Edit: updated the answer for the updated question.

edited Jul 27, 2020 at 13:47

answered Jul 27, 2020 at 13:29


4 Comments

ARJ Over a year ago

Thanks for the help. I want to save the output separately, just as mentioned in the question. I do not see how that is possible with this loop.

sushanth Over a year ago

@zhqiat If eval is bad, why should it even be recommended? In the above case eval is redundant.

zglin Over a year ago

@Sushanth Most times eval leads to all types of bugs, however it is included in the language for a reason (meaning occasionally it's an answer to the problem as written)

ARJ Over a year ago

@zhqiat, what is range(0,n) in the script?
