I have an original dataframe from which I create a modified dataframe. In some cases, though, I want to select a subset of the data instead of using the whole dataframe, and I want all of this handled inside a single function. Is it possible to return different variables based on a conditional, or would this be incorrect?
The function below works fine when I run:
modified_df = modify_data(protein_embeddings, protein_df, subset = False)
but when I try executing:
gal_subset_first, gal_subset_second = modify_data(protein_embeddings, protein_df, subset = True)
I get the error:
ValueError: too many values to unpack (expected 2)
The Function
def modify_data(embeddings, df, subset = False):
    """
    Modifies Original Dataframe with respective embeddings
    :return: Final Dataframe to be used in data split and modelling
    """
    #Original_DF
    OD_df = df.copy(deep = True)
    OD_df = df.reset_index()
    OD_df.loc[:,'task'] = 'stability'
    #Embeddings Df
    embeddings_df = pd.DataFrame(data=embeddings)
    embeddings_df = embeddings_df.reset_index()
    embedded_df = pd.merge(embeddings_df, OD_df, on='index')
    embedded_df = embedded_df.drop(['index', 'sequence', 'temperature'], axis = 1)

    def subsetting(embedded_df, sample_no, row_no):
        "Select a Subset of rows desired from original dataframe"
        #Selecting subset
        embedded_df = embedded_df.sample(n = sample_no)
        subset_first = gal_subset[:row_no]
        subset_second = gal_subset[row_no:]
        return subset_first, subset_second

    if subset == True:
        gal_subset_first, gal_subset_second = subsetting(embedded_df, sample_no = 2000, row_no = 1000)
    else:
        pass
    return embedded_df
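You call subsetting in your if statement but never return gal_subset_first, gal_subset_second, so the function always falls through to return embedded_df. Unpacking that single dataframe into two names iterates over its columns, which is why you get "too many values to unpack (expected 2)". Try replacing that line with return subsetting(embedded_df, sample_no = 2000, row_no = 1000).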
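For illustration, here is a minimal sketch of how the function might look with that change applied. A few things here are my assumptions, not part of the original post: the toy data (including the hypothetical `label` column and the array shapes), moving `subsetting` to module level, and slicing the sampled frame instead of `gal_subset`, which is not defined inside the function in the question.

```python
import numpy as np
import pandas as pd


def subsetting(embedded_df, sample_no, row_no):
    """Sample `sample_no` rows at random and split them at position `row_no`."""
    sampled = embedded_df.sample(n=sample_no)
    # Assumption: the question's `gal_subset` was meant to be the sampled frame.
    return sampled.iloc[:row_no], sampled.iloc[row_no:]


def modify_data(embeddings, df, subset=False):
    """Merge embeddings with the original dataframe; optionally return two subsets."""
    od_df = df.reset_index()  # returns a new frame, so `df` itself is not modified
    od_df["task"] = "stability"

    embeddings_df = pd.DataFrame(data=embeddings).reset_index()
    embedded_df = pd.merge(embeddings_df, od_df, on="index")
    embedded_df = embedded_df.drop(["index", "sequence", "temperature"], axis=1)

    if subset:
        # Return the tuple from subsetting directly instead of discarding it.
        return subsetting(embedded_df, sample_no=2000, row_no=1000)
    return embedded_df


# Toy stand-ins for protein_embeddings / protein_df (made up for this example).
rng = np.random.default_rng(0)
protein_embeddings = rng.random((2500, 4))
protein_df = pd.DataFrame({
    "sequence": ["MKV"] * 2500,
    "temperature": rng.random(2500),
    "label": rng.integers(0, 2, size=2500),  # hypothetical extra column
})

modified_df = modify_data(protein_embeddings, protein_df, subset=False)
gal_subset_first, gal_subset_second = modify_data(
    protein_embeddings, protein_df, subset=True
)
```

Returning one dataframe in one branch and a tuple of two in the other is valid Python, but every caller then has to know which shape to expect for a given flag. If that becomes awkward, one option is to keep `modify_data` returning a single dataframe and call `subsetting` on its result separately.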